Jouni Sirén
@jltsiren.bsky.social
📤 137
📥 74
📝 47
Researcher at UCSC Genomics Institute. Space-efficient data structures and pangenome graphs.
Inspired by this, I decided to see whether Claude could convert my GAF sorting code from C++ to Rust. It's basically an external-memory multi-way mergesort with integer keys, opaque values, special handling for header lines, and compressed temporary files. 1/
4 months ago
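The post only outlines the algorithm. As a rough illustration (a minimal Python sketch, not the vg C++ code or its Rust port), the structure might look like this: lines starting with "@" are set aside as headers, all other lines are treated as opaque blobs with an integer key from a caller-supplied `key_fn`, sorted batches are written to gzip-compressed temporary files, and the runs are combined with a multi-way merge. The run file format, `batch_size`, and all function names are assumptions made for the example.

```python
import gzip
import heapq
import os
import tempfile
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple

# A record is an (integer key, opaque value) pair; the value is never parsed.
Record = Tuple[int, bytes]


def write_run(batch: List[Record], workdir: str) -> str:
    """Sort one in-memory batch by key and write it to a gzip-compressed temporary file."""
    batch.sort(key=itemgetter(0))
    fd, path = tempfile.mkstemp(suffix=".run.gz", dir=workdir)
    os.close(fd)
    with gzip.open(path, "wb") as out:
        for key, value in batch:
            # Hypothetical run format: 8-byte key, 4-byte value length, value bytes.
            # Keys must therefore be non-negative and fit in 64 bits.
            out.write(key.to_bytes(8, "little"))
            out.write(len(value).to_bytes(4, "little"))
            out.write(value)
    return path


def read_run(path: str) -> Iterator[Record]:
    """Stream (key, value) records back from one compressed run."""
    with gzip.open(path, "rb") as inp:
        while True:
            header = inp.read(12)
            if not header:
                return
            key = int.from_bytes(header[:8], "little")
            length = int.from_bytes(header[8:], "little")
            yield key, inp.read(length)


def external_sort(
    lines: Iterable[bytes],
    key_fn: Callable[[bytes], int],
    batch_size: int,
    write: Callable[[bytes], None],
) -> None:
    """Write header lines ("@") first, then all other lines sorted by key_fn."""
    headers: List[bytes] = []
    batch: List[Record] = []
    runs: List[str] = []
    with tempfile.TemporaryDirectory() as workdir:
        for line in lines:
            if line.startswith(b"@"):
                headers.append(line)  # headers bypass sorting entirely
                continue
            batch.append((key_fn(line), line))
            if len(batch) >= batch_size:
                runs.append(write_run(batch, workdir))
                batch = []
        if batch:
            runs.append(write_run(batch, workdir))
        for header in headers:
            write(header)
        # Multi-way merge: heapq.merge holds one pending record per run.
        for _, value in heapq.merge(*(read_run(p) for p in runs), key=itemgetter(0)):
            write(value)
```

Because the merge holds only one pending record per run, peak memory depends on the batch size and the number of runs rather than on the input size.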
Our latest vg release introduces GBZ v2 with better compression for sequences. I originally assumed that the total sequence length in a pangenome graph would be similar to the size of the genome. This does not hold in the full HPRC graphs due to unaligned centromeres.
Release vg 1.72.0 - Littlefoot · vgteam/vg
Don't forget to mark the static binary executable: chmod +x vg Docker Image: quay.io/vgteam/vg:v1.72.0 Buildable Source Tarball: vg-v1.72.0.tar.gz Includes source for vg and all submodules. Use th...
https://github.com/vgteam/vg/releases/tag/v1.72.0
4 months ago
reposted by
Jouni Sirén
Andrea Guarracino
4 months ago
Looking for a postdoc to build my new lab at TGen (Phoenix, AZ) focused on pangenome methods for cancer and complex disease. Full stack — from pangenome assembly and compression to association studies and somatic variant discovery. Reach out if interested!
guarracinolab.github.io#join
Guarracino Lab | Pangenome Research
We develop methods to build and analyze pangenomes, with applications in cancer and complex disease. Translational Genomics Research Institute, Phoenix, AZ.
https://guarracinolab.github.io#join
reposted by
Jouni Sirén
Heng Li
5 months ago
I am looking for a postdoc to develop high-performance algorithms in computational genomics. Email or DM me if interested. For more information, see hlilab.github.io/vacancies. RTs appreciated!
HLi Lab - Vacancies
Openings
https://hlilab.github.io/vacancies
reposted by
Jouni Sirén
Andrew Gallant
6 months ago
First time seeing this and it is really great!
abseil.io/fast/hints.h...
abseil / Performance Hints
An open-source collection of core C++ library code
https://abseil.io/fast/hints.html
VG will soon start adding headers to the GAF files it generates. The specifics are still uncertain, but if you maintain a GAF parser, it may be a good idea to skip lines starting with "@". Here is a draft specification for the vg flavor of GAF.
https://github.com/jltsiren/gbz-base/blob/main/GAF.md
7 months ago
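For parser maintainers, skipping headers can be as simple as the following Python sketch. It assumes tab-separated lines with the 12 mandatory GAF columns followed by optional tags; the `GafRecord` type and the handful of fields it keeps are hypothetical choices for illustration, and the header line used when testing it is made up rather than taken from the draft specification.

```python
from typing import Iterable, List, NamedTuple, Tuple


class GafRecord(NamedTuple):
    """A few of the 12 mandatory GAF columns plus optional tags (hypothetical selection)."""
    query_name: str    # column 1
    query_length: int  # column 2
    path: str          # column 6, e.g. ">1>2<3"
    mapq: int          # column 12
    tags: List[str]    # optional SAM-style tags after column 12


def parse_gaf(lines: Iterable[str]) -> Tuple[List[str], List[GafRecord]]:
    """Split a GAF stream into header lines (starting with "@") and alignment records."""
    headers: List[str] = []
    records: List[GafRecord] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        if line.startswith("@"):
            headers.append(line)  # keep headers, but never parse them as alignments
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"line {lineno}: expected at least 12 fields, got {len(fields)}")
        records.append(
            GafRecord(fields[0], int(fields[1]), fields[5], int(fields[11]), fields[12:])
        )
    return headers, records
```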
reposted by
Jouni Sirén
Mohsen Zakeri
8 months ago
1/6 Movi 2 is here: faster and more space-efficient for pangenome queries. Its fastest mode uses half the memory of Movi 1 while running ~30% faster.
github.com/mohsenzakeri...
GitHub - mohsenzakeri/Movi: Fast, Cache-Efficient, and Scalable Queries on Pangenomes
Fast, Cache-Efficient, and Scalable Queries on Pangenomes - mohsenzakeri/Movi
https://github.com/mohsenzakeri/Movi
reposted by
Jouni Sirén
Rob Patro
8 months ago
For the weekend crowd. I'm hiring a postdoc! If you're interested in algorithms, data structures and high-dimensional inference, and if you want to invent new methods for genomics and implement them in high-performance, robust and easy-to-use software, do I have a lab for you; ours!
reposted by
Jouni Sirén
Xian Chang
8 months ago
🦒Long read giraffe is out!🦒 Mapping long reads to pangenome graphs is ~10x faster than with GraphAligner, with veeery slightly better mapping accuracy, short variant calling, and SV genotyping than GraphAligner or Minimap2
reposted by
Jouni Sirén
Giulio Ermanno Pibiri
9 months ago
We are glad to announce that the next workshop “Data Structures in Bioinformatics” (DSB 2026) will take place in Venice, Italy, on *February 18-19*, 2026.
dsb-meeting.github.io/DSB2026/
Book the dates!
#DSB26
DSB 2026 Venice - February 18-19
Workshop Data Structures in Bioinformatics
https://dsb-meeting.github.io/DSB2026/
GBZ-base has been a side project for me for a couple of years. It's basically a GBZ graph stored in SQLite instead of a custom file format. You can convert a GBZ graph to GBZ-base quickly and then extract subgraphs around nodes / reference positions on a laptop. 1/n
GitHub - jltsiren/gbz-base: Prototype for an immutable pangenome graph in SQLite
Prototype for an immutable pangenome graph in SQLite - jltsiren/gbz-base
https://github.com/jltsiren/gbz-base
9 months ago
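The idea of extracting a neighborhood from a graph stored in SQLite can be sketched in a few lines of Python with the standard `sqlite3` module. The two-table schema below (`Nodes`, `Edges`) is hypothetical and much simpler than what GBZ-base actually stores; the sketch only shows a breadth-first search that issues indexed queries, so only the relevant rows are read. Extraction around reference positions would additionally need a path index, which is omitted here.

```python
import sqlite3
from collections import deque
from typing import Dict, List

# Hypothetical, heavily simplified schema; the real GBZ-base database stores
# considerably more (e.g. haplotype information) and its layout differs.
SCHEMA = """
CREATE TABLE Nodes (id INTEGER PRIMARY KEY, sequence TEXT NOT NULL);
CREATE TABLE Edges (from_id INTEGER NOT NULL, to_id INTEGER NOT NULL,
                    PRIMARY KEY (from_id, to_id));
CREATE INDEX EdgesByTarget ON Edges (to_id);
"""


def open_toy_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def neighbors(db: sqlite3.Connection, node: int) -> List[int]:
    """Neighbors in either direction; both lookups can use an index."""
    rows = db.execute(
        "SELECT to_id FROM Edges WHERE from_id = ? "
        "UNION SELECT from_id FROM Edges WHERE to_id = ?",
        (node, node),
    )
    return [row[0] for row in rows]


def extract_subgraph(db: sqlite3.Connection, seed: int, radius: int) -> Dict[int, str]:
    """Breadth-first search up to `radius` edges from `seed`; returns {node ID: sequence}."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nxt in neighbors(db, node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    placeholders = ",".join("?" * len(dist))
    rows = db.execute(
        f"SELECT id, sequence FROM Nodes WHERE id IN ({placeholders}) ORDER BY id",
        sorted(dist),
    )
    return {node_id: seq for node_id, seq in rows}
```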
reposted by
Jouni Sirén
Rob Patro
10 months ago
Last talk of the day (before posters) "Lossless Pangenome Indexing Using Tag Arrays" presented by Parsa Eskandar!
#WABI25
There was a workshop on 25 years of the FM-index and the CSA after SEA. I would have liked to attend, but I had other commitments. The invited speakers were Giovanni Manzini and Roberto Grossi, as the other purpose of the workshop was to present them Festschrifts for their 60th birthdays. 1/6
SEA 2025
https://regindex.github.io/sea2025.github.io/workshop.html
10 months ago
A new preprint on indexing pangenome graphs using an FM-index of the haplotypes and a tag array. Joint work with Parsa Eskandar and @benedictpaten.bsky.social.
Lossless Pangenome Indexing Using Tag Arrays
Pangenome graphs represent the genomic variation by encoding multiple haplotypes within a unified graph structure. However, efficient and lossless indexing of such structures remains challenging due t...
https://www.biorxiv.org/content/10.1101/2025.05.12.653561v1
about 1 year ago
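To make the idea concrete, here is a toy Python sketch, assuming a tag is a (node ID, offset) pair recording the graph position where each suffix of the concatenated haplotypes starts. Backward search over the BWT yields a suffix array range, and the tags in that range map the hits back to the graph. The tag encoding, the "$" separators, and the naive construction are assumptions for illustration; the preprint's actual representation may differ.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# Hypothetical tag encoding: (node ID, offset within the node). Separators get (0, 0).
Tag = Tuple[int, int]
Index = Tuple[str, Dict[str, int], List[Tag]]


def build_index(haplotypes: List[List[Tuple[int, str]]]) -> Index:
    """Each haplotype is a walk given as [(node ID, node sequence), ...]."""
    chars: List[str] = []
    tags: List[Tag] = []
    for walk in haplotypes:
        for node_id, seq in walk:
            for offset, ch in enumerate(seq):
                chars.append(ch)
                tags.append((node_id, offset))
        chars.append("$")  # haplotype separator, sorts before A/C/G/T
        tags.append((0, 0))
    text = "".join(chars)
    # Naive suffix array construction; fine for a toy, far too slow for real data.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the final "$"
    tag_array = [tags[i] for i in sa]  # graph position where each sorted suffix starts
    c: Dict[str, int] = {}
    total = 0
    for ch in sorted(set(text)):
        c[ch] = total  # number of characters in text smaller than ch
        total += text.count(ch)
    return bwt, c, tag_array


def occ(bwt: str, ch: str, i: int) -> int:
    """Occurrences of ch in bwt[:i]; a real FM-index answers this with a rank structure."""
    return bwt[:i].count(ch)


def backward_search(bwt: str, c: Dict[str, int], pattern: str) -> Tuple[int, int]:
    """Half-open suffix array range [sp, ep) of suffixes starting with pattern."""
    sp, ep = 0, len(bwt)
    for ch in reversed(pattern):
        if ch not in c:
            return 0, 0
        sp = c[ch] + occ(bwt, ch, sp)
        ep = c[ch] + occ(bwt, ch, ep)
        if sp >= ep:
            return 0, 0
    return sp, ep


def graph_hits(index: Index, pattern: str) -> List[Tag]:
    """Distinct graph positions where occurrences of pattern start."""
    bwt, c, tag_array = index
    sp, ep = backward_search(bwt, c, pattern)
    return sorted(set(tag_array[sp:ep]))


def tag_runs(tag_array: List[Tag]) -> List[Tuple[Tag, int]]:
    """Run-length view of the tag array: similar haplotypes yield runs of equal tags."""
    return [(tag, len(list(group))) for tag, group in groupby(tag_array)]
```

Because similar haplotypes share long stretches of sequence and graph positions, adjacent suffixes often carry the same tag, which is what makes a run-length encoded tag array attractive.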
We use personalized references with our Giraffe aligner. Each chromosome is partitioned into a sequence of blocks. We sample the most relevant haplotypes in each block using kmer counts. Mapping to this personalized reference improves variant calling accuracy.
www.nature.com/articles/s41...
Personalized pangenome references - Nature Methods
This work introduces a k-mer-based approach to customizing a pangenome reference, making it more relevant to a new sample of interest. This method enhances the accuracy of genotyping small variants an...
https://www.nature.com/articles/s41592-024-02407-2
over 1 year ago
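A heavily simplified Python sketch of per-block haplotype sampling follows. It assumes the blocks are already given as {haplotype name: sequence} dictionaries and scores each haplotype by how many of its distinct k-mers do or do not appear in the reads. The scoring rule, the function names, and the parameters `k` and `n` are assumptions for illustration; the method in the paper is more involved, so consult it for the actual criteria.

```python
from collections import Counter
from typing import Dict, List


def kmers(seq: str, k: int) -> List[str]:
    # Forward strand only; this toy ignores reverse complements.
    return [seq[i : i + k] for i in range(len(seq) - k + 1)]


def count_read_kmers(reads: List[str], k: int) -> Counter:
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def sample_block(haplotypes: Dict[str, str], read_counts: Counter, k: int, n: int) -> List[str]:
    """Pick the n haplotypes whose k-mers agree best with the reads in this block.

    Simplified score: +1 for each distinct haplotype k-mer seen in the reads,
    -1 for each one that is absent. Ties are broken by haplotype name.
    """
    scores = {}
    for name, seq in haplotypes.items():
        distinct = set(kmers(seq, k))
        present = sum(1 for kmer in distinct if read_counts[kmer] > 0)
        scores[name] = present - (len(distinct) - present)
    return sorted(scores, key=lambda name: (-scores[name], name))[:n]


def personalize(blocks: List[Dict[str, str]], reads: List[str], k: int, n: int) -> List[List[str]]:
    """Count read k-mers once, then sample haplotypes independently in each block."""
    counts = count_read_kmers(reads, k)
    return [sample_block(block, counts, k, n) for block in blocks]
```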
Coming up soon in vg: faster GAF sorting. The old algorithm was spending too much time parsing and serializing alignments. The new algorithm just deals with blobs and integer keys. With that and some algorithmic improvements, you can now expect to sort 30x short reads in 15-20 minutes on a laptop.
over 1 year ago
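The blob-and-key approach can be sketched in Python as follows. Each GAF line is kept as raw bytes, and only the path field (column 6) is inspected to compute an integer key. The post does not say which key vg uses; taking the smallest numeric node ID on the path and sending unaligned records (path "*") to the end are assumptions made for this example, as are the names `sort_key`, `sort_gaf_blobs`, and `UNALIGNED_KEY`.

```python
import re
from typing import List, Tuple

# Numeric node IDs in a GAF path such as ">7>8" or "<3<2".
NODE_ID = re.compile(rb"[<>](\d+)")
# Hypothetical choice: unaligned records (path "*") sort after everything else.
UNALIGNED_KEY = 2**64 - 1


def sort_key(line: bytes) -> int:
    """Hypothetical key: the smallest node ID on the alignment path (GAF column 6).

    Only the path field is inspected; the rest of the record stays an opaque blob.
    """
    path = line.split(b"\t", 6)[5]
    ids = [int(m) for m in NODE_ID.findall(path)]
    return min(ids) if ids else UNALIGNED_KEY


def sort_gaf_blobs(records: List[bytes]) -> List[bytes]:
    """Sort GAF lines by integer key without deserializing the alignments."""
    keyed: List[Tuple[int, bytes]] = [(sort_key(r), r) for r in records]
    keyed.sort(key=lambda pair: pair[0])  # compares integers only, never the blobs
    return [blob for _, blob in keyed]
```

Sorting then compares integers only, and writing the output reduces to emitting the original bytes unchanged.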