Troubleshooting common errors in assemblies of long-read metagenomes


Contact
florian.trigodet [ at ] hifmb.de

Abstract

Assessing the accuracy of long-read assemblies, especially from complex environmental metagenomes that include underrepresented organisms, is challenging. Here we benchmark four state-of-the-art long-read assembly software programs, HiCanu, hifiasm-meta, metaFlye and metaMDBG, on 21 PacBio HiFi metagenomes spanning mock communities, gut microbiomes and ocean samples. By quantifying read clipping events, in which long reads are systematically split during mapping to maximize the agreement with assembled contigs, we identify where assemblies diverge from their source reads. Our analyses reveal that long-read metagenome assemblies can include >40 errors per 100 million base pairs of assembled contigs, including multi-domain chimeras, prematurely circularized sequences, haplotyping errors, excessive repeats and phantom sequences. We provide an open-source tool and a reproducible workflow for rigorous evaluation of assembly errors, charting a path toward more reliable genome recovery from long-read metagenomes.
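The clipping-based idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' open-source tool. It assumes reads have already been mapped back to the assembled contigs and the alignments are available as SAM text lines. The function names and the `min_clip` threshold of 500 bases are hypothetical choices made for illustration only.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_lengths(cigar):
    """Return (left, right) clipped bases, soft (S) or hard (H), for one CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    left = right = 0
    for n, op in ops:  # clips at the start of the alignment
        if op not in "SH":
            break
        left += n
    for n, op in reversed(ops):  # clips at the end of the alignment
        if op not in "SH":
            break
        right += n
    return left, right


def count_clipping_events(sam_lines, min_clip=500):
    """Count clipped alignment ends per contig; min_clip is a hypothetical threshold."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        contig, cigar = fields[2], fields[5]  # RNAME and CIGAR columns
        if contig == "*" or cigar == "*":  # unmapped read
            continue
        left, right = clip_lengths(cigar)
        events = (left >= min_clip) + (right >= min_clip)
        if events:
            counts[contig] += events
    return counts


def per_100_mbp(n_events, assembled_bp):
    """Normalize an event count to events per 100 million assembled base pairs."""
    return n_events / assembled_bp * 100_000_000


# Toy alignments (made-up records, for illustration only).
example_sam = [
    "@HD\tVN:1.6",
    "read1\t0\tcontig_1\t1\t60\t600S4000M\t*\t0\t0\t*\t*",
    "read2\t0\tcontig_1\t1\t60\t5000M\t*\t0\t0\t*\t*",
    "read3\t2048\tcontig_2\t100\t60\t1000H3000M700S\t*\t0\t0\t*\t*",
]
clip_counts = count_clipping_events(example_sam)
```

Contigs where many reads are clipped at the same position would be candidates for closer inspection. A real analysis would also need to consider clip positions, coverage and supplementary alignments, which this sketch leaves out.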



Item Type: Article
Authors: Trigodet, F., Sachdeva, R., Banfield, J. F. and Eren, A. M.
Publication Status: Published
Eprint ID: 60560
DOI: 10.1038/s41587-025-02971-8

Cite as
Trigodet, F., Sachdeva, R., Banfield, J. F. and Eren, A. M. (2026): Troubleshooting common errors in assemblies of long-read metagenomes, Nature Biotechnology, p. 10. doi: 10.1038/s41587-025-02971-8


Download
Trigodet_2026_nature_biotechnology.pdf (PDF, 6MB)

Research Platforms
N/A

