Journal
NATURE METHODS
Volume 19, Issue 4, Pages 441-+
Publisher
NATURE PORTFOLIO
DOI: 10.1038/s41592-022-01432-3
Funding
- National Science Centre, Poland [DEC-2019/33/B/ST6/02040]
- US National Institutes of Health [R01HG010040, U01HG010971, U41HG010972]
The cost of maintaining a large amount of data generated by third-generation sequencing has become a significant concern in genomic research. Existing algorithms for compressing long reads have only a slight advantage over general-purpose gzip. In this study, we introduce CoLoRd, an algorithm that can significantly reduce the size of third-generation sequencing data without compromising the accuracy of downstream analyses.
The cost of maintaining the exabytes of data produced by sequencing experiments every year has become a major issue in today's genomic research. Despite the increasing popularity of third-generation sequencing, existing algorithms for compressing long reads offer only a minor advantage over general-purpose gzip. We present CoLoRd, an algorithm able to reduce the size of third-generation sequencing data by an order of magnitude without affecting the accuracy of downstream analyses.