Article

CoLoRd: compressing long reads

Journal

NATURE METHODS
Volume 19, Issue 4, Pages 441-+

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41592-022-01432-3

Funding

  1. National Science Centre, Poland [DEC-2019/33/B/ST6/02040]
  2. US National Institutes of Health [R01HG010040, U01HG010971, U41HG010972]

The cost of maintaining the exabytes of data produced by sequencing experiments every year has become a major issue in today's genomic research. Despite the increasing popularity of third-generation sequencing, existing algorithms for compressing long reads offer only a minor advantage over general-purpose gzip. We present CoLoRd, an algorithm that reduces the size of third-generation sequencing data by an order of magnitude without affecting the accuracy of downstream analyses.
