Article

mspack: efficient lossless and lossy mass spectrometry data compression

Journal

BIOINFORMATICS
Volume 37, Issue 21, Pages 3923-3925

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btab636

Funding

  1. Gipuzkoa Fellows award
  2. Ramon y Cajal grant

The study introduces mspack, a compression algorithm for mass spectrometry data that exploits the redundancy across scans in a single file to achieve higher compression ratios. It supports both lossless and lossy compression for the mzML and mzXML formats. In experiments with the bsc backend, mspack reduced file sizes by 76% on average for lossless compression and by 94% for lossy compression, relative to the original files. It also compresses better than the existing MassComp and MSNumpress algorithms while exhibiting comparable accuracy and running time.

Motivation: The volume of mass spectrometry (MS) data, used for proteomics and metabolomics analyses, has grown considerably in recent years. To reduce the associated storage costs, dedicated compression algorithms for MS data have been proposed, such as MassComp and MSNumpress. However, these algorithms focus on either lossless (MassComp) or lossy (MSNumpress) compression, and do not exploit the additional redundancy that exists across the scans contained in a single file. We introduce mspack, a compression algorithm for MS data that exploits this additional redundancy and supports both lossless and lossy compression, as well as both the mzML and the legacy mzXML formats. mspack applies several lossless preprocessing transforms and optional lossy transforms with a configurable error, followed by one of the general-purpose compressors gzip or bsc, to achieve a higher compression ratio.

Results: We tested mspack on several datasets generated by commonly used MS instruments. When used with the bsc compression backend, mspack achieves on average 76% smaller file sizes for lossless compression and 94% smaller file sizes for lossy compression, compared with the original files. Lossless mspack produces files 10-60% smaller than MassComp, and lossy mspack compresses 36-60% better than the lossy MSNumpress for the same error, while exhibiting comparable accuracy and running time.

Supplementary information: Supplementary data are available at Bioinformatics online.
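
The pipeline described under Motivation (an optional lossy transform with a configurable error, a lossless preprocessing transform, then a general-purpose compressor) can be illustrated with a minimal sketch. This is not mspack's implementation: the abstract does not specify its transforms, so uniform quantization and delta encoding are stand-ins chosen for illustration, gzip from the Python standard library stands in for the backend, and all function names and the synthetic m/z values are hypothetical.

```python
import gzip
import random
import struct


def quantize(values, max_abs_error):
    """Lossy step (assumed, not mspack's): snap values to an integer grid.
    Reconstruction error is at most max_abs_error, up to float rounding."""
    step = 2.0 * max_abs_error
    return [round(v / step) for v in values]


def dequantize(indices, max_abs_error):
    """Invert quantize() back to approximate floating-point values."""
    step = 2.0 * max_abs_error
    return [i * step for i in indices]


def delta_encode(ints):
    """Lossless step (assumed): store differences between neighbours."""
    out, prev = [], 0
    for x in ints:
        out.append(x - prev)
        prev = x
    return out


def delta_decode(deltas):
    """Invert delta_encode() with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def compress_scan(mz_values, max_abs_error):
    """Quantize, delta-encode, pack as 64-bit integers, then gzip."""
    deltas = delta_encode(quantize(mz_values, max_abs_error))
    return gzip.compress(struct.pack(f"<{len(deltas)}q", *deltas))


def decompress_scan(blob, max_abs_error):
    """Reverse compress_scan(); values differ by at most the error bound."""
    payload = gzip.decompress(blob)
    deltas = struct.unpack(f"<{len(payload) // 8}q", payload)
    return dequantize(delta_decode(deltas), max_abs_error)


def demo(n=5000, max_abs_error=0.0005):
    """Return (raw bytes, gzip-only bytes, transform+gzip bytes) for a
    synthetic, sorted m/z array. The data are made up for illustration."""
    rng = random.Random(0)
    mz, current = [], 100.0
    for _ in range(n):
        current += rng.uniform(0.01, 0.05)
        mz.append(current)
    raw = struct.pack(f"<{n}d", *mz)
    return len(raw), len(gzip.compress(raw)), len(compress_scan(mz, max_abs_error))


if __name__ == "__main__":
    raw_size, gzip_only, transformed = demo()
    print("raw:", raw_size, "gzip only:", gzip_only, "transform + gzip:", transformed)
```

Because sorted m/z values differ only slightly from their neighbours, the deltas are small integers that a general-purpose compressor handles far better than raw doubles. The sizes from this toy example say nothing about mspack's reported figures, which come from real instrument data and, for the headline numbers, the bsc backend.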
