4.7 Article

mspack: efficient lossless and lossy mass spectrometry data compression

Journal

BIOINFORMATICS
Volume 37, Issue 21, Pages 3923-3925

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btab636

Funding

  1. Gipuzkoa Fellows award
  2. Ramon y Cajal grant

The study introduces mspack, a compression algorithm for mass spectrometry data that exploits the redundancy across scans within a single file to achieve higher compression ratios. It supports both lossless and lossy compression of the mzML and mzXML formats. In experiments with the bsc backend, mspack reduced file sizes by 76% on average for lossless compression and by 94% for lossy compression, relative to the original files. It also compresses better than MassComp (lossless) and MSNumpress (lossy) while exhibiting comparable accuracy and running time.
Motivation: Mass spectrometry (MS) data, used for proteomics and metabolomics analyses, have grown considerably in recent years. To reduce the associated storage costs, dedicated compression algorithms for MS data have been proposed, such as MassComp and MSNumpress. However, these algorithms focus on either lossless or lossy compression, respectively, and do not exploit the additional redundancy that exists across the scans contained in a single file. We introduce mspack, a compression algorithm for MS data that exploits this additional redundancy and that supports both lossless and lossy compression, as well as the mzML and the legacy mzXML formats. mspack applies several lossless preprocessing transforms and optional lossy transforms with a configurable error, followed by the general-purpose compressors gzip or bsc, to achieve a higher compression ratio.

Results: We tested mspack on several datasets generated by commonly used MS instruments. When used with the bsc compression backend, mspack achieves on average 76% smaller file sizes for lossless compression and 94% smaller file sizes for lossy compression, as compared with the original files. Lossless mspack achieves 10-60% smaller file sizes than MassComp, and lossy mspack compresses 36-60% better than the lossy MSNumpress for the same error, while exhibiting comparable accuracy and running time.

Supplementary information: Supplementary data are available at Bioinformatics online.
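
The pipeline described in the abstract (an optional lossy transform with a configurable error, a lossless preprocessing transform, then a general-purpose compressor) can be illustrated with a minimal Python sketch. This is not mspack's implementation: the abstract does not specify the transforms, so uniform quantization and delta encoding are stand-ins chosen for illustration, all function names are hypothetical, and gzip from the Python standard library plays the role of the backend.

```python
import gzip
import struct


def quantize(values, max_error):
    """Lossy step (assumed): snap values to a grid of width 2 * max_error,
    so each restored value differs from the original by at most max_error."""
    step = 2.0 * max_error
    return [round(v / step) for v in values]


def delta_encode(ints):
    """Lossless step (assumed): store differences between consecutive values.
    Sorted arrays such as m/z lists yield small, repetitive deltas."""
    prev = 0
    deltas = []
    for x in ints:
        deltas.append(x - prev)
        prev = x
    return deltas


def delta_decode(deltas):
    """Inverse of delta_encode: a running sum restores the integers."""
    total = 0
    ints = []
    for d in deltas:
        total += d
        ints.append(total)
    return ints


def compress(values, max_error):
    """Quantize, delta-encode, pack as little-endian int64, then gzip."""
    deltas = delta_encode(quantize(values, max_error))
    raw = struct.pack(f"<{len(deltas)}q", *deltas)
    return gzip.compress(raw)


def decompress(blob, max_error):
    """Reverse the pipeline; the result is within max_error of the input."""
    raw = gzip.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
    step = 2.0 * max_error
    return [k * step for k in delta_decode(deltas)]


if __name__ == "__main__":
    # Synthetic, evenly spaced "m/z" values; real spectra are less regular.
    mz = [100.0 + 0.01 * i for i in range(1000)]
    blob = compress(mz, max_error=1e-3)
    restored = decompress(blob, max_error=1e-3)
    worst = max(abs(a - b) for a, b in zip(mz, restored))
    print(f"raw bytes: {8 * len(mz)}, compressed bytes: {len(blob)}")
    print(f"largest absolute error: {worst:.2e}")
```

The sketch only shows why transforming the data before handing it to a general-purpose compressor can help: the deltas of a sorted, quantized array are far more repetitive than the raw floating-point values. It does not model the cross-scan redundancy that the abstract credits for mspack's gains.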
