Article

Numerical Compression Schemes for Proteomics Mass Spectrometry Data

Journal

MOLECULAR & CELLULAR PROTEOMICS
Volume 13, Issue 6, Pages 1537-1542

Publisher

AMER SOC BIOCHEMISTRY MOLECULAR BIOLOGY INC
DOI: 10.1074/mcp.O114.037879

Funding

  1. Swedish Research Council [2008:3356, 621-2012-3559]
  2. Swedish Foundation for Strategic Research [FFL4, RBb08-0006]
  3. Crafoord Foundation [20100892]
  4. Wallenberg Academy Fellow KAW [2012.0178]
  5. European Research Council [ERC-2012-StG-309831]
  6. UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/K016733/1]
  7. BBSRC [BB/I00095X/1, BB/K01997X/1]
  8. National Institute of General Medical Sciences [R01 GM087221]
  9. National Science Foundation MRI [0923536]
  10. National Human Genome Research Institute [RC2 HG005805]
  11. EU [260558]
  12. Swedish Research Council (BILS) [829-2009-6257]
  13. Mistra Biotech program
  14. Biotechnology and Biological Sciences Research Council [BB/K016733/1, BB/K01997X/1, BB/I00095X/1] Funding Source: researchfish
  15. BBSRC [BB/I00095X/1, BB/K01997X/1, BB/K016733/1] Funding Source: UKRI

The open XML format mzML, used to represent MS data, is pivotal for the development of platform-independent MS analysis software. Although conversion from vendor formats to mzML must take place on a platform on which the vendor libraries are available (i.e. Windows), once mzML files have been generated, they can be used on any platform. However, the mzML format has turned out to be less efficient than vendor formats. In many cases, the naive mzML representation is fourfold, or even up to 18-fold, larger than the original vendor file. In disk I/O-limited setups, a larger data file also leads to longer processing times, which is a problem given the data production rates of modern mass spectrometers. To reduce this problem, we present a family of numerical compression algorithms called MS-Numpress, intended for efficient compression of MS data. To ease adoption, the algorithms target the binary data in the mzML standard, and support is already available in the main proteomics tools. Using a test set of 10 representative MS data files, we demonstrate typical file size decreases of 90% when MS-Numpress is combined with traditional compression, as well as read time decreases of up to 50%. We expect these improvements to benefit data handling within the MS community.
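
A minimal Python sketch can illustrate the general idea of pairing a numerical encoding with traditional compression. It is not the MS-Numpress algorithm: as a stand-in it uses plain fixed-point rounding followed by delta encoding, with zlib as the traditional compressor. The function names, the `scale` value, and the synthetic m/z array are all assumptions made for illustration, and the encoding is lossy to within half of one fixed-point step.

```python
import struct
import zlib


def encode_fixed_point_delta(values, scale=1e5):
    """Illustrative lossy encoder: fixed point, then deltas, then zlib."""
    if not values:
        return zlib.compress(b"")
    # Round each value to an integer number of 1/scale steps.
    ints = [round(v * scale) for v in values]
    # Store the first integer, then only the differences between neighbours.
    deltas = [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]
    raw = struct.pack(f"<{len(deltas)}q", *deltas)
    return zlib.compress(raw)


def decode_fixed_point_delta(blob, scale=1e5):
    """Invert encode_fixed_point_delta (up to the rounding error)."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
    out, acc = [], 0
    for d in deltas:
        acc += d  # running sum restores the fixed-point integers
        out.append(acc / scale)
    return out


def zlib_only(values):
    """Baseline: 64-bit floats compressed with zlib alone."""
    return zlib.compress(struct.pack(f"<{len(values)}d", *values))


# Synthetic, slowly increasing m/z-like array (hypothetical data).
mz = [400.0 + 0.01 * i + 1e-4 * (i % 7) for i in range(5000)]

baseline = zlib_only(mz)
packed = encode_fixed_point_delta(mz)
restored = decode_fixed_point_delta(packed)
max_error = max(abs(a - b) for a, b in zip(mz, restored))

print(len(mz) * 8, len(baseline), len(packed), max_error)
```

On smooth, sorted arrays such as this one, the deltas are small and repetitive, which gives the general-purpose compressor far more redundancy to exploit than the raw floating-point bytes do. Real spectra are less regular than this synthetic array, so the size ratio printed here says nothing about the figures reported in the abstract.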
