☆ 4.7 Article

Performing Selection on a Monotonic Function in Lieu of Sorting Using Layer-Ordered Heaps

JOURNAL OF PROTEOME RESEARCH (2021)

期刊

JOURNAL OF PROTEOME RESEARCH

卷 20, 期 4, 页码 1849-1854

出版社

AMER CHEMICAL SOC

DOI: 10.1021/acs.jproteome.0c00711

关键词

sorting; false discovery rate; Percolator; peptide search; layer-ordered heap; nonparametric statistical test; partition; algorithms; performance; tandem mass spectrometry

类别

Biochemical Research Methods

资金

National Science Foundation [1845465]
Chan Zuckerberg Initiative [EOSS2-0000000115]
Div Of Biological Infrastructure
Direct For Biological Sciences [1845465] Funding Source: National Science Foundation

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

Nonparametric statistical tests are essential in scientific experiments, but sorting becomes a bottleneck in computation. Selection and partitioning are usually performed on transformed statistical significance values due to the unknown threshold, necessitating data sorting. Layer-ordered heaps provide a solution by partially sorting values efficiently.

Nonparametric statistical tests are an integral part of scientific experiments in a diverse range of fields. When performing such tests, it is standard to sort values; however, this requires Omega(n log(n)) time to sort n values. Thus given enough data, sorting becomes the computational bottleneck, even with very optimized implementations such as the C++ standard library routine, std::sort. Frequently, a nonparametric statistical test is only used to partition values above and below a threshold in the sorted ordering, where the threshold corresponds to a significant statistical result. Linear-time selection and partitioning algorithms cannot be directly used because the selection and partitioning are performed on the transformed statistical significance values rather than on the sorted statistics. Usually, those transformed statistical significance values (e.g., the p value when investigating the family-wise error rate and q values when investigating the false discovery rate (FDR)) can only be computed at a threshold. Because this threshold is unknown, this leads to sorting the data. Layer-ordered heaps, which can be constructed in O(n), only partially sort values and thus can be used to get around the slow runtime required to fully sort. Here we introduce a layer-ordering-based method for selection and partitioning on the transformed values (e.g., p values or q values). We demonstrate the use of this method to partition peptides using an FDR threshold. This approach is applied to speed up Percolator, a postprocessing algorithm used in mass-spectrometry-based proteomics to evaluate the quality of peptide-spectrum matches (PSMs), by >70% on data sets with 100 million PSMs.

Performing Selection on a Monotonic Function in Lieu of Sorting Using Layer-Ordered Heaps

期刊

JOURNAL OF PROTEOME RESEARCH

出版社

AMER CHEMICAL SOC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Performing Selection on a Monotonic Function in Lieu of Sorting Using Layer-Ordered Heaps

期刊

JOURNAL OF PROTEOME RESEARCH

出版社

AMER CHEMICAL SOC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文