4.8 Article

paraGSEA: a scalable approach for large-scale gene expression profiling

Journal

NUCLEIC ACIDS RESEARCH
Volume 45, Issue 17, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nar/gkx679

Keywords

-

Funding

  1. National Natural Science Foundation of China [U1435222]
  2. National Natural Science Foundation of China [61540052, 61625202, 61272056]
  3. National Key R&D Program of China [2017YFB0202600, 2016YFC1302500]
  4. National Natural Science Foundation of China

A growing number of studies use gene expression similarity to identify functional connections among genes, diseases and drugs. Gene Set Enrichment Analysis (GSEA) is a powerful analytical method for interpreting gene expression data. However, because its significance-level estimation and multiple-hypothesis-testing steps carry enormous computational overhead, its scalability and efficiency on large-scale datasets are poor. We propose paraGSEA for efficient large-scale transcriptome data analysis. Through optimization, the overall time complexity of paraGSEA is reduced from O(mn) to O(m+n), where m is the length of the gene sets and n is the length of the gene expression profiles, which yields a more than 100-fold increase in performance compared with other popular GSEA implementations such as GSEA-P, SAM-GS and GSEA2. With further parallelization, paraGSEA achieves a near-linear speed-up on both workstations and clusters, with high scalability and performance on large-scale datasets. The analysis time for the whole LINCS phase I dataset (GSE92742) was reduced to nearly half an hour on a 1000-node cluster on Tianhe-2, or to within 120 hours on a 96-core workstation. The source code of paraGSEA is licensed under the GPLv3 and available at http://github.com/ysycloud/paraGSEA.
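
The abstract does not say how the reduction from O(mn) to O(m+n) is achieved, so the following is only a minimal sketch of one generic way an enrichment-score pass can reach that bound. It is not the paraGSEA implementation. It assumes the unweighted (Kolmogorov-Smirnov style) running-sum statistic and a profile already sorted by expression change; the function name `enrichment_score` and its parameters are hypothetical. Building a hash set from the gene set costs O(m), and a single walk over the ranked profile costs O(n), whereas testing each profile gene against an unsorted list would cost O(mn).

```python
from typing import Iterable, Sequence


def enrichment_score(ranked_genes: Sequence[str], gene_set: Iterable[str]) -> float:
    """Unweighted running-sum enrichment score (illustrative sketch only).

    ranked_genes: gene identifiers ordered from most up-regulated to most
        down-regulated (assumed to be sorted by the caller).
    gene_set: identifiers of the genes whose enrichment is tested.
    """
    members = set(gene_set)  # O(m): constant-time membership checks afterwards
    n = len(ranked_genes)
    hits_total = sum(1 for gene in ranked_genes if gene in members)  # O(n)
    if hits_total == 0 or hits_total == n:
        return 0.0  # no contrast between hits and misses, score is undefined

    hit_step = 1.0 / hits_total          # increment when a set member is met
    miss_step = 1.0 / (n - hits_total)   # decrement for every other gene

    running = 0.0
    best = 0.0  # running-sum value with the largest absolute deviation from 0
    for gene in ranked_genes:  # O(n): one pass over the ranked profile
        running += hit_step if gene in members else -miss_step
        if abs(running) > abs(best):
            best = running
    return best
```

A gene set concentrated at the top of the ranking gives a score near +1, and one concentrated at the bottom gives a score near -1. Significance estimation would repeat this pass over many permutations, which is why the cost of a single pass matters on large datasets.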
