Article

Efficient Test and Visualization of Multi-Set Intersections

Journal

SCIENTIFIC REPORTS
Volume 5, Issue -, Pages -

Publisher

NATURE PUBLISHING GROUP
DOI: 10.1038/srep16923

Funding

  1. NIH/National Institute on Aging (NIA) [R01AG046170]
  2. NIH/National Cancer Institute (NCI) [R01CA163772]
  3. NIH/National Institute of Allergy and Infectious Diseases (NIAID) [U01AI111598-01]

Identifying sets of objects with shared features is a common operation across disciplines, and analyzing the intersections among multiple sets is fundamental to an in-depth understanding of their complex relationships. However, so far no method has been developed to assess the statistical significance of intersections among three or more sets. Moreover, state-of-the-art approaches for visualizing multi-set intersections do not scale. Here, we first developed a theoretical framework, based on combinatorial theory, for computing the statistical distributions of multi-set intersections, and then designed a procedure to efficiently calculate the exact probabilities of these intersections. We further developed multiple efficient and scalable techniques to visualize multi-set intersections and the corresponding intersection statistics. We implemented both the theoretical framework and the visualization techniques in a unified R software package, SuperExactTest. We demonstrated the utility of SuperExactTest through an extensive simulation study and a comprehensive analysis of seven independently curated cancer gene sets, as well as six disease- or trait-associated gene sets identified by genome-wide association studies. We expect SuperExactTest to have a broad range of applications in scientific data analysis across many disciplines.
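The abstract does not spell out how the exact probabilities are computed. As a minimal sketch, assuming each set is an independent, uniformly random subset of fixed size drawn from a common universe of N elements, the exact distribution of the intersection size can be built one set at a time. By symmetry, the running intersection is uniformly placed given its size j, so its overlap with the next random m-subset is hypergeometric. This sequential hypergeometric approach is one straightforward way to obtain exact values. It is not claimed to be the algorithm used in SuperExactTest, and the function names `intersection_distribution` and `upper_tail` are hypothetical.

```python
from fractions import Fraction
from math import comb


def intersection_distribution(N, sizes):
    """Exact distribution of |S_1 ∩ ... ∩ S_n|, assuming each S_i is an
    independent, uniformly random subset of size sizes[i] drawn from a
    universe of N elements. Returns p with p[k] = P(|intersection| = k)."""
    if not sizes:
        raise ValueError("need at least one set size")
    if any(m < 0 or m > N for m in sizes):
        raise ValueError("every set size must lie in [0, N]")
    # The intersection of the first set alone is the set itself.
    dist = {sizes[0]: Fraction(1)}
    for m in sizes[1:]:
        total = comb(N, m)
        new = {}
        for j, pj in dist.items():
            # Given a running intersection of size j, its overlap with a
            # fresh random m-subset follows a hypergeometric law.
            for k in range(max(0, m - (N - j)), min(j, m) + 1):
                pk = Fraction(comb(j, k) * comb(N - j, m - k), total)
                new[k] = new.get(k, Fraction(0)) + pj * pk
        dist = new
    upper = min(sizes)
    return [dist.get(k, Fraction(0)) for k in range(upper + 1)]


def upper_tail(N, sizes, x):
    """P(|S_1 ∩ ... ∩ S_n| >= x) under the same independence assumption,
    i.e. a one-sided p-value for an observed intersection of size x."""
    p = intersection_distribution(N, sizes)
    return sum(p[x:], Fraction(0))
```

For example, `float(upper_tail(1000, [100, 120, 150], 5))` gives the probability of seeing at least five shared elements among three random sets of those sizes. Using `Fraction` keeps every probability exact. For very large universes, a log-space floating-point version would trade exactness for speed.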
