Article

FEATURE SELECTION FOR DATA INTEGRATION WITH MIXED MULTIVIEW DATA

Journal

ANNALS OF APPLIED STATISTICS
Volume 14, Issue 4, Pages 1676-1698

Publisher

INST MATHEMATICAL STATISTICS-IMS
DOI: 10.1214/20-AOAS1389

Keywords

Data fusion; multimodal data; integrative genomics; variable selection; Lasso/GLM Lasso; stability selection; mixed graphical models

Funding

  1. NIH/NCI [T32 CA096520, 5T32CA09652011]
  2. NSF Graduate Research Fellowship Program [DGE1752814]
  3. ONR [N00014-16-1-2664]
  4. NSF [DMS-1264058, NeuroNex-1707400, DMS1554821]


Data integration methods that analyze multiple sources of data simultaneously can often provide more holistic insights than separate analyses of each data source. Motivated by the advantages of data integration in the era of big data, we investigate feature selection for high-dimensional multiview data with mixed data types (e.g., continuous, binary, count-valued). This heterogeneity of multiview data poses numerous challenges for existing feature selection methods. After critically examining these issues through empirical and theoretically guided lenses, we develop a practical solution, the Block Randomized Adaptive Iterative Lasso (B-RAIL), which combines the strengths of the randomized Lasso, adaptive weighting schemes and stability selection. B-RAIL serves as a versatile data integration method for sparse regression and graph selection, and we demonstrate its effectiveness through extensive simulations and a case study that infers the ovarian cancer gene regulatory network. In this case study, B-RAIL successfully identifies well-known biomarkers associated with ovarian cancer and hints at novel candidates for future ovarian cancer research.
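
The abstract names the ingredients of B-RAIL but not its update rules. As a rough illustration of two of those ingredients, the randomized Lasso and stability selection (in the form introduced by Meinshausen and Bühlmann), here is a minimal NumPy sketch for a single continuous response. It is not B-RAIL: it omits the adaptive weighting schemes, the iteration, the handling of separate data views and the GLM losses needed for binary or count data. The function names and the values of `lam`, `alpha`, `n_subsamples` and `threshold` are illustrative assumptions, not settings from the paper.

```python
import numpy as np


def weighted_lasso_cd(X, y, penalties, n_iter=500, tol=1e-6):
    """Lasso with per-feature penalties via cyclic coordinate descent.

    Minimizes (1/(2n)) * ||y - X @ b||^2 + sum_j penalties[j] * |b_j|.
    Assumes standardized columns of X and a centered y (no intercept).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the partial residual (feature j added back).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - penalties[j], 0.0) / col_sq[j]
            if new != beta[j]:
                resid -= X[:, j] * (new - beta[j])
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return beta


def randomized_lasso_stability(X, y, lam=0.1, alpha=0.5, n_subsamples=100,
                               threshold=0.6, seed=0):
    """Selection frequencies from randomized Lasso fits on random half-samples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        # Draw a random half of the observations without replacement.
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs = X[idx]
        Xs = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0)
        ys = y[idx] - y[idx].mean()
        # Randomized Lasso: each penalty is lam / W_j with W_j ~ Uniform(alpha, 1).
        weights = rng.uniform(alpha, 1.0, size=p)
        beta = weighted_lasso_cd(Xs, ys, lam / weights)
        counts += beta != 0
    freq = counts / n_subsamples
    # Keep features selected in at least a `threshold` fraction of subsamples.
    return freq, np.flatnonzero(freq >= threshold)


# Toy data (synthetic, for illustration only): 3 true signals among 20 features.
rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

freq, selected = randomized_lasso_stability(X, y)
print("selection frequencies:", np.round(freq, 2))
print("stable features:", selected)
```

The random penalty weights perturb which correlated features enter each fit, and subsampling perturbs the data; a feature counts as stable only if it survives both kinds of perturbation in a large share of the fits.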

