☆ 4.7 Article

Enriched Random Forest for High Dimensional Genomic Data

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2022)

期刊

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

卷 19, 期 5, 页码 2817-2828

出版社

IEEE COMPUTER SOC

DOI: 10.1109/TCBB.2021.3089417

关键词

Feature extraction; Genomics; Bioinformatics; Random forests; Proteins; Ontologies; Support vector machines; Ensemble methods; weighted random sampling; enriched random forest; high-dimensional data; genomic analyses

类别

Biochemical Research Methods Computer Science, Interdisciplinary Applications Mathematics, Interdisciplinary Applications Statistics & Probability

资金

National Heart, Lung, and Blood Institute (NHLBI), National Institutes of Health [R01-HL150065]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

Enriched Random Forest is developed to enhance the performance of traditional random forest by reducing the contribution of less informative features. It improves the prediction accuracy, especially when relevant features are few.

Ensemble methods such as random forest works well on high-dimensional datasets. However, when the number of features is extremely large compared to the number of samples and the percentage of truly informative feature is very small, performance of traditional random forest decline significantly. To this end, we develop a novel approach that enhance the performance of traditional random forest by reducing the contribution of trees whose nodes are populated with less informative features. The proposed method selects eligible subsets at each node by weighted random sampling as opposed to simple random sampling in traditional random forest. We refer to this modified random forest algorithm as Enriched Random Forest. Using several high-dimensional micro-array datasets, we evaluate the performance of our approach in both regression and classification settings. In addition, we also demonstrate the effectiveness of balanced leave-one-out cross-validation to reduce computational load and decrease sample size while computing feature weights. Overall, the results indicate that enriched random forest improves the prediction accuracy of traditional random forest, especially when relevant features are very few.

Enriched Random Forest for High Dimensional Genomic Data

期刊

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

出版社

IEEE COMPUTER SOC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Enriched Random Forest for High Dimensional Genomic Data

期刊

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

出版社

IEEE COMPUTER SOC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文