4.7 Article

Enriched Random Forest for High Dimensional Genomic Data

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2021.3089417

Keywords

Feature extraction; Genomics; Bioinformatics; Random forests; Proteins; Ontologies; Support vector machines; Ensemble methods; weighted random sampling; enriched random forest; high-dimensional data; genomic analyses

Funding

  1. National Heart, Lung, and Blood Institute (NHLBI), National Institutes of Health [R01-HL150065]

Enriched Random Forest is developed to enhance the performance of traditional random forest by reducing the contribution of less informative features. It improves the prediction accuracy, especially when relevant features are few.
Ensemble methods such as random forest work well on high-dimensional datasets. However, when the number of features is extremely large compared to the number of samples and the percentage of truly informative features is very small, the performance of traditional random forest declines significantly. To this end, we develop a novel approach that enhances the performance of traditional random forest by reducing the contribution of trees whose nodes are populated with less informative features. The proposed method selects eligible subsets at each node by weighted random sampling, as opposed to the simple random sampling used in traditional random forest. We refer to this modified random forest algorithm as Enriched Random Forest. Using several high-dimensional microarray datasets, we evaluate the performance of our approach in both regression and classification settings. In addition, we demonstrate the effectiveness of balanced leave-one-out cross-validation in reducing computational load and decreasing sample size while computing feature weights. Overall, the results indicate that enriched random forest improves the prediction accuracy of traditional random forest, especially when relevant features are very few.
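
The contrast between simple and weighted random sampling of candidate features at a node can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_candidate_features`, the parameter `mtry`, and the weight values are assumptions made for illustration, and the abstract does not specify how the feature weights are computed. The sketch only shows that sampling in proportion to weights places informative features among the node candidates more often than uniform sampling does.

```python
import numpy as np


def sample_candidate_features(weights, mtry, rng):
    """Draw up to `mtry` distinct feature indices, with probability
    proportional to `weights` (hypothetical per-feature scores).

    Uniform weights reproduce the simple random sampling used by a
    traditional random forest at each node.
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    probs = weights / weights.sum()
    # Sampling without replacement cannot return more features than
    # have a non-zero selection probability.
    size = min(mtry, int(np.count_nonzero(probs)))
    return rng.choice(len(probs), size=size, replace=False, p=probs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_features, n_informative, mtry = 1000, 10, 31

    # Assumed setup: the first 10 features are informative and carry a
    # larger (made-up) weight; all other features share weight 1.
    uniform = np.ones(n_features)
    enriched = np.ones(n_features)
    enriched[:n_informative] = 20.0

    def mean_informative(weights, n_nodes=500):
        # Average count of informative features among the candidates
        # drawn at each of `n_nodes` simulated nodes.
        hits = [
            np.sum(sample_candidate_features(weights, mtry, rng) < n_informative)
            for _ in range(n_nodes)
        ]
        return float(np.mean(hits))

    print("simple random sampling:  ", mean_informative(uniform))
    print("weighted random sampling:", mean_informative(enriched))
```

With uniform weights, each node sees on average `mtry * n_informative / n_features` informative candidates, which is well below one in this setup; raising the weights of informative features increases that count, which is the effect the abstract attributes to weighted sampling.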
