Article; Proceedings Paper

CLASSIFICATION OF LARGE MICROARRAY DATASETS USING FAST RANDOM FOREST CONSTRUCTION

JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (2011)

Journal

JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY

Volume 9, Issue 2, Pages 251-267

Publisher

WORLD SCIENTIFIC PUBL CO PTE LTD

DOI: 10.1142/S021972001100546X

Keywords

Algorithm; data mining; genomic; classifier; random forest; ensemble algorithm; optimize; bootstrap samples; machine learning; microarray; analysis; gene expression; file-based implementation; multi-dimensional data; high-dimensional data

Categories

Biochemical Research Methods; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology

Abstract

Random forest is an ensemble classification algorithm. It performs well when most predictive variables are noisy and can be used when the number of variables is much larger than the number of observations. The use of bootstrap samples and restricted subsets of attributes makes it more powerful than simple ensembles of trees. The main advantage of a random forest classifier is its explanatory power: it measures variable importance or impact of each factor on a predicted class label. These characteristics make the algorithm ideal for microarray data. It was shown to build models with high accuracy when tested on high-dimensional microarray datasets. Current implementations of random forest in the machine learning and statistics community, however, limit its usability for mining over large datasets, as they require that the entire dataset remains permanently in memory. We propose a new framework, an optimized implementation of a random forest classifier, which addresses specific properties of microarray data, takes computational complexity of a decision tree algorithm into consideration, and shows excellent computing performance while preserving predictive accuracy. The implementation is based on reducing overlapping computations and eliminating dependency on the size of main memory. The implementation's excellent computational performance makes the algorithm useful for interactive data analyses and data mining.
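
The two ingredients the abstract credits for the method's strength, bootstrap samples and restricted subsets of attributes, can be illustrated with a minimal in-memory sketch. This is not the paper's optimized, file-based implementation: it is a plain textbook-style random forest written for illustration. All function names and parameters (`build_forest`, `mtry`, `max_depth`, and so on) are hypothetical, the Gini impurity split criterion is an assumption, and class labels are assumed to be simple values such as integers or strings.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(y)
    n = len(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, mtry, rng, depth=0, max_depth=5):
    """Grow a tree; each node considers only a random subset of mtry attributes."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority
    features = rng.sample(range(len(X[0])), mtry)  # restricted attribute subset
    split = best_split(X, y, features)
    if split is None:
        return majority
    f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       mtry, rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       mtry, rng, depth + 1, max_depth))

def predict_tree(node, row):
    """Internal nodes are (feature, threshold, left, right) tuples; leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def build_forest(X, y, n_trees=25, mtry=None, seed=0):
    """Train each tree on a bootstrap sample (n rows drawn with replacement)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))  # assumed default: about sqrt(p)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return forest

def predict_forest(forest, row):
    """Classify a row by majority vote over all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]
```

Here `X` is a list of observations (each a list of numeric attribute values, for example expression levels) and `y` the matching class labels; `build_forest(X, y)` returns the trees and `predict_forest(forest, row)` classifies a new observation. Note that this sketch keeps the whole dataset in memory, which is exactly the limitation of existing implementations that the abstract says the proposed framework removes.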
