Article

Using Rule-Based Machine Learning for Candidate Disease Gene Prioritization and Sample Classification of Cancer Gene Expression Data

PLOS ONE (2012)

Journal

PLOS ONE

Volume 7, Issue 7

Publisher

PUBLIC LIBRARY SCIENCE

DOI: 10.1371/journal.pone.0039932

Category

Multidisciplinary Sciences

Funding

Marie-Curie Early-Stage-Training programme [MEST-CT-2004-007597]
United Kingdom Engineering and Physical Sciences Research Council [EP/E017215/1, EP/H016597/1, EP/J004111/1]
Biotechnology and Biological Sciences Research Council [BB/F01855X/1]

Abstract

Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL's classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.
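The abstract names pointwise mutual information (PMI) between disease terms and gene names as the measure used in the literature mining analysis, but does not spell out how it is computed. As a minimal sketch, assuming document-level co-occurrence counts from some literature corpus, PMI could be computed as below. The function name, its parameters, and the example counts are hypothetical and are not taken from the paper.

```python
import math


def pointwise_mutual_information(n_both, n_gene, n_term, n_total):
    """Return the PMI (in bits) between a gene name and a disease term.

    Hypothetical inputs, assumed to be document counts in a corpus:
      n_both  - documents mentioning both the gene and the disease term
      n_gene  - documents mentioning the gene
      n_term  - documents mentioning the disease term
      n_total - all documents in the corpus
    """
    if n_total <= 0 or n_both > min(n_gene, n_term) or max(n_gene, n_term) > n_total:
        raise ValueError("inconsistent counts")
    if n_both == 0:
        # The pair never co-occurs, so the PMI tends to minus infinity.
        return float("-inf")
    # PMI = log2( p(gene, term) / (p(gene) * p(term)) ),
    # rewritten in terms of raw counts to limit rounding error.
    return math.log2((n_both * n_total) / (n_gene * n_term))


# Illustrative counts only: a gene and a term that co-occur four times
# more often than expected under independence.
print(pointwise_mutual_information(40, 100, 100, 1000))
```

A positive value means the gene name and the disease term appear together more often than independence would predict, a value of zero means they co-occur at the rate expected by chance, and a negative value means they co-occur less often.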
