
A comparative study on feature selection methods for drug discovery


Feature selection is frequently used as a preprocessing step in machine learning: removing irrelevant and redundant information often improves the performance of learning algorithms. This paper is a comparative study of feature selection in drug discovery, with a focus on aggressive dimensionality reduction. Five methods were evaluated: information gain, mutual information, the chi-squared test, odds ratio, and the GSS coefficient. Two well-known classification algorithms, Naive Bayes and the Support Vector Machine (SVM), were used to classify the chemical compounds. The results showed that Naive Bayes benefited significantly from feature selection, while SVM performed better when all features were used. In this experiment, information gain and the chi-squared test were the most effective feature selection methods. Using information gain with a Naive Bayes classifier, removing up to 96% of the features improved classification accuracy as measured by sensitivity. When information gain was used to select features, SVM was much less sensitive to the reduction of the feature space: the feature set was reduced by 99% while losing only a few percentage points of sensitivity (from 58.7% to 52.5%) and specificity (from 98.4% to 97.2%). In contrast to information gain and the chi-squared test, mutual information performed relatively poorly, owing to its bias toward rare features and its sensitivity to probability-estimation errors.
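The comparison described above can be sketched with scikit-learn. This is a minimal illustrative example, not the paper's actual pipeline: the data here are synthetic, binarized to loosely mimic binary compound fingerprints, and scikit-learn's `chi2` and `mutual_info_classif` scorers stand in for the chi-squared test and mutual-information criterion; the aggressive 96% feature reduction is applied via `SelectKBest`.

```python
# Hedged sketch: score-based feature selection followed by a Naive Bayes
# classifier, on synthetic binary data (an assumption, not the paper's dataset).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic stand-in for a compound/activity dataset.
X, y = make_classification(n_samples=500, n_features=200,
                           n_informative=10, random_state=0)
X = (X > 0).astype(int)  # binarize, loosely mimicking substructure fingerprints
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, scorer in [("chi2", chi2), ("mutual_info", mutual_info_classif)]:
    # Keep the top 8 of 200 features (= 96% reduction, as in the abstract).
    sel = SelectKBest(scorer, k=8)
    clf = BernoulliNB().fit(sel.fit_transform(X_tr, y_tr), y_tr)
    # Sensitivity = recall on the positive class.
    sens = recall_score(y_te, clf.predict(sel.transform(X_te)))
    print(f"{name}: sensitivity = {sens:.3f}")
```

Swapping `BernoulliNB` for `sklearn.svm.SVC`, and varying `k`, reproduces the kind of selector-versus-classifier comparison the study reports.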
