Article

Automatic feature subset selection for decision tree-based ensemble methods in the prediction of bioactivity

Journal

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
Volume 103, Issue 2, Pages 129-136

Publisher

ELSEVIER
DOI: 10.1016/j.chemolab.2010.06.008

Keywords

Feature selection; Bagging; Boosting; Random Forest (RF); Classification and Regression Tree (CART); Ensemble learning

Funding

  1. National Nature Foundation Committee of P.R. China [20875104, 10771217]
  2. Ministry of Science and Technology of China [2007DFA40680]

In structure-activity relationship (SAR) studies, a learning algorithm usually faces the problem of selecting a compact subset of descriptors related to the property of interest while ignoring the rest. This paper presents a new method of molecular descriptor selection that couples three commonly used decision tree (DT)-based ensemble methods with a backward elimination strategy (BES). The proposed method eliminates descriptor redundancy automatically and searches for a more compact descriptor subset tailored to DT-based ensemble methods. Six real SAR datasets related to different categorical bioactivities of compounds are used to evaluate the proposed method. The results indicate that DT-based ensemble methods coupled with BES, especially the boosting tree model, yield better classification performance for compounds related to ADMET. © 2010 Elsevier B.V. All rights reserved.
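The abstract does not spell out the elimination procedure, so the following is only a minimal sketch of the general idea: a greedy backward elimination wrapped around a small tree ensemble. It assumes bagged one-split trees (stumps) as a stand-in for the ensembles named in the abstract, out-of-bag accuracy as the selection criterion, and binary 0/1 labels. All function names (`fit_stump`, `oob_accuracy`, `backward_elimination`) and the toy data are hypothetical and are not taken from the paper.

```python
import random

def fit_stump(X, y, features):
    """Return (feature, threshold, left_label, right_label) with the best training accuracy."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for left, right in ((0, 1), (1, 0)):
                correct = sum((left if row[f] <= t else right) == label
                              for row, label in zip(X, y))
                if best is None or correct > best[0]:
                    best = (correct, f, t, left, right)
    if best is None:
        # No usable split (every feature constant): predict the majority class.
        majority = int(sum(y) * 2 >= len(y))
        return (features[0], float("inf"), majority, majority)
    return best[1:]

def oob_accuracy(X, y, features, n_trees=25, seed=0):
    """Bag n_trees stumps and score them by out-of-bag majority vote."""
    rng = random.Random(seed)
    n = len(X)
    votes = [[0, 0] for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        f, t, left, right = fit_stump([X[i] for i in idx], [y[i] for i in idx], features)
        for i in range(n):
            if i not in in_bag:
                votes[i][left if X[i][f] <= t else right] += 1
    scored = [(v, label) for v, label in zip(votes, y) if sum(v) > 0]
    if not scored:
        return 0.0
    return sum((v[1] > v[0]) == bool(label) for v, label in scored) / len(scored)

def backward_elimination(X, y, n_trees=25, seed=0):
    """Greedily drop one descriptor at a time; keep the smallest subset that scores best."""
    features = list(range(len(X[0])))
    best_subset = features[:]
    best_score = oob_accuracy(X, y, features, n_trees, seed)
    while len(features) > 1:
        # Score every subset obtained by removing exactly one remaining descriptor.
        candidates = [
            (oob_accuracy(X, y, [f for f in features if f != drop], n_trees, seed), drop)
            for drop in features
        ]
        score, drop = max(candidates)
        features.remove(drop)
        if score >= best_score:  # ">=" prefers the more compact subset on ties
            best_subset, best_score = features[:], score
    return best_subset, best_score

# Toy data (hypothetical): only column 0 carries the class signal, columns 1 and 2 are noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(60)]
y = [int(row[0] > 0.5) for row in X]

subset, score = backward_elimination(X, y)
print("selected descriptors:", subset, "OOB accuracy:", round(score, 3))
```

On this toy data the two noise columns should be eliminated, leaving only the informative column. A real study would replace the stumps with full bagging, boosting, or random forest models and use the molecular descriptors of each SAR dataset as the columns of `X`.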
