Journal
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
Volume 103, Issue 2, Pages 129-136
Publisher
ELSEVIER
DOI: 10.1016/j.chemolab.2010.06.008
Keywords
Feature selection; Bagging; Boosting; Random Forest (RF); Classification and Regression Tree (CART); Ensemble learning
Funding
- National Nature Foundation Committee of P.R. China [20875104, 10771217]
- Ministry of Science and Technology of China [2007DFA40680]
In structure-activity relationship (SAR) studies, a learning algorithm usually has to select a compact subset of descriptors related to the property of interest while ignoring the rest. This paper presents a new method of molecular descriptor selection that couples three commonly used decision tree (DT)-based ensemble methods with a backward elimination strategy (BES). The proposed method eliminates descriptor redundancy automatically and searches for a more compact descriptor subset tailored to DT-based ensemble methods. Six real SAR datasets related to different categorical bioactivities of compounds are used to evaluate the proposed method. The results indicate that DT-based ensemble methods coupled with BES, especially the boosting tree model, yield better classification performance for compounds related to ADMET. © 2010 Elsevier B.V. All rights reserved.
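The abstract does not spell out the elimination procedure, so the following is only a minimal sketch of the general idea of backward elimination wrapped around a tree ensemble, not the authors' implementation. Everything in it is an assumption made for illustration: the ensemble is a bag of depth-1 trees (stumps) rather than the bagging, boosting, or Random Forest models of CART trees named in the keywords, descriptor importance is taken to be how often a descriptor is chosen for a split, and subsets are compared by out-of-bag accuracy. The names `fit_stump`, `bagged_stumps`, and `backward_elimination` are hypothetical.

```python
import random

def fit_stump(X, y, features):
    """Depth-1 tree: return (feature, threshold, left_label, right_label)
    with the fewest training errors; ties keep the first candidate found."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            for left_label, right_label in ((0, 1), (1, 0)):
                err = (sum(v != left_label for v in left)
                       + sum(v != right_label for v in right))
                if best is None or err < best[0]:
                    best = (err, f, t, left_label, right_label)
    return best[1:]

def bagged_stumps(X, y, features, n_trees=25, seed=0):
    """Bag stumps on bootstrap samples. Returns the out-of-bag (OOB)
    accuracy and a usage count per feature (the assumed importance)."""
    rng = random.Random(seed)
    n = len(X)
    votes = [[0, 0] for _ in range(n)]  # OOB votes for class 0 / class 1
    usage = {f: 0 for f in features}
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        f, t, left_label, right_label = fit_stump(
            [X[i] for i in idx], [y[i] for i in idx], features)
        usage[f] += 1
        for i in range(n):
            if i not in in_bag:
                pred = left_label if X[i][f] <= t else right_label
                votes[i][pred] += 1
    scored = [(v, yi) for v, yi in zip(votes, y) if sum(v) > 0]
    hits = sum((v[1] > v[0]) == (yi == 1) for v, yi in scored)
    return hits / max(len(scored), 1), usage

def backward_elimination(X, y, n_trees=25, seed=0):
    """Repeatedly drop the least-used feature, keeping the smallest subset
    whose OOB accuracy is at least as good as any seen so far."""
    features = list(range(len(X[0])))
    best_acc, best_subset = -1.0, features[:]
    while features:
        # Same seed every round, so all subsets see the same bootstraps.
        acc, usage = bagged_stumps(X, y, features, n_trees, seed)
        if acc >= best_acc:  # ">=" prefers the more compact subset on ties
            best_acc, best_subset = acc, features[:]
        weakest = min(features, key=lambda f: usage[f])
        features.remove(weakest)
    return best_subset, best_acc

# Toy data (made up): descriptor 0 determines the class, descriptor 1 is a
# rescaled copy of it (redundant), descriptor 2 is unrelated to the class.
n = 40
X = [[i / n, 2 * i / n, (i * 7) % 11] for i in range(n)]
y = [1 if i >= n // 2 else 0 for i in range(n)]
subset, acc = backward_elimination(X, y)
print(subset, round(acc, 3))
```

On this toy data the redundant and the unrelated descriptor are never chosen for a split, so they are removed first and the search ends with descriptor 0 alone. A realistic setup would swap in full trees, a boosted or Random Forest ensemble, and a proper importance measure.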