Article

Effective Feature Selection Method for Class-Imbalance Datasets Applied to Chemical Toxicity Prediction

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 61, Issue 1, Pages 76-94

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.0c00908

Funding

  1. Spanish Ministry of Science and Innovation [PID2019-109481GBI00/AEI/10.13039/501100011033]
  2. Junta de Andalucia Excellence in Research program [UCO-1264182]
  3. FEDER funds [PP2019Submod-1.2]

During drug development, toxicity tests and adverse effect studies are crucial for ensuring patient safety and research success. The imbalance in data distribution between active and inactive samples, known as the class-imbalance problem, can negatively impact the performance of learned models. This paper proposes a feature selection method to address this issue, utilizing ensemble techniques and demonstrating improved classification performance compared to standard methods.

During the drug development process, it is common to carry out toxicity tests and adverse effect studies, which are essential to guarantee patient safety and the success of the research. The use of in silico quantitative structure-activity relationship (QSAR) approaches for this task involves processing a huge amount of data that, in many cases, have an imbalanced distribution of active and inactive samples. This is usually termed the class-imbalance problem and may have a significant negative effect on the performance of the learned models. The performance of feature selection (FS) for QSAR models usually suffers from the class-imbalanced nature of the datasets involved. This paper proposes an FS method designed to deal with the class-imbalance problem. The method is based on FS ensembles constructed by boosting, using two well-known FS methods: fast clustering-based FS and the fast correlation-based filter. The experimental results demonstrate the effectiveness of the proposal in terms of classification performance compared to standard methods. The proposal can be extended to other FS methods and applied to other problems in cheminformatics.
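As a rough illustration of the idea of building an FS ensemble by boosting around a correlation-based filter, the Python sketch below may help. It is not the authors' implementation. The simplified FCBF-style filter, the one-feature stump used as weak learner, the AdaBoost.M1-style weight update, the vote-count aggregation, all function names, and the toy data are assumptions made for this example.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)); 0 = independent, 1 = equivalent."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = max(0.0, hx + hy - entropy(list(zip(x, y))))
    return 2.0 * mutual_info / (hx + hy)


def fcbf(columns, labels, threshold=0.0):
    """Simplified FCBF-style filter: rank by SU with the class, drop redundant features."""
    relevance = {j: symmetrical_uncertainty(col, labels) for j, col in enumerate(columns)}
    ranked = sorted([j for j, r in relevance.items() if r > threshold],
                    key=lambda j: -relevance[j])
    selected = []
    for j in ranked:
        # Keep j only if no already-selected feature is at least as strongly
        # associated with j as j is with the class.
        if all(symmetrical_uncertainty(columns[i], columns[j]) < relevance[j]
               for i in selected):
            selected.append(j)
    return selected


def stump_predict(train_col, train_labels, test_col):
    """Assumed weak learner: predict the majority training label per feature value."""
    by_value = {}
    for v, y in zip(train_col, train_labels):
        by_value.setdefault(v, Counter())[y] += 1
    default = Counter(train_labels).most_common(1)[0][0]
    return [by_value[v].most_common(1)[0][0] if v in by_value else default
            for v in test_col]


def boosted_feature_selection(rows, labels, rounds=10, seed=0):
    """Run the filter on weighted resamples and count how often each feature is chosen."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    weights = [1.0 / n] * n
    votes = Counter()
    for _ in range(rounds):
        idx = rng.choices(range(n), weights=weights, k=n)  # weighted bootstrap
        cols = [[rows[i][j] for i in idx] for j in range(n_features)]
        sample_labels = [labels[i] for i in idx]
        selected = fcbf(cols, sample_labels)
        votes.update(selected)
        if not selected:
            continue
        best = selected[0]
        preds = stump_predict(cols[best], sample_labels, [r[best] for r in rows])
        wrong = [p != y for p, y in zip(preds, labels)]
        err = sum(w for w, bad in zip(weights, wrong) if bad)
        if err <= 0 or err >= 0.5:
            continue  # leave the weights unchanged for this round
        beta = err / (1.0 - err)
        # Shrink the weights of correctly classified samples, then renormalise,
        # so hard (often minority-class) samples are drawn more often next round.
        weights = [w * (1.0 if bad else beta) for w, bad in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return votes


# Toy imbalanced data (hypothetical): 20 active vs 180 inactive samples.
labels = [1] * 20 + [0] * 180
f0 = list(labels)
f0[0], f0[199] = 0, 1  # informative feature with two noisy entries
rows = [(f0[i], f0[i], 7) for i in range(len(labels))]  # copy of f0, then a constant
votes = boosted_feature_selection(rows, labels, rounds=5, seed=0)
print(votes)
```

In this sketch, features that collect many votes across rounds would form the ensemble's final subset. The exact duplicate and the constant column are never selected, because the filter discards redundant and irrelevant features in every round.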
