
Large-Scale Learning of Structure-Activity Relationships Using a Linear Support Vector Machine and Problem-Specific Metrics

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2011)

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING

Volume 51, Issue 2, Pages 203-213

Publisher

AMER CHEMICAL SOC

DOI: 10.1021/ci100073w

Funding

Nycomed GmbH, Konstanz, Germany

Abstract

The goal of this study was to adapt a recently proposed large-scale linear support vector machine to large-scale binary cheminformatics classification problems and to assess its performance on various benchmarks using virtual screening performance measures. We extended the large-scale linear support vector machine library LIBLINEAR with state-of-the-art virtual high-throughput screening metrics to train classifiers on complete large and unbalanced data sets. This formulation of the linear support vector machine performs very well when applied to high-dimensional sparse feature vectors. An additional advantage is that the cost of a prediction is, on average, linear in the number of nonzero features. Nevertheless, the approach assumes that a problem is linearly separable. Therefore, we conducted extensive benchmarking to evaluate the performance on large-scale problems with up to 175,000 samples. To examine the virtual screening performance, we determined the chemotype clusters using Feature Trees and integrated this information to compute weighted AUC-based performance measures and a leave-cluster-out cross-validation. We also considered the BEDROC score, a metric that was suggested to tackle the early enrichment problem. The performance on each problem was evaluated by a nested cross-validation and a nested leave-cluster-out cross-validation. We compared LIBLINEAR against a Naive Bayes classifier, a random decision forest classifier, and a maximum similarity ranking approach. LIBLINEAR outperformed these reference approaches in a direct comparison. A comparison to literature results showed that the LIBLINEAR performance is competitive but does not match the results of the top-ranked nonlinear machines on these benchmarks. However, considering the overall convincing performance and computation time of the large-scale support vector machine, the approach provides an excellent alternative to established large-scale classification approaches.
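To make the early enrichment idea behind the BEDROC score more concrete, the following is a minimal sketch of how the score can be computed for a ranked compound list. It is not taken from the paper or from LIBLINEAR. It follows the commonly used definition of BEDROC as a rescaled robust initial enhancement (RIE), and the function name `bedroc`, the input format, and the default `alpha=20.0` are assumptions made for illustration.

```python
import math


def bedroc(ranked_labels, alpha=20.0):
    """BEDROC for a ranked list, best-scored compound first.

    ranked_labels: sequence of truthy/falsy values, truthy marks an active.
    alpha: early-recognition weight; 20.0 is an assumed default, not a value
           taken from the paper.
    """
    total = len(ranked_labels)
    actives = sum(1 for label in ranked_labels if label)
    if actives == 0 or actives == total:
        raise ValueError("need at least one active and one inactive compound")
    ratio = actives / total

    # Exponentially decaying weight summed over the 1-based ranks of actives.
    weight_sum = sum(
        math.exp(-alpha * rank / total)
        for rank, label in enumerate(ranked_labels, start=1)
        if label
    )

    # Expected weight of a single active under a uniformly random ranking.
    random_weight = (1.0 - math.exp(-alpha)) / (
        total * (math.exp(alpha / total) - 1.0)
    )
    rie = weight_sum / (actives * random_weight)

    # RIE values for the best (actives first) and worst (actives last) ranking.
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))

    # Rescale RIE to the interval [0, 1].
    return (rie - rie_min) / (rie_max - rie_min)


if __name__ == "__main__":
    best = [1] * 5 + [0] * 95    # all actives ranked first
    worst = [0] * 95 + [1] * 5   # all actives ranked last
    print(bedroc(best), bedroc(worst))
```

Under this definition a ranking that places all actives first scores 1 and a ranking that places them last scores 0. Larger values of `alpha` concentrate the weight on the top of the list. The sketch assumes a ranking without ties.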
