Article

Identifying Differences in the Performance of Machine Learning Models for Off-Targets Trained on Publicly Available and Proprietary Data Sets

CHEMICAL RESEARCH IN TOXICOLOGY (2023)

Journal

CHEMICAL RESEARCH IN TOXICOLOGY

Volume 36, Issue 8, Pages 1300-1312

Publisher

AMER CHEMICAL SOC

DOI: 10.1021/acs.chemrestox.3c00042

Categories

Chemistry, Medicinal; Chemistry, Multidisciplinary; Toxicology

Summary

This study investigates the distribution of positive and nonpositive entries within the ChEMBL database and its impact on the performance of classification models. The results indicate that models trained on publicly available data tend to overpredict positives, while models based on industry data sets predict negatives more often. The visualization of the prediction space further strengthens these findings by identifying regions where predictions converge. Furthermore, the utilization of these models for consensus modeling for potential adverse events prediction is highlighted.

Abstract

Each year, publicly available databases are updated with new compounds from different research institutions. Positive experimental outcomes are more likely to be reported; therefore, they account for a considerable fraction of these entries. Established publicly available databases such as ChEMBL allow researchers to use information without constrictions and create predictive tools for a broad spectrum of applications in the field of toxicology. Therefore, we investigated the distribution of positive and nonpositive entries within ChEMBL for a set of off-targets and its impact on the performance of classification models when applied to pharmaceutical industry data sets. Results indicate that models trained on publicly available data tend to overpredict positives, and models based on industry data sets predict negatives more often than those built using publicly available data sets. This is strengthened even further by the visualization of the prediction space for a set of 10,000 compounds, which makes it possible to identify regions in the chemical space where predictions converge. Finally, we highlight the utilization of these models for consensus modeling for potential adverse events prediction.
