4.5 Article

Identifying Differences in the Performance of Machine Learning Models for Off-Targets Trained on Publicly Available and Proprietary Data Sets

Journal

CHEMICAL RESEARCH IN TOXICOLOGY
Volume 36, Issue 8, Pages 1300-1312

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.chemrestox.3c00042

This study investigates the distribution of positive and nonpositive entries within the ChEMBL database and its impact on the performance of classification models. The results indicate that models trained on publicly available data tend to overpredict positives, while models based on industry data sets predict negatives more often. Visualizing the prediction space further supports these findings by identifying regions where predictions converge. The study also highlights the use of these models in consensus modeling to predict potential adverse events.
Each year, publicly available databases are updated with new compounds from different research institutions. Positive experimental outcomes are more likely to be reported; therefore, they account for a considerable fraction of these entries. Established publicly available databases such as ChEMBL allow researchers to use information without constrictions and create predictive tools for a broad spectrum of applications in the field of toxicology. Therefore, we investigated the distribution of positive and nonpositive entries within ChEMBL for a set of off-targets and its impact on the performance of classification models when applied to pharmaceutical industry data sets. Results indicate that models trained on publicly available data tend to overpredict positives, and models based on industry data sets predict negatives more often than those built using publicly available data sets. This is strengthened even further by the visualization of the prediction space for a set of 10,000 compounds, which makes it possible to identify regions in the chemical space where predictions converge. Finally, we highlight the utilization of these models for consensus modeling for potential adverse events prediction.
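To make the comparison described above concrete, the following is a minimal sketch of how the positive-prediction rates of two models could be compared and combined into a consensus call. The abstract does not state which consensus rule the authors used; the unanimous-agreement rule, the function names, and the toy predictions here are all assumptions for illustration, not the paper's method or data.

```python
from typing import List, Sequence


def positive_rate(predictions: Sequence[int]) -> float:
    """Fraction of compounds predicted positive (label 1)."""
    return sum(predictions) / len(predictions)


def consensus(public_pred: Sequence[int], industry_pred: Sequence[int]) -> List[str]:
    """Assumed rule: keep a call only when both models agree.

    Returns 'positive' or 'negative' when the two models agree,
    and 'discordant' when they disagree.
    """
    labels = []
    for p, q in zip(public_pred, industry_pred):
        if p == q:
            labels.append("positive" if p == 1 else "negative")
        else:
            labels.append("discordant")
    return labels


# Hypothetical binary predictions for six compounds (not data from the paper).
public_pred = [1, 1, 1, 0, 1, 0]    # model trained on public data
industry_pred = [1, 0, 0, 0, 1, 0]  # model trained on industry data

public_rate = positive_rate(public_pred)
industry_rate = positive_rate(industry_pred)
labels = consensus(public_pred, industry_pred)

print(f"public positive rate:   {public_rate:.2f}")
print(f"industry positive rate: {industry_rate:.2f}")
print(labels)
```

In this toy case the public-data model calls more compounds positive than the industry-data model, and the discordant compounds are the ones a consensus approach would flag for closer review.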
