4.8 Article

Predicting chemical ecotoxicity by learning latent space chemical representations

Journal

ENVIRONMENT INTERNATIONAL
Volume 163, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.envint.2022.107224

Keywords

Autoencoder; Machine learning; Chemical ecotoxicity; Dimension reduction; Representation learning

Funding

  1. National Institute of Environmental Health Sciences [R35ES031688, P30ES009089]

In silico prediction of chemical ecotoxicity (HC50) plays a crucial role in enhancing the toxicological assessment of manufactured chemicals. A novel autoencoder model was developed to learn latent chemical representations, achieving state-of-the-art prediction performance for HC50.
In silico prediction of chemical ecotoxicity (HC50) is an important complement to in vivo and in vitro toxicological assessment of manufactured chemicals. Recent applications of machine learning models to predict chemical HC50 yield variable prediction performance that depends on effectively learning chemical representations from high-dimensional data. To improve HC50 prediction performance, we developed an autoencoder model that learns latent space chemical embeddings. This novel approach achieved state-of-the-art prediction performance for HC50, with R2 of 0.668 ± 0.003 and mean absolute error (MAE) of 0.572 ± 0.001, and outperformed other dimension reduction methods including principal component analysis (PCA) (R2 = 0.601 ± 0.031 and MAE = 0.629 ± 0.005), kernel PCA (R2 = 0.631 ± 0.008 and MAE = 0.625 ± 0.006), and uniform manifold approximation and projection dimensionality reduction (R2 = 0.400 ± 0.008 and MAE = 0.801 ± 0.002). A simple linear layer with chemical embeddings learned from the autoencoder model performed better than random forest (R2 = 0.663 ± 0.007 and MAE = 0.591 ± 0.008), fully connected neural network (R2 = 0.614 ± 0.016 and MAE = 0.610 ± 0.008), least absolute shrinkage and selection operator (R2 = 0.617 ± 0.037 and MAE = 0.619 ± 0.007), and ridge regression (R2 = 0.638 ± 0.007 and MAE = 0.613 ± 0.005) using unlearned raw input features. Our results highlight the usefulness of learning latent chemical representations, and our autoencoder model provides an alternative approach for robust HC50 prediction.
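
The pipeline the abstract describes (compress high-dimensional features with an autoencoder, fit a simple linear layer on the latent embeddings, then score with R2 and MAE) can be illustrated with a minimal NumPy sketch. This is not the authors' model: the one-hidden-layer tanh architecture, the latent size, the learning rate, the synthetic data, and every function name below are assumptions chosen only to keep the example small and self-contained.

```python
import numpy as np

def train_autoencoder(X, latent_dim=8, lr=0.5, epochs=300, seed=0):
    """Fit a tiny autoencoder (tanh encoder, linear decoder) by gradient descent.

    Hypothetical architecture for illustration; returns an encode function
    and the per-epoch reconstruction losses (mean squared error).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, latent_dim))
    b_enc = np.zeros(latent_dim)
    W_dec = rng.normal(0.0, 0.1, (latent_dim, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W_enc + b_enc)      # latent embeddings
        err = (Z @ W_dec + b_dec) - X       # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size        # dLoss / dReconstruction
        g_pre = (g_out @ W_dec.T) * (1.0 - Z ** 2)  # back through tanh
        W_dec -= lr * (Z.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (X.T @ g_pre)
        b_enc -= lr * g_pre.sum(axis=0)

    def encode(A):
        return np.tanh(A @ W_enc + b_enc)

    return encode, losses

def fit_linear(Z, y):
    """Least-squares linear layer (weights plus bias) on the embeddings."""
    A = np.hstack([Z, np.ones((len(Z), 1))])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_linear(Z, w):
    return np.hstack([Z, np.ones((len(Z), 1))]) @ w

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def demo(seed=0):
    """Run the sketch on synthetic data (a stand-in, not real HC50 data)."""
    rng = np.random.default_rng(seed)
    n, d, k = 300, 20, 3
    T = rng.normal(size=(n, k))             # hidden factors driving the data
    X = np.tanh(T @ rng.normal(size=(k, d))) + 0.1 * rng.normal(size=(n, d))
    y = T @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=n)
    X_tr, X_te, y_tr, y_te = X[:240], X[240:], y[:240], y[240:]
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    X_tr, X_te = (X_tr - mu) / sd, (X_te - mu) / sd  # scale with train stats
    encode, losses = train_autoencoder(X_tr)
    w = fit_linear(encode(X_tr), y_tr)
    y_hat = predict_linear(encode(X_te), w)
    return losses, r2_score(y_te, y_hat), mean_absolute_error(y_te, y_hat)
```

Calling `demo()` returns the reconstruction-loss history together with the held-out R2 and MAE. The values it produces say nothing about the figures reported in the abstract, which come from the authors' own data and model.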
