Journal
BIOLOGY DIRECT
Volume 16, Issue 1
Publisher
BMC
DOI: 10.1186/s13062-020-00286-z
Keywords
Machine learning; Random forest; Data integration
Funding
- Polish Ministry of Science and Higher Education (Institute of Computer Science, University of Bialystok)
The study aimed to predict drug-induced liver injury (DILI) using gene expression profiles in cancer cell lines and the chemical properties of drugs. Machine learning models were built; integrating them with the Super Learner approach significantly improved accuracy and made it possible to categorize substances as low-risk or high-risk.
Motivation: Drug-induced liver injury (DILI) is one of the primary problems in drug development. Early prediction of DILI can significantly reduce the cost of clinical trials. In this work we examined whether the occurrence of DILI can be predicted from gene expression profiles in cancer cell lines and the chemical properties of drugs.
Methods: We used gene expression profiles from 13 human cell lines, as well as molecular properties of drugs, to build machine learning (ML) models of DILI. To this end, we used a robust cross-validated protocol based on feature selection and the Random Forest algorithm. In this protocol we first identify the most informative variables and then use them to build predictive models. The models are first built separately, using data from single cell lines and from chemical properties. They are then integrated using the Super Learner method, with several underlying methods for integration. The entire modelling process is performed using nested cross-validation.
Results: We obtained weakly predictive ML models when using either molecular descriptors or some individual cell lines (AUC between 0.55 and 0.61). Models obtained with the Super Learner approach have significantly improved accuracy (AUC = 0.73), which allows substances to be divided into two categories: low-risk and high-risk.
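The integration step described in the abstract can be illustrated with a minimal Super Learner (stacking) sketch in pure Python. It assumes that each base model (for example, one Random Forest per cell line and one for molecular descriptors) has already produced out-of-fold predicted probabilities for every compound; feature selection and Random Forest training are not shown. The meta-learner used here, a convex (non-negative, sum-to-one) weighting fitted by projected gradient descent on squared error, is one common Super Learner choice and is an assumption, not necessarily one of the integration methods the authors used. The helper names (`auc`, `project_to_simplex`, `combine`, `fit_convex_weights`) and the toy data are hypothetical.

```python
"""Minimal Super Learner (stacking) sketch in pure Python.

Assumption: every base model (e.g. one Random Forest per cell line and one
for molecular descriptors) has already produced out-of-fold probabilities
of DILI for each compound. Only the integration step is shown; the convex
weighting meta-learner is an illustrative choice, not the paper's code.
"""
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney statistic; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:  # condition holds for a prefix of the sorted values
            theta = t
    return [max(x - theta, 0.0) for x in v]


def combine(preds: List[List[float]], w: Sequence[float]) -> List[float]:
    """Weighted average of base-model predictions, compound by compound."""
    n = len(preds[0])
    return [sum(wj * pj[i] for wj, pj in zip(w, preds)) for i in range(n)]


def fit_convex_weights(preds: List[List[float]], labels: Sequence[int],
                       lr: float = 0.1, steps: int = 2000) -> List[float]:
    """Meta-learner: convex weights minimising the mean squared error of the
    combined prediction, found by projected gradient descent."""
    k, n = len(preds), len(labels)
    w = [1.0 / k] * k  # start from the simple average of all base models
    for _ in range(steps):
        resid = [c - y for c, y in zip(combine(preds, w), labels)]
        grad = [2.0 / n * sum(r * p for r, p in zip(resid, pj)) for pj in preds]
        w = project_to_simplex([wj - lr * gj for wj, gj in zip(w, grad)])
    return w


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]  # hypothetical toy labels: 1 = DILI, 0 = none
    model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]  # informative base model
    model_b = [0.5] * 6  # uninformative base model
    w = fit_convex_weights([model_a, model_b], labels)
    print("weights:", [round(x, 3) for x in w])
    print("AUC A:", round(auc(labels, model_a), 3))  # 0.889
    print("AUC ensemble:", round(auc(labels, combine([model_a, model_b], w)), 3))
```

In a nested cross-validation setting, `fit_convex_weights` would be fitted only on out-of-fold predictions from the outer training folds and the resulting weights applied to the held-out outer fold, so that the reported AUC is not optimistically biased by the integration step.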