Article

Artificial intelligence paradigm for ligand-based virtual screening on the drug discovery of type 2 diabetes mellitus

JOURNAL OF BIG DATA (2021)

Journal

JOURNAL OF BIG DATA

Volume 8, Issue 1, Pages -

Publisher

SPRINGERNATURE

DOI: 10.1186/s40537-021-00465-3

Keywords

Quantitative structure-activity relationship; K-modes clustering; CatBoost; Rotation Forest; principal component analysis; Sparse principal component analysis; Deep neural network; Fingerprint

Category

Computer Science, Theory & Methods

Funding

Universitas Indonesia [NKB-1381/UN2.RST/HKP.05.00/2020]

Summary

This study aims to develop new DPP-4 inhibitors with low adverse effects for the treatment of type 2 diabetes, using QSAR models built with Rotation Forest and Deep Neural Network. K-modes clustering is used for molecule selection and CatBoost for feature selection, resulting in QSAR models with high performance metrics. The study concludes that feature selection using CatBoost is best applied before building the QSAR models.

Abstract

Background: New dipeptidyl peptidase-4 (DPP-4) inhibitors with low adverse effects need to be developed for the treatment of type 2 diabetes mellitus. This study aims to build quantitative structure-activity relationship (QSAR) models using the artificial intelligence paradigm. Rotation Forest and Deep Neural Network (DNN) are used to build the QSAR models. We compared principal component analysis (PCA) with sparse PCA (SPCA) as the transformation method within Rotation Forest. K-modes clustering with Levenshtein distance was used to select molecules, and CatBoost was used for feature selection.

Results: The molecule selection process using the K-modes clustering algorithm yields 1020 DPP-4 inhibitor molecules, with logP values ranging from -1.6693 to 4.99044. Two fingerprint methods, extended connectivity fingerprint and functional class fingerprint, each with diameters of 4 and 6, were used to construct four fingerprint datasets: ECFP_4, ECFP_6, FCFP_4, and FCFP_6. The 1024 features from the four fingerprint datasets are then selected using the CatBoost method. With CatBoost feature selection at feature importance levels of 60%, 70%, 80%, and 90%, the QSAR models built with both machine learning and deep learning methods perform well: Sensitivity, Specificity, Accuracy, and Matthews correlation coefficient are all above 70%.

Conclusion: The K-modes clustering algorithm can produce a representative subset of DPP-4 inhibitor molecules. Feature selection on the fingerprint datasets using CatBoost is best applied before building the QSAR Classification and QSAR Regression models. QSAR Classification using Machine Learning and QSAR Classification using Deep Learning each achieve an accuracy above 70%. The QSAR RFC-PCA and QSAR RFR-PCA models performed better than the QSAR RFC-SPCA and QSAR RFR-SPCA models because they are more time-efficient.
