4.6 Article

IDriveGenes: Cancer Driver Genes Prediction Using Machine Learning

Journal

IEEE ACCESS
Volume 11, Issue -, Pages 28439-28453

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2023.3259907

Keywords

Cancer; Proteins; Bioinformatics; Genomics; Feature extraction; DNA; Tumors; Machine learning; Accuracy; cancer; driver genes; machine learning; NGS

The development of high-throughput sequencing technologies is revolutionizing cancer exploration. The proposed model uses PRIM and AAPIV to extract robust features from sequence data and convert the data into 2-dimensional numeric form. SVM, NN, and RF classifiers are trained to predict whether a given primary structure corresponds to a cancer driver gene.
The development of high-throughput sequencing technologies, i.e., Next Generation Sequencing (NGS), is revolutionizing the exploration of cancer. Sequence datasets are highly complex, and mutations can occur randomly in DNA or RNA sequences, making cells sicker or less fit. Unusual growth and behavior of genes in cells cause cancer, and cells carrying cancer driver genes grow when mutations occur. Identification of cancer driver genes is a critical and challenging issue for researchers. In the proposed work, robust features are first extracted from the sequence dataset through a Position Relative Incidence Matrix (PRIM) integrated with Accumulative Absolute Position Incidence Vector (AAPIV) generation. PRIM and AAPIV convert the one-dimensional sequence data into 2-dimensional numeric data. Support Vector Machine (SVM), Neural Network (NN), and Random Forest (RF) classifiers are used to train the model. The proposed model is validated with several methods, i.e., independent testing, k-fold cross-validation, self-consistency, and jackknife testing. The model predicts whether a given primary structure corresponds to a cancer driver gene. The results show accuracies of 95%, 92%, and 69% for RF, the Artificial Neural Network (ANN), and SVM, respectively. A comparative analysis with existing state-of-the-art models, i.e., 20/20+ and Multimodal Deep Neural Network by integrating Multi-dimensional Data (MDNNMD), shows that the proposed model outperforms the existing techniques.
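
The feature-extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a 20-letter amino-acid alphabet, takes AAPIV as the per-residue sum of 1-based positions, and takes PRIM as the per-pair sum of signed offsets of one residue from the first occurrence of another. The function names `aapiv`, `prim`, and `extract_features` are hypothetical, and the exact definitions used in the paper may differ.

```python
import numpy as np

# Assumption: sequences are protein primary structures over the 20 standard residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def aapiv(seq):
    """Accumulative absolute position incidence vector (assumed definition):
    element i is the sum of the 1-based positions where residue i occurs."""
    vec = np.zeros(len(AMINO_ACIDS))
    for pos, res in enumerate(seq, start=1):
        if res in INDEX:  # unknown symbols are skipped
            vec[INDEX[res]] += pos
    return vec


def prim(seq):
    """Position relative incidence matrix (assumed definition):
    element (i, j) is the sum of the offsets of every occurrence of residue j
    measured from the first occurrence of residue i (offsets may be negative)."""
    n = len(AMINO_ACIDS)
    mat = np.zeros((n, n))
    first = {}
    for pos, res in enumerate(seq, start=1):
        if res in INDEX and res not in first:
            first[res] = pos
    for pos, res in enumerate(seq, start=1):
        if res not in INDEX:
            continue
        j = INDEX[res]
        for ref, ref_pos in first.items():
            mat[INDEX[ref], j] += pos - ref_pos
    return mat


def extract_features(seq):
    """Flatten the 20x20 PRIM and append the 20-element AAPIV (420 values)."""
    return np.concatenate([prim(seq).ravel(), aapiv(seq)])
```

Under these assumptions, each sequence maps to a fixed-length vector of 420 values regardless of its length, which is what allows variable-length sequences to be passed to classifiers such as RF, SVM, or a neural network.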
