Article; Proceedings Paper

DNN-Boost: Somatic mutation identification of tumor-only whole-exome sequencing data using deep neural network and XGBoost

Publisher

WORLD SCIENTIFIC PUBL CO PTE LTD
DOI: 10.1142/S0219720021400175

Keywords

Somatic mutation classification; XGBoost; deep neural network; feature selection

Funding

  1. Institute of Information and Communications Technology Planning and Evaluation (IITP) - Korean government (MSIT) [2020-0-01450]
  2. National Research Foundation of Korea (NRF) - Korean government (MSIT) [2021R1A2C2010775]
  3. National Research Foundation of Korea [2021R1A2C2010775] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

This study presents DNN-Boost, a machine learning approach that combines a deep neural network (DNN) with XGBoost to identify somatic mutations in tumor-only exome sequencing data. XGBoost extracts features from variant caller results, and these features serve as input to the DNN model. The DNN-Boost classification model outperformed benchmark methods at classifying somatic mutations in both paired tumor-normal and tumor-only exome data.
Detecting somatic mutations in whole-exome sequencing data can help elucidate the mechanisms of tumor progression. Most computational approaches require exome sequencing of both tumor and normal samples. However, exomes are more commonly sequenced for tumor samples only, without paired normal samples. To include such data in extensive studies of tumorigenesis, an approach is needed that identifies somatic mutations from tumor exome sequencing data alone. In this study, we designed a machine learning approach that uses a Deep Neural Network (DNN) and XGBoost to identify somatic mutations in tumor-only exome sequencing data, and we integrated it into a pipeline called DNN-Boost. The XGBoost algorithm extracts features from the results of variant callers, and these features are then fed into the DNN model as input. XGBoost also addresses the issues of missing values and overfitting. We evaluated the proposed model and compared its performance with existing benchmark methods. The DNN-Boost classification model outperformed the benchmark methods in classifying somatic mutations from both paired tumor-normal and tumor-only exome data.
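
The two-stage idea described in the abstract (a boosted-tree stage that picks features, followed by a neural network trained on those features) can be illustrated with a small, self-contained sketch. This is not the authors' pipeline and it does not use the XGBoost library: plain gradient boosting over depth-1 trees stands in for XGBoost, a one-hidden-layer network stands in for the DNN, and the data are synthetic. All function names, the number of features, and the hyperparameters are assumptions made for illustration, and missing-value handling is not modeled.

```python
import math
import random

def best_stump(X, resid):
    """Best depth-1 split on the residuals: (gain, feature, threshold, left_mean, right_mean)."""
    n, total = len(X), sum(resid)
    best = (0.0, None, 0.0, 0.0, 0.0)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left = 0.0
        for pos in range(1, n):
            left += resid[order[pos - 1]]
            lo, hi = X[order[pos - 1]][j], X[order[pos]][j]
            if lo == hi:
                continue  # cannot split between equal values
            right = total - left
            # Reduction in squared error obtained by splitting at this position.
            gain = left ** 2 / pos + right ** 2 / (n - pos) - total ** 2 / n
            if gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left / pos, right / (n - pos))
    return best

def boosted_importance(X, y, rounds=20, lr=0.3):
    """Gradient boosting with stumps (a stand-in for XGBoost); returns total gain per feature."""
    pred = [sum(y) / len(y)] * len(y)
    importance = [0.0] * len(X[0])
    for _ in range(rounds):
        resid = [t - p for t, p in zip(y, pred)]
        gain, j, thr, left_mean, right_mean = best_stump(X, resid)
        if j is None:
            break
        importance[j] += gain
        pred = [p + lr * (left_mean if row[j] <= thr else right_mean)
                for p, row in zip(pred, X)]
    return importance

def top_k(importance, k):
    """Indices of the k features with the largest accumulated gain."""
    return sorted(range(len(importance)), key=lambda j: -importance[j])[:k]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def train_mlp(X, y, hidden=4, epochs=50, lr=0.1, seed=0):
    """One-hidden-layer network (tanh, sigmoid output) trained with plain SGD."""
    rng = random.Random(seed)
    d = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(W1[k], x)) + b1[k]) for k in range(hidden)]
        return h, sigmoid(sum(w * hk for w, hk in zip(W2, h)) + b2)

    for _ in range(epochs):
        for x, t in zip(X, y):
            h, p = forward(x)
            dz = p - t  # gradient of the log loss with respect to the output logit
            for k in range(hidden):
                dh = dz * W2[k] * (1.0 - h[k] ** 2)
                W2[k] -= lr * dz * h[k]
                b1[k] -= lr * dh
                for i in range(d):
                    W1[k][i] -= lr * dh * x[i]
            b2 -= lr * dz
    return lambda x: forward(x)[1]

def make_toy_data(n=200, d=6, seed=42):
    """Synthetic data: only features 0 and 3 carry signal; the rest are noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [1.0 if row[0] + row[3] > 0 else 0.0 for row in X]
    return X, y

X, y = make_toy_data()
importance = boosted_importance(X, y)            # stage 1: rank features
selected = top_k(importance, 2)                  # keep the two strongest features
X_sel = [[row[j] for j in selected] for row in X]
predict = train_mlp(X_sel, y)                    # stage 2: network on selected features
accuracy = sum((predict(x) > 0.5) == bool(t) for x, t in zip(X_sel, y)) / len(y)
print(selected, round(accuracy, 3))
```

In a real setting the rows would be candidate variants and the columns would be annotations derived from variant caller output. The boosted stage would typically be an XGBoost model, and accuracy would be measured on held-out variants rather than on the training data as in this toy example.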
