High-performance deep learning pipeline predicts individuals in mixtures of DNA using sequencing data

BRIEFINGS IN BIOINFORMATICS (2021)

Journal

BRIEFINGS IN BIOINFORMATICS

Volume 22, Issue 6

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bib/bbab283

Keywords

deep learning; next-generation sequencing; forensic; breast cancer; DNA mixture

Funding

Center of Genomic and Precision Medicine, National Taiwan University
Ministry of Science and Technology, Taiwan [MOST-110-2634-F-002-044]
Center for Biotechnology, National Taiwan University, Taiwan [GTZ300]

Automated Summary

The study proposed a deep learning model for classifying individuals from mixtures of DNA samples with high accuracy. The model was also demonstrated to be effective in classifying subtypes of breast cancer patients, showcasing its versatility across different NGS platforms.

Abstract

In this study, we propose a deep learning (DL) model for classifying individuals from mixtures of DNA samples using 27 short tandem repeats and 94 single nucleotide polymorphisms obtained through a massively parallel sequencing protocol. The model was trained, tested and validated with sequencing data from 6 individuals and then evaluated using mixtures from forensic DNA samples. The model identified both the major and the minor contributors with 100% accuracy in 90 DNA mixtures that were manually prepared by mixing sequence reads of 3 individuals at different ratios. Furthermore, the model identified 100% of the major contributors and 50-80% of the minor contributors in 20 external two-sample mixtures at ratios of 1:39 and 1:9, respectively. To further demonstrate the versatility and applicability of the pipeline, we tested it on whole exome sequencing data to classify subtypes of 20 breast cancer patients and achieved an area under the curve of 0.85. Overall, we present, for the first time, a complete pipeline, including sequencing data processing steps and DL steps, that is applicable across different NGS platforms. We also introduce a sliding window approach to overcome the sequence length variation problem of sequencing data and demonstrate that it improves the model performance dramatically.
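
The abstract does not specify how the sliding window is parameterised, so the following is only a minimal sketch of the general idea: cutting reads of varying length into fixed-length, overlapping windows so that every model input has the same size. The function name `sliding_windows`, the window length, the stride and the `N` padding character are all illustrative assumptions, not the authors' implementation.

```python
from typing import List


def sliding_windows(seq: str, window: int = 8, stride: int = 4, pad: str = "N") -> List[str]:
    """Split a read into fixed-length windows (hypothetical helper).

    Assumptions not taken from the paper: window/stride values and
    right-padding the final, shorter window with `pad`.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("require 0 < stride <= window")
    if len(pad) != 1:
        raise ValueError("pad must be a single character")

    # Reads no longer than one window yield a single padded window.
    if len(seq) <= window:
        return [seq.ljust(window, pad)]

    windows = []
    # The upper bound makes the last window reach the end of the read.
    for start in range(0, len(seq) - window + stride, stride):
        chunk = seq[start:start + window]
        windows.append(chunk.ljust(window, pad))
    return windows


if __name__ == "__main__":
    # Reads of different lengths all map to windows of length 8.
    for read in ["ACG", "ACGTACGTAC", "ACGTACGTACGTA"]:
        print(read, sliding_windows(read))
```

Whatever the read length, every window has the same length, which is the property a fixed-input network needs; how the windows are encoded and how per-window predictions are combined is left open here because the abstract does not describe it.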
