Article

MitoScape: A big-data, machine-learning platform for obtaining mitochondrial DNA from next-generation sequencing data

Journal

PLOS COMPUTATIONAL BIOLOGY
Volume 17, Issue 11

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pcbi.1009594

Funding

  1. [MH110185]
  2. [NS021328]
  3. [MH108592]
  4. [OD010944]

The study introduces MitoScape, novel software that uses machine learning to model the unique characteristics of mitochondrial genetics and accurately extract mitochondrial DNA sequences, showing superior performance in heteroplasmy estimation. Applying MitoScape to common disease examples uncovers important heteroplasmy-disease associations, highlighting its value for personalized medicine and clinical diagnostics.
The growing volume of next-generation sequencing (NGS) data presents a unique opportunity to study the combined impact of mitochondrial and nuclear-encoded genetic variation in complex disease. Mitochondrial DNA variants, and heteroplasmic variants in particular, are critical for determining human disease severity. While there are approaches for obtaining mitochondrial DNA variants from NGS data, these tools do not account for the unique characteristics of mitochondrial genetics and can be inaccurate even for homoplasmic variants. We introduce MitoScape, novel big-data software for extracting mitochondrial DNA sequences from NGS data. MitoScape departs from other algorithms by using machine learning to model the unique characteristics of mitochondrial genetics. We also employ a novel approach of using rho-zero (mitochondrial DNA-depleted) data to model nuclear-encoded mitochondrial sequences. We showed that MitoScape produces accurate heteroplasmy estimates using gold-standard mitochondrial DNA data. We provide a comprehensive comparison of the most common tools for obtaining mtDNA variants from NGS and showed that MitoScape outperformed the compared tools in every statistical category we examined, including false positives and false negatives. By applying MitoScape to common disease examples, we illustrate how it facilitates important heteroplasmy-disease association discoveries, expanding upon a reported association between hypertrophic cardiomyopathy and mitochondrial haplogroup T in men (adjusted p-value = 0.003). The improved accuracy of mitochondrial DNA variants produced by MitoScape will be instrumental in diagnosing disease in the context of personalized medicine and clinical diagnostics.

Author summary

Recent studies have highlighted the importance of mitochondrial DNA variation in both primary mitochondrial disease and complex human pathologies, including COVID-19 and space-flight stress.
The vast amount of existing next-generation sequencing (NGS) data can be leveraged to interrogate nuclear and mitochondrial DNA (mtDNA) sequences simultaneously, allowing analysis of the interplay between mitochondrial and nuclear-encoded genes in mitochondrial function. Identifying mtDNA sequences accurately is complicated by the presence of nuclear-encoded mitochondrial sequences (NUMTs), which are homologous to mtDNA. Current software for analyzing mtDNA from NGS data does not accurately model the unique characteristics of mitochondrial genetics. We introduce MitoScape, novel big-data software that models mitochondrial genetics through machine learning to accurately identify mtDNA sequences from NGS data. MitoScape takes advantage of rho-zero cell data to model the characteristics of NUMTs. We show that MitoScape produces more accurate heteroplasmy estimates than published software. We provide an example of applying MitoScape to replicate an association between hypertrophic cardiomyopathy and mitochondrial haplogroup T in men. MitoScape is an important contribution to mitochondrial genomics, enabling accurate identification of mtDNA variants and the ability to tailor mtDNA analysis to different population and disease contexts, which is not available in other software.
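To make "heteroplasmy estimate" and the NUMT problem concrete, here is a minimal sketch of the common read-count estimate. At one mtDNA position, the heteroplasmy level is the number of reads supporting the alternate allele divided by the total reads covering that position. This is not MitoScape's algorithm (which uses machine learning to decide which reads are truly mitochondrial). The `from_numt` flag, the function name `heteroplasmy_level`, and the pileup counts are hypothetical, chosen only to show how NUMT-derived reads can distort the estimate if they are not separated out.

```python
from collections import Counter
from typing import Iterable, Tuple


def heteroplasmy_level(reads: Iterable[Tuple[str, bool]], alt: str,
                       exclude_numt: bool = True) -> float:
    """Fraction of reads at one mtDNA position that carry the alternate allele.

    reads: (base, from_numt) pairs, one per read covering the position.
           from_numt is a hypothetical flag marking reads assigned to NUMTs.
    alt: the alternate allele of interest.
    exclude_numt: if True, NUMT-flagged reads are dropped before counting.
    """
    counts = Counter(
        base.upper()
        for base, from_numt in reads
        if not (exclude_numt and from_numt)
    )
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no usable reads cover this position")
    return counts[alt.upper()] / depth


# Hypothetical pileup: 80 mtDNA reads with 'A', 10 mtDNA reads with 'G',
# and 10 NUMT-derived reads that also carry 'G'.
pileup = [("A", False)] * 80 + [("G", False)] * 10 + [("G", True)] * 10

print(f"{heteroplasmy_level(pileup, 'G', exclude_numt=False):.3f}")  # 0.200
print(f"{heteroplasmy_level(pileup, 'G'):.3f}")                      # 0.111
```

In this toy pileup, counting the NUMT reads inflates the apparent heteroplasmy from about 11% to 20%, which illustrates why accurately separating mtDNA from NUMT reads matters for heteroplasmy estimation.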
