4.6 Article

seGMM: A New Tool for Gender Determination From Massively Parallel Sequencing Data

期刊

FRONTIERS IN GENETICS
卷 13, 期 -, 页码 -

出版社

FRONTIERS MEDIA SA
DOI: 10.3389/fgene.2022.850804

关键词

massively parallel sequencing data; Gaussian mixture model; gender; sex chromosomal abnormality; aneuploidy

资金

  1. 1.3.5 project for disciplines of excellence, West China Hospital, Sichuan University

向作者/读者索取更多资源

This study validated a new gender-inference tool, seGMM, which accurately determines the gender of samples and identifies sex chromosomal abnormalities. The seGMM tool demonstrated high accuracy and outperformed other existing tools in gender inference on different gene panel datasets. Furthermore, seGMM accurately identified sex chromosomal abnormalities through the analysis of sex chromosome marker genes.
In clinical genetic testing, checking the concordance between self-reported gender and genotype-inferred gender from genomic data is a significant quality control measure because mismatched gender due to sex chromosomal abnormalities or misregistration of clinical information can significantly affect molecular diagnosis and treatment decisions. Targeted gene sequencing (TGS) is widely recommended as a first-tier diagnostic step in clinical genetic testing. However, the existing gender-inference tools are optimized for whole genome and whole exome data and are not adequate and accurate for analyzing TGS data. In this study, we validated a new gender-inference tool, seGMM, which uses unsupervised clustering (Gaussian mixture model) to determine the gender of a sample. The seGMM tool can also identify sex chromosomal abnormalities in samples by aligning the sequencing reads from the genotype data. The seGMM tool consistently demonstrated > 99% gender-inference accuracy in a publicly available 1,000-gene panel dataset from the 1,000 Genomes project, an in-house 785 hearing loss gene panel dataset of 16,387 samples, and a 187 autism risk gene panel dataset from the Autism Clinical and Genetic Resources in China (ACGC) database. The performance and accuracy of seGMM was significantly higher for the targeted gene sequencing (TGS), whole exome sequencing (WES), and whole genome sequencing (WGS) datasets compared to the other existing gender-inference tools such as PLINK, seXY, and XYalign. The results of seGMM were confirmed by the short tandem repeat analysis of the sex chromosome marker gene, amelogenin. Furthermore, our data showed that seGMM accurately identified sex chromosomal abnormalities in the samples. In conclusion, the seGMM tool shows great potential in clinical genetics by determining the sex chromosomal karyotypes of samples from massively parallel sequencing data with high accuracy.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据