Article

A structural variation genotyping algorithm enhanced by CNV quantitative transfer

Journal

FRONTIERS OF COMPUTER SCIENCE
Volume 16, Issue 6, Pages -

Publisher

HIGHER EDUCATION PRESS
DOI: 10.1007/s11704-021-1177-z

Keywords

genotyping; copy number variations; transfer learning

Funding

  1. National Natural Science Foundation of China [31701150]
  2. Fundamental Research Funds for the Central Universities [CXTD2017003]

Abstract

This paper introduces a transfer learning-based method to accurately genotype structural variations while considering copy number variations (CNVs). By adjusting the weights of instances with different allelic copy numbers, the method maximizes the contribution of all instances to genotyping and minimizes the genotyping errors caused by CNVs.
Genotyping structural variations in the presence of copy number variations (CNVs) is a challenging problem that is still in its infancy. CNVs, a prevalent form of critical genetic variations that cause abnormal copy numbers of large genomic regions in cells, often affect transcription and contribute to a variety of diseases. The characteristics of CNVs often make existing genotyping features and algorithms ambiguous, which may cause heterozygous variations to be erroneously genotyped as homozygous variations and seriously affect the accuracy of downstream analysis. As the allelic copy number increases, the genotyping error rate increases sharply. Some instances with different copy numbers play an auxiliary role in the genotyping classification problem, while others seriously interfere with the accuracy of the model. Motivated by these observations, we propose a transfer learning-based method to accurately genotype structural variations while considering CNVs. The method first divides the instances by allelic copy number and trains the basic machine learning framework with the different genotype datasets. It then maximizes the weights of the instances that contribute to classification and minimizes the weights of the instances that hinder correct genotyping. By adjusting the weights of the instances with different allelic copy numbers, the contribution of all the instances to genotyping can be maximized, and the genotyping errors of heterozygous variations caused by CNVs can be minimized. We applied the proposed method to both simulated and real datasets, and compared it to some popular algorithms, including GATK, Facets and Gindel. The experimental results demonstrate that the proposed method outperforms the others in terms of accuracy, stability and efficiency. The source code has been uploaded to github/TrinaZ/CNVtransfer for academic use only.
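The abstract does not give the exact weight-update rule, so the following is only an illustrative sketch of instance reweighting in the style of TrAdaBoost (a generic transfer learning algorithm), not the authors' implementation. It assumes that "target" instances are those with the allelic copy number being genotyped and that "source" instances are those with other copy numbers. The function name `tradaboost_reweight` and all of its parameters are hypothetical.

```python
import math


def tradaboost_reweight(weights, is_target, misclassified, n_rounds):
    """One TrAdaBoost-style weight update (illustrative, not the paper's rule).

    weights       -- current instance weights
    is_target     -- True for instances with the copy number of interest
                     (assumption), False for auxiliary "source" instances
    misclassified -- True where the current base learner got the genotype wrong
    n_rounds      -- total number of boosting rounds planned
    """
    n_source = sum(1 for t in is_target if not t)
    target_total = sum(w for w, t in zip(weights, is_target) if t)
    if target_total <= 0:
        raise ValueError("at least one target instance with positive weight is required")

    # Weighted error of the base learner, measured on target instances only.
    eps = sum(
        w for w, t, m in zip(weights, is_target, misclassified) if t and m
    ) / target_total
    eps = min(max(eps, 1e-10), 0.499)  # keep beta_target inside (0, 1)

    beta_target = eps / (1.0 - eps)
    # With no source instances the source factor is never used; default to 1.
    beta_source = 1.0
    if n_source > 0:
        beta_source = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))

    updated = []
    for w, t, m in zip(weights, is_target, misclassified):
        if not m:
            updated.append(w)                # correct prediction: unchanged
        elif t:
            updated.append(w / beta_target)  # hard target instance: up-weight
        else:
            updated.append(w * beta_source)  # misleading source instance: down-weight

    total = sum(updated)
    return [w / total for w in updated]      # renormalise to sum to 1


# Usage: three target instances and two source instances, equal initial weights.
weights = [0.2] * 5
is_target = [True, True, True, False, False]
misclassified = [True, False, False, True, False]
updated = tradaboost_reweight(weights, is_target, misclassified, n_rounds=10)
print(updated)
```

In this sketch, source instances that the current model misclassifies lose weight over successive rounds, while misclassified target instances gain weight. This mirrors the abstract's description of reducing the influence of instances that hinder correct genotyping and increasing the influence of those that help.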
