Article

Marker imputation efficiency for genotyping-by-sequencing data in rice (Oryza sativa) and alfalfa (Medicago sativa)

Journal

MOLECULAR BREEDING
Volume 36, Issue 6, Pages -

Publisher

SPRINGER
DOI: 10.1007/s11032-016-0490-y

Keywords

SNP; Genotyping by sequencing (GBS); K-nearest neighbors imputation (KNNI); Random Forest imputation (RFI); Singular value decomposition imputation (SVDI); Beagle; FILLIN; Alfalfa; Rice; Imputation; Reference genome

Funding

  1. AGER Foundation [2010-2369]
  2. Project "Genomic selection in alfalfa" (GENALFA), funded by the Italian Ministry of Foreign Affairs and International Cooperation in the framework of the Italy-USA scientific cooperation program
  3. Italian share of the FP7-ArimNet project "Resilient, water- and energy-efficient forage and feed crops for Mediterranean agricultural systems" (REFORMA), funded by the Italian Ministry of Agricultural and Forestry Policies

Abstract

Genotyping-by-sequencing (GBS) is a rapid and cost-effective genome-wide genotyping technique that is applicable whether or not a reference genome is available. Because of the trade-off between cost and coverage, however, GBS typically produces large amounts of missing marker genotypes, whose imputation is therefore both challenging and critical for downstream analyses. In this work, the performance of four general imputation methods (K-nearest neighbors, Random Forest, singular value decomposition, and mean value) and two genotype-specific methods (Beagle and FILLIN) was measured on GBS data from alfalfa (Medicago sativa L., autotetraploid, heterozygous, without reference genome) and rice (Oryza sativa L., diploid, 100 % homozygous, with reference genome). Alfalfa SNPs were aligned to the genome of the closely related species Medicago truncatula L. Benchmarks consisted of progressive data filtering for marker call rate (up to 70 %) and increasing proportions (up to 20 %) of known genotypes masked for imputation. Relative performance was measured as the total proportion of correctly imputed genotypes, both globally and within each genotype class (two homozygotes in rice; two homozygotes and one heterozygote in alfalfa). We found that imputation accuracy was robust to increasing missing rates and consistently higher in rice than in alfalfa. Accuracy was as high as 90-100 % for the major (most frequent) homozygous genotype, but dropped to 80-90 % (rice) and below 30 % (alfalfa) for the minor homozygous genotype. Beagle was the best-performing method in rice, in terms of both accuracy and computing time. In alfalfa, KNNI and RFI gave the highest accuracies, but KNNI was much faster.
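The masking benchmark described in the abstract (hide a proportion of known genotypes, impute them, then count correct calls overall and within each genotype class) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The genotype coding (0 and 2 for the two homozygotes, 1 for the heterozygote, -1 for a missing call), the function names, the mismatch-based distance, and the simulated data are all assumptions made for illustration, and the simple K-nearest-neighbors vote below stands in for KNNI only in spirit.

```python
import numpy as np

MISSING = -1  # assumed code for a missing genotype call


def mask_known(geno, proportion, rng):
    """Hide a random proportion of the known calls.

    Returns the masked copy and the (row, col) positions that were hidden.
    """
    known = np.argwhere(geno != MISSING)
    n_hide = int(round(proportion * len(known)))
    picked = known[rng.choice(len(known), size=n_hide, replace=False)]
    masked = geno.copy()
    masked[picked[:, 0], picked[:, 1]] = MISSING
    return masked, picked


def knn_impute(geno, k=3):
    """Fill each missing call with the most common call among the k nearest samples.

    Distance between two samples is the share of mismatching calls over the
    markers called in both (an assumed, deliberately simple metric).
    """
    filled = geno.copy()
    observed = geno != MISSING
    for i, j in np.argwhere(~observed):
        shared = observed[i] & observed            # markers called in sample i and in each other sample
        n_shared = shared.sum(axis=1)
        mismatches = ((geno != geno[i]) & shared).sum(axis=1)
        dist = np.where(n_shared > 0, mismatches / np.maximum(n_shared, 1), np.inf)
        dist[i] = np.inf                           # a sample is not its own neighbor
        dist[~observed[:, j]] = np.inf             # neighbors must have marker j called
        neighbors = np.argsort(dist, kind="stable")[:k]
        neighbors = neighbors[np.isfinite(dist[neighbors])]
        if len(neighbors) > 0:                     # otherwise the call stays missing
            values, counts = np.unique(geno[neighbors, j], return_counts=True)
            filled[i, j] = values[np.argmax(counts)]
    return filled


def accuracy_by_class(truth, imputed, positions):
    """Proportion of correctly imputed calls, overall and per true genotype class."""
    rows, cols = positions[:, 0], positions[:, 1]
    true_calls, pred_calls = truth[rows, cols], imputed[rows, cols]
    report = {"overall": float(np.mean(true_calls == pred_calls))}
    for g in np.unique(true_calls):
        sel = true_calls == g
        report[int(g)] = float(np.mean(pred_calls[sel] == g))
    return report


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated, fully homozygous panel: 20 samples copied from 2 founder lines.
    founders = rng.choice([0, 2], size=(2, 30), p=[0.8, 0.2])
    truth = founders[rng.integers(0, 2, size=20)]
    masked, hidden = mask_known(truth, 0.20, rng)  # mask 20 % of known calls
    print(accuracy_by_class(truth, knn_impute(masked, k=3), hidden))
```

Reporting accuracy per genotype class, rather than only overall, matters because a method can score well globally by always predicting the most frequent homozygote while still failing on the minor classes, which is the pattern the abstract reports for alfalfa.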
