Article

Machine Learning as an Effective Method for Identifying True Single Nucleotide Polymorphisms in Polyploid Plants

Journal

PLANT GENOME
Volume 12, Issue 1

Publisher

WILEY
DOI: 10.3835/plantgenome2018.05.0023

Funding

  1. Peanut Foundation
  2. Agriculture and Food Research Initiative competitive grant of the USDA National Institute of Food and Agriculture [2012-85117-19435]
  3. Feed the Future Innovation Lab for Collaborative Research on Peanut Productivity and Mycotoxin Control (Peanut and Mycotoxin Innovation Lab)
  4. United States Agency for International Development (USAID)
  5. NIFA [578715, 2012-85117-19435] Funding Source: Federal RePORTER

Abstract

Single nucleotide polymorphisms (SNPs) have many advantages as molecular markers because they are ubiquitous and codominant. However, discovering true SNPs in polyploid species is difficult. Peanut (Arachis hypogaea L.) is an allopolyploid with a very low rate of true SNP calling. A large set of true and false SNPs identified from the Axiom_Arachis 58k array was used to train machine-learning models that identify true SNPs directly from sequence data to reduce ascertainment bias. These models achieved accuracy above 80% on real peanut RNA sequencing (RNA-seq) and whole-genome shotgun (WGS) resequencing data, which is higher than previously reported for polyploids and at least a twofold improvement for peanut. A 48K SNP array, Axiom_Arachis2, was designed using this approach, resulting in 75% accuracy in calling SNPs from different tetraploid peanut genotypes. When SNP variation was simulated in several polyploids, the models achieved >98% accuracy in selecting true SNPs. Additionally, models built with simulated genotypes selected true SNPs at >80% accuracy on real peanut data. This work met its objective of creating an effective machine-learning approach for calling highly reliable SNPs from polyploids. A tool for predicting true SNPs from sequence data, designated SNP machine learning (SNP-ML), was developed using the described models. SNP-ML also provides functionality, designated SNP machine learner (SNP-MLer), for training new models not included in this study for customized use. SNP-ML is publicly available.
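
The general idea of training on array-validated true and false SNPs and then scoring held-out candidates can be illustrated with a minimal sketch. This is not the SNP-ML implementation: the abstract does not say which features or model types were used, so the two features below (alternate-allele fraction and normalized read depth), the toy values, and the nearest-centroid classifier are all assumptions chosen only to keep the example self-contained.

```python
# Illustrative sketch only; NOT the SNP-ML code or its feature set.
# Features, values, and the nearest-centroid model are hypothetical.
from statistics import mean


def train_centroids(rows, labels):
    """Return the mean feature vector of each class (True / False SNP)."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, l in zip(rows, labels) if l == cls]
        centroids[cls] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Assign a candidate SNP to the class with the closest centroid."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))


def accuracy(centroids, rows, labels):
    """Fraction of candidates whose predicted label matches the known label."""
    hits = sum(predict(centroids, r) == l for r, l in zip(rows, labels))
    return hits / len(labels)


# Hypothetical training set: (alt-allele fraction, normalized depth),
# labelled True/False as if validated on a genotyping array.
train_rows = [(0.48, 1.0), (0.52, 0.9), (0.45, 1.1),
              (0.24, 1.9), (0.27, 2.1), (0.22, 2.0)]
train_labels = [True, True, True, False, False, False]

# Hypothetical held-out candidates with known status.
test_rows = [(0.50, 1.0), (0.25, 2.0)]
test_labels = [True, False]

model = train_centroids(train_rows, train_labels)
print(accuracy(model, test_rows, test_labels))
```

Accuracy here is simply the share of held-out candidates classified correctly, which is the sense in which the figures above (>80%, 75%, >98%) are read in this sketch; a real evaluation would use far larger validated sets and the models described in the paper.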
