Article

Robust and efficient software for reference-free genomic diversity analysis of genotyping-by-sequencing data on diploid and polyploid species

Journal

MOLECULAR ECOLOGY RESOURCES
Volume 22, Issue 1, Pages 439-454

Publisher

WILEY
DOI: 10.1111/1755-0998.13477

Keywords

bioinformatics; clustering; diversity; genotyping-by-sequencing; sequencing; software

Funding

  1. Universidad de los Andes

向作者/读者索取更多资源

Genotyping-by-sequencing (GBS) is a cost-effective technique for obtaining genetic markers from populations, and reference-free approaches are needed for species without a reference genome. However, available tools for de novo analysis of GBS reads face usability, accuracy and performance issues. A novel reference-free algorithm, integrated into NGSEP, showed comparable or better accuracy and better computational efficiency in benchmark experiments, making it a useful tool for population genetic studies in a wide variety of species.
Genotyping-by-sequencing (GBS) is a widely used and cost-effective technique for obtaining large numbers of genetic markers from populations by sequencing regions adjacent to restriction cut sites. Although a standard reference-based pipeline can be followed to analyse GBS reads, a reference genome is still not available for a large number of species. Hence, reference-free approaches are required to generate the genetic variability information that can be obtained from a GBS experiment. Unfortunately, available tools for de novo analysis of GBS reads face issues of usability, accuracy and performance. Furthermore, few available tools are suitable for analysing data sets from polyploid species. In this manuscript, we describe a novel algorithm to perform reference-free variant detection and genotyping from GBS reads. Non-exact searches on a dynamic hash table of consensus sequences allow for efficient read clustering and sorting. This algorithm was integrated into the Next Generation Sequencing Experience Platform (NGSEP) to take advantage of the state-of-the-art variant detector already implemented in this tool. We performed benchmark experiments with three empirical data sets of plants and animals with different population structures and ploidies, sequenced with different GBS protocols at different read depths. These experiments show that NGSEP has comparable, and in some cases better, accuracy and always better computational efficiency compared to existing solutions. We expect that this new development will be useful for many research groups conducting population genetic studies in a wide variety of species.
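The abstract says that non-exact searches on a dynamic hash table of consensus sequences drive read clustering. The Python sketch below shows one common way to make a hash table answer non-exact queries. It is not NGSEP's implementation. The class name `ConsensusClusterer`, the parameters `prefix_len` and `max_mismatches`, and the segment-based (pigeonhole) indexing are all assumptions made for illustration. If a read's first `prefix_len` bases differ from a cluster representative at no more than `max_mismatches` positions, then splitting that prefix into `max_mismatches + 1` disjoint segments guarantees that at least one segment matches exactly. Exact hash lookups on segments therefore find every candidate cluster, and each candidate is then verified by counting mismatches. To keep the sketch short, each cluster's "consensus" is simply its first read, and the sorting step mentioned in the abstract is not shown.

```python
from typing import Dict, List, Tuple


class ConsensusClusterer:
    """Hypothetical sketch (not NGSEP's code): greedy read clustering with a
    dynamic hash table keyed by exact segments of each cluster representative."""

    def __init__(self, prefix_len: int = 32, max_mismatches: int = 2) -> None:
        if prefix_len < max_mismatches + 1:
            raise ValueError("prefix_len must be at least max_mismatches + 1")
        self.prefix_len = prefix_len
        self.max_mismatches = max_mismatches
        # Pigeonhole: k mismatches cannot touch all k + 1 disjoint segments.
        self.n_segments = max_mismatches + 1
        self.seg_len = prefix_len // self.n_segments
        self.consensus: List[str] = []      # representative sequence per cluster
        self.members: List[List[str]] = []  # reads assigned to each cluster
        # (segment index, segment string) -> ids of clusters containing it
        self.index: Dict[Tuple[int, str], List[int]] = {}

    def _segments(self, seq: str) -> List[Tuple[int, str]]:
        """Split the read prefix into disjoint, position-tagged segments."""
        return [
            (i, seq[i * self.seg_len:(i + 1) * self.seg_len])
            for i in range(self.n_segments)
        ]

    def _mismatches(self, a: str, b: str) -> int:
        """Count differing positions within the compared prefix."""
        return sum(
            1 for x, y in zip(a[:self.prefix_len], b[:self.prefix_len]) if x != y
        )

    def add(self, read: str) -> int:
        """Assign a read to an existing cluster or open a new one; return its id."""
        if len(read) < self.prefix_len:
            raise ValueError("read is shorter than prefix_len")
        segments = self._segments(read)
        checked = set()
        for key in segments:
            for cid in self.index.get(key, []):
                if cid in checked:
                    continue
                checked.add(cid)
                # An exact segment hit only proposes a candidate; verify it here.
                if self._mismatches(read, self.consensus[cid]) <= self.max_mismatches:
                    self.members[cid].append(read)
                    return cid
        # No cluster within the mismatch budget: the table grows dynamically.
        cid = len(self.consensus)
        self.consensus.append(read)
        self.members.append([read])
        for key in segments:
            self.index.setdefault(key, []).append(cid)
        return cid


if __name__ == "__main__":
    clusterer = ConsensusClusterer(prefix_len=12, max_mismatches=2)
    print(clusterer.add("AAAACCCCGGGGTT"))  # 0: opens cluster 0
    print(clusterer.add("AAATCCCCGGGATT"))  # 0: 2 mismatches in the first 12 bases
    print(clusterer.add("TTTTGGGGCCCCAA"))  # 1: no cluster within the budget
```

Tagging each segment with its position keeps reads that merely share a substring at a different offset (as the third read does with the first) from being proposed as candidates. Comparing fixed prefixes is itself an assumption of this sketch, motivated by GBS reads being anchored at restriction cut sites.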
