Article

Performance of genetic programming optimised Bowtie2 on genome comparison and analytic testing (GCAT) benchmarks

Journal

BIODATA MINING
Volume 8, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/s13040-014-0034-0

Keywords

Double-ended DNA sequence; High throughput Solexa 454 nextgen NGS sequence query; Rapid fuzzy string matching; Homo sapiens genome reference consortium HG19

Funding

  1. EPSRC [EP/J017515/1] Funding Source: UKRI
  2. Engineering and Physical Sciences Research Council [EP/J017515/1] Funding Source: researchfish

Background: Genetic studies are increasingly based on short, noisy sequences from next generation scanners. Typically, complete DNA sequences are assembled by matching short NextGen sequences against reference genomes. Despite considerable algorithmic gains since the turn of the millennium, matching both single-ended and paired-end strings to a reference remains computationally demanding. Furthermore, tailoring bioinformatics tools to each new task or scanner remains highly skilled and labour intensive. With this in mind, we recently demonstrated a genetic programming based automated technique that generated a version of the state-of-the-art alignment tool Bowtie2 which was considerably faster on short sequences produced by a scanner at the Broad Institute and released as part of The Thousand Genome Project.

Results: Bowtie2(GP) and the original Bowtie2 release were compared on bioplanet's GCAT synthetic benchmarks. The Bowtie2(GP) enhancements were also applied to the latest Bowtie2 release (2.2.3, 29 May 2014), which retained both the GP and the manually introduced improvements.

Conclusions: On both single-ended and paired-end synthetic next generation DNA sequence GCAT benchmarks, Bowtie2(GP) runs up to 45% faster than Bowtie2. The loss in accuracy can be as little as 0.2-0.5%, but is up to 2.5% for longer sequences.
