4.7 Article

A novel data structure to support ultra-fast taxonomic classification of metagenomic sequences with k-mer signatures

Journal

BIOINFORMATICS
Volume 34, Issue 1, Pages 171-178

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btx432

Funding

  1. National Science Foundation [CAREER award] [1054631, CNS-1701681]
  2. National Institutes of Health [P30CA177558, 5R01HG006272-03]
  3. Direct For Computer & Info Scie & Enginr [1054631, 1701681] Funding Source: National Science Foundation
  4. Division Of Computer and Network Systems [1701681] Funding Source: National Science Foundation
  5. Div Of Information & Intelligent Systems [1054631] Funding Source: National Science Foundation

Motivation: Metagenomic read classification is a critical step in the identification and quantification of microbial species sampled by high-throughput sequencing. Although many algorithms have been developed to date, they suffer significant memory and/or computational costs. Due to the growing popularity of metagenomic data in both basic science and clinical applications, as well as the increasing volume of data being generated, efficient and accurate algorithms are in high demand.

Results: We introduce MetaOthello, a probabilistic hashing classifier for metagenomic sequencing reads. The algorithm employs a novel data structure, called l-Othello, to support efficient querying of a taxon using its k-mer signatures. MetaOthello is an order-of-magnitude faster than the current state-of-the-art algorithms Kraken and Clark, and requires only one-third of the RAM. In comparison to Kaiju, a metagenomic classification tool using protein sequences instead of genomic sequences, MetaOthello is three times faster and exhibits 20-30% higher classification sensitivity. We report comparative analyses of both scalability and accuracy using a number of simulated and empirical datasets.

Availability and implementation: MetaOthello is a stand-alone program implemented in C++. The current version (1.0) is accessible via https://doi.org/10.5281/zenodo.808941.

Contact: liuj@cs.uky.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
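To make the idea of querying a taxon by its k-mer signatures concrete, the following is a minimal sketch and not MetaOthello's implementation. It assumes an ordinary Python dictionary in place of the l-Othello structure, which the abstract does not describe in detail. It also assumes that a signature is a k-mer found in exactly one reference taxon and that a read is assigned by majority vote, with ties going to the taxon seen first in the read. The function names, the toy sequences, and the choice of k = 4 are all hypothetical, and reverse complements are ignored.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_signature_index(references, k):
    """Map each k-mer to its taxon, keeping only k-mers unique to one taxon.

    `references` maps a taxon name to one reference sequence (an assumption
    made for this sketch; real references hold many sequences per taxon).
    """
    owner = {}
    for taxon, seq in references.items():
        for km in set(kmers(seq, k)):
            # None marks a k-mer shared by several taxa, so not a signature.
            owner[km] = taxon if km not in owner else None
    return {km: taxon for km, taxon in owner.items() if taxon is not None}


def classify_read(read, index, k):
    """Return the taxon with the most signature k-mer hits, or None if no hit."""
    votes = Counter(index[km] for km in kmers(read, k) if km in index)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy reference set; the sequences are invented for illustration only.
references = {"taxon_A": "ACGTACGTGGA", "taxon_B": "TTGACCGTAAC"}
index = build_signature_index(references, k=4)
label = classify_read("GTACGTG", index, k=4)
```

A dictionary stores every k-mer key explicitly, so this sketch shows only the classification logic. It says nothing about the memory or speed characteristics reported for MetaOthello.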
