Article

UMGAP: the Unipept MetaGenomics Analysis Pipeline

Journal

BMC GENOMICS
Volume 23, Issue 1

Publisher

BMC
DOI: 10.1186/s12864-022-08542-4

Keywords

Shotgun metagenomics; Biodiversity analysis; Taxonomic profiling

Funding

  1. Research Foundation - Flanders (FWO) [1164420N, 1174621N, 1512619N, 12I5220N]
  2. Flemish Government


Summary: This study presents UMGAP, a tool suite for taxonomic profiling of shotgun metagenomics data that performs competitively with state-of-the-art tools. UMGAP analyzes protein coding regions translated from the reads rather than the DNA reads themselves. It combines low runtime, a manageable memory footprint and high accuracy with interactive visualizations for exploring the results.
Background: Shotgun metagenomics yields ever richer and larger data volumes on the complex communities living in diverse environments. Extracting deep insights from the raw reads depends heavily on the availability of fast, accurate and user-friendly biodiversity analysis tools.

Results: Because environmental samples may contain strains and species that are not covered in reference databases, and because protein sequences are more conserved than the genes encoding them, we explore the alternative route of taxonomic profiling based on protein coding regions translated from the shotgun metagenomics reads, instead of directly processing the DNA reads. We therefore developed the Unipept MetaGenomics Analysis Pipeline (UMGAP), a highly versatile suite of open source tools that are implemented in Rust and support parallelization to achieve optimal performance. Six preconfigured pipelines with different performance trade-offs were carefully selected and benchmarked against a selection of state-of-the-art shotgun metagenomics taxonomic profiling tools.

Conclusions: UMGAP's protein space detour for taxonomic profiling makes it competitive with state-of-the-art shotgun metagenomics tools. Despite our design choices of an extra protein translation step, a broad spectrum index that can identify archaea, bacteria, eukaryotes and viruses, and a highly configurable non-monolithic design, UMGAP achieves low runtime, a manageable memory footprint and high accuracy. Its interactive visualizations allow for easy exploration and comparison of complex communities.
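
To make the "protein space detour" concrete, the sketch below shows one generic way to turn a DNA read into candidate protein fragments: a six-frame translation with the standard genetic code. It is an illustration only and not UMGAP's code. The abstract does not say how UMGAP finds or translates coding regions, and UMGAP itself is written in Rust, so the function names (`reverse_complement`, `translate`, `six_frame_translation`) and the example read are assumptions made for this sketch.

```python
from typing import List

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

# Complement map for unambiguous bases; other symbols (e.g. N) pass through unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translation(read: str) -> List[str]:
    """Translate a read in three forward frames, then three reverse-complement frames."""
    read = read.upper()
    strands = (read, reverse_complement(read))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


if __name__ == "__main__":
    # Hypothetical 9-base read, chosen only to keep the output short.
    for index, protein in enumerate(six_frame_translation("ATGGCCTAA")):
        print(index, protein)
```

In a protein-based profiler, fragments like these would then be matched against a protein or peptide index to obtain taxonomic assignments; that lookup is outside the scope of this sketch.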
