4.8 Article

Strain/species identification in metagenomes using genome-specific markers

Journal

NUCLEIC ACIDS RESEARCH
Volume 42, Issue 8, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nar/gku138

Keywords

-

Funding

  1. ENIGMA, a Scientific Focus Area [DE-AC02-05CH11231]
  2. U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research (OBER), 'Genomics: GTL Foundational Science'
  3. OBER Biological Systems Research on the Role of Microbial Communities in Carbon Cycling Program [DE-SC0004601]
  4. U.S. National Science Foundation MacroSystems Biology program [NSF EF-1065844]
  5. Oklahoma Center for the Advancement of Science and Technology (OCAST) through the Oklahoma Applied Research Support (OARS) Project [AR11-035]

Shotgun metagenome sequencing has become a fast, cheap and high-throughput technology for characterizing microbial communities in complex environments and human body sites. However, accurate identification of microorganisms at the strain/species level remains extremely challenging. We present a novel k-mer-based approach, termed GSMer, that identifies genome-specific markers (GSMs) from currently sequenced microbial genomes, which are then used for strain/species-level identification in metagenomes. Using 5390 sequenced microbial genomes, 8 770 321 50-mer strain-specific and 11 736 360 species-specific GSMs were identified for 4088 strains and 2005 species (4933 strains), respectively. The GSMs were first evaluated against mock community metagenomes, recently sequenced genomes and real metagenomes from different body sites, suggesting that the identified GSMs were specific to their target genomes. Sensitivity evaluation against synthetic metagenomes with different coverage suggested that 50 GSMs per strain were sufficient to identify most microbial strains at ≥0.25× coverage, and that 10% of the selected GSMs in a database should be detected for a confident positive call. Application of GSMs identified 45 and 74 microbial strains/species significantly associated with type 2 diabetes patients and obese/lean individuals from corresponding gastrointestinal tract metagenomes, respectively. Our results agreed with previous studies but provided strain-level information. The approach can be directly applied to identify microbial strains/species from raw metagenomes, without the effort of complex data pre-processing.
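The idea in the abstract can be illustrated with a toy sketch. This is not the GSMer implementation: the function names, the tiny sequences and k = 4 are assumptions chosen for illustration (the paper uses 50-mers), and the sketch ignores reverse complements and any further filtering the real tool may apply. It keeps k-mers that occur in exactly one genome as markers, then calls a genome present when at least 10% of its markers appear in the reads.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def genome_specific_markers(genomes, k):
    """Map each genome name to the k-mers found in that genome only.

    genomes: dict of name -> sequence (hypothetical toy input).
    """
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)

    markers = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:  # seen in exactly one genome
            markers[next(iter(names))].add(kmer)
    return markers


def call_present(markers, reads, k, min_fraction=0.10):
    """Call a genome present if >= min_fraction of its markers occur in reads."""
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    calls = {}
    for name, gsm in markers.items():
        fraction = len(gsm & read_kmers) / len(gsm) if gsm else 0.0
        calls[name] = fraction >= min_fraction
    return calls


# Toy data: two "genomes" sharing a prefix and differing at the end.
toy_genomes = {"A": "ACGTACGTTT", "B": "ACGTACGGGG"}
toy_markers = genome_specific_markers(toy_genomes, k=4)
toy_calls = call_present(toy_markers, reads=["TACGTTT"], k=4)
```

In this toy case the shared 4-mers (for example `ACGT`) are discarded, so only the k-mers unique to each genome can trigger a call; the 10% threshold mirrors the detection fraction suggested in the abstract.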
