4.8 Article

Unifying the known and unknown microbial coding sequence space

Journal

ELIFE
Volume 11, Issue -, Pages -

Publisher

eLIFE SCIENCES PUBL LTD
DOI: 10.7554/eLife.67667

Keywords

microbial genomics; bioinformatics; gene clusters; functional metagenomics; phylogenomics; unknown function; Other

Funding

  1. BMBF-funded de.NBI Cloud within the German Network for Bioinformatics Infrastructure (de.NBI) [031A537B, 031A533A, 031A538A, 031A533B, 031A535A, 031A537C, 031A534A, 031A532B]
  2. Max Planck Society
  3. European Union's Horizon 2020 research and innovation program Blue Growth: Unlocking the potential of Seas and Oceans [634486]
  4. Biotechnology and Biological Sciences Research Council [BB/M011755/1, BB/R015228/1]
  5. RDF by the European Molecular Biology Laboratory [RTI2018-101205-B-I00]
  6. Spanish Agency of Science MICIU/AEI/FEDER [CTM2017-87736-R]
  7. Spanish Ministry of Economy and Competitiveness
  8. Spanish Ministry of Economy and Competitiveness (MINECO) through the Consolider-Ingenio program [CSD2008-00077]

Genes of unknown function pose a major challenge in molecular biology, especially in microbial systems. This study presents a computational framework, AGNOSTOS, that bridges the gap between known and unknown genes and quantifies the diversity and relevance of the unknown fraction. The findings highlight the importance of investigating unknown genes and their potential implications across organisms and environments.
Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40-60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction in analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS, and a demonstration of how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction, and predominantly taxonomically restricted at the species level. From the 71M genes identified as being of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.
