Article

Deciphering microbial gene function using natural language processing

Journal

NATURE COMMUNICATIONS
Volume 13, Issue 1

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41467-022-33397-4

Funding

  1. ISF [1692/18]
  2. Edmond J. Safra Center for Bioinformatics at Tel-Aviv University
  3. Azrieli Foundation

In this study, the authors propose a concept to reveal the function of uncharacterized genes using deep learning methods adopted from natural language processing. By repurposing NLP algorithms to model gene semantics, they are able to predict functional categories for genes and demonstrate the potential of combining microbial genomics and language models to discover gene functions in microbes.
Revealing the function of uncharacterized genes is a fundamental challenge in an era of ever-increasing volumes of sequencing data. Here, we present a concept for tackling this challenge using deep learning methodologies adopted from natural language processing (NLP). We repurpose NLP algorithms to model gene semantics based on a biological corpus of more than 360 million microbial genes within their genomic context. We use the language models to predict functional categories for 56,617 genes and find that out of 1369 genes associated with recently discovered defense systems, 98% are inferred correctly. We then systematically evaluate the discovery potential of different functional categories, pinpointing those with the most genes yet to be characterized. Finally, we demonstrate our method's ability to discover systems associated with microbial interaction and defense. Our results highlight that combining microbial genomics and language models is a promising avenue for revealing gene functions in microbes.

The function of many microbial genes is yet unknown. Here the authors repurposed natural language processing algorithms to explore gene semantics and infer function for thousands of genes, with defense and secretion systems found to have the most discovery potential.
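The idea of modeling "gene semantics" from genomic context can be illustrated with a toy sketch. The abstract does not specify the model, so the following is only an assumption-laden stand-in: it treats each genome as a "sentence" of gene-family tokens, builds simple count-based co-occurrence vectors (not a deep learning model), and assigns an unlabeled gene the category of its most similar labeled neighbor. All gene names, labels, the window size, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(genomes, window=2):
    """Map each gene token to counts of the genes seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for genes in genomes:
        for i, gene in enumerate(genes):
            lo, hi = max(0, i - window), min(len(genes), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[gene][genes[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_category(gene, vectors, labels):
    """Return the category of the labeled gene whose context vector is most similar."""
    candidates = [g for g in labels if g != gene and g in vectors]
    if not candidates:
        return None
    best = max(candidates, key=lambda g: cosine(vectors[gene], vectors[g]))
    return labels[best]


# Hypothetical toy corpus: each list is the gene order along one genome fragment.
genomes = [
    ["defA", "defB", "unk1", "defC"],
    ["defA", "unk1", "defB", "defC"],
    ["metA", "metB", "metC", "metD"],
    ["metB", "metA", "metD", "metC"],
]
# Hypothetical known annotations; "unk1" is the uncharacterized gene.
labels = {
    "defA": "defense", "defB": "defense", "defC": "defense",
    "metA": "metabolism", "metB": "metabolism",
    "metC": "metabolism", "metD": "metabolism",
}

vectors = cooccurrence_vectors(genomes)
prediction = predict_category("unk1", vectors, labels)
print(prediction)
```

In this toy corpus the uncharacterized gene only ever appears next to defense genes, so its context vector overlaps with theirs and not with the metabolism genes. A learned embedding trained on a large corpus would replace the raw counts in a realistic setting.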
