Article

Rapid annotation of nifH gene sequences using classification and regression trees facilitates environmental functional gene analysis

Journal

ENVIRONMENTAL MICROBIOLOGY REPORTS
Volume 8, Issue 5, Pages 905-916

Publisher

WILEY
DOI: 10.1111/1758-2229.12455

Funding

  1. Gordon and Betty Moore Foundation Marine Investigator award
  2. National Science Foundation Science Center for Microbial Oceanography Research and Education (C-MORE) [EF-0424599]

The nifH gene is a widely used molecular proxy for studying nitrogen fixation. Phylogenetic classification of nifH gene sequences is an essential step in diazotroph community analysis, and it requires a fast, automated solution because environmental sequence libraries are growing and high-throughput technologies yield ever more nifH sequences. We present a novel approach that rapidly classifies nifH amino acid sequences into well-defined phylogenetic clusters, providing a common platform for comparative analysis across studies. Phylogenetic group membership can be accurately predicted with decision tree-type statistical models that identify and use signature residues in the amino acid sequences. Our classification models were trained and evaluated with a publicly available, manually curated nifH gene database containing cluster annotations. Model-independent sequence sets from diverse ecosystems were used to further assess the models' prediction accuracy. The utility of this sequence binning approach was demonstrated in a comparative study in which joint treatment of diazotroph assemblages from a wide range of habitats identified habitat-specific and widely distributed diazotrophs and revealed a marine-terrestrial distinction in community composition. Our rapid, automated phylogenetic cluster assignment circumvents extensive phylogenetic analysis of nifH sequences and thus saves substantial time and resources in nitrogen fixation studies.
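
To make the idea of a decision tree that splits on signature residues concrete, here is a minimal sketch in plain Python. It is not the authors' implementation or their trained models: the splitting rule (Gini impurity on "residue X at alignment position p" tests) is a generic CART-style choice, the four-residue "sequences" are invented toy strings rather than real nifH data, and the cluster labels "A", "B" and "C" and all function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(seqs, labels):
    """Return (score, position, residue) for the test 'seq[position] == residue'
    that gives the lowest weighted Gini impurity, or None if no test separates
    the data."""
    best = None
    n = len(seqs)
    for pos in range(len(seqs[0])):
        for res in sorted({s[pos] for s in seqs}):
            yes = [l for s, l in zip(seqs, labels) if s[pos] == res]
            no = [l for s, l in zip(seqs, labels) if s[pos] != res]
            if not yes or not no:
                continue  # this test does not partition the data
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
            if best is None or score < best[0]:
                best = (score, pos, res)
    return best

def build_tree(seqs, labels):
    """Grow a binary tree recursively; a leaf is the majority cluster label."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority
    split = best_split(seqs, labels)
    if split is None:
        return majority
    _, pos, res = split
    yes = [(s, l) for s, l in zip(seqs, labels) if s[pos] == res]
    no = [(s, l) for s, l in zip(seqs, labels) if s[pos] != res]
    return {
        "pos": pos,
        "res": res,
        "yes": build_tree(*zip(*yes)),
        "no": build_tree(*zip(*no)),
    }

def predict(tree, seq):
    """Walk the tree from the root until a leaf (cluster label) is reached."""
    while isinstance(tree, dict):
        tree = tree["yes"] if seq[tree["pos"]] == tree["res"] else tree["no"]
    return tree

# Invented, pre-aligned toy fragments with hypothetical cluster labels.
train_seqs = ["ACDE", "ACDF", "GCDE", "GCHE", "AKDE", "AKHF"]
train_labels = ["A", "A", "B", "B", "C", "C"]

tree = build_tree(train_seqs, train_labels)
for query in ["ACHF", "GCDF", "AKDF"]:
    print(query, "->", predict(tree, query))
```

In this toy data only positions 0 and 1 carry cluster information, so the learned tree tests just those two positions and ignores the rest. That is the sense in which a tree-type model can pick out signature residues; real sequences would first need to be aligned so that positions are comparable.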
