Article

The curse of the uncultured fungus

Journal

MYCOKEYS
Volume -, Issue 86, Pages 177-194

Publisher

PENSOFT PUBLISHERS
DOI: 10.3897/mycokeys.86.76053

Keywords

Data interoperability; data mining; DNA barcoding; scientific practice; species identification; taxonomic annotation

Funding

  1. Estonian Research Council [PRG1170]
  2. German Research Foundation [DFG: WU890/2-1]
  3. Swedish Research Council Formas (project HerbEvol grant) [2015-1464]
  4. Spanish Ministry of Economy and Competitiveness [CGL2015-67459-P, BES-2016-077793]
  5. International Association for Plant Taxonomy

The international DNA sequence databases contain many fungal sequences that are annotated only at the kingdom level, which leads to low-resolution mycological results and encourages the deposition of further poorly annotated entries. This study analyzes a dataset of 767,918 public full-length sequences to estimate how many of these sequences represent truly unidentifiable fungal taxa and what proportion could easily have been identified at a more informative taxonomic level. The findings suggest that over 70% of these sequences could have been identified at least to the order/family level at the time of sequence deposition, indicating that factors other than the availability of reference sequences contribute to the use of low-resolution names. The study also stresses the importance of addressing this problem, as more than a fifth of the poorly annotated sequences appear to have been deposited by mycologists.
The international DNA sequence databases abound in fungal sequences not annotated beyond the kingdom level, typically bearing names such as "uncultured fungus". These sequences beget low-resolution mycological results and invite further deposition of similarly poorly annotated entries. What do these sequences represent? This study uses a 767,918-sequence corpus of public full-length sequences to estimate what proportion of these sequences represent truly unidentifiable fungal taxa, and what proportion could have been identified to a more informative taxonomic level at the time of sequence deposition. Our results suggest that more than 70% of these sequences would have been trivial to identify to at least the order/family level at the time of sequence deposition, hinting that factors other than poor availability of relevant reference sequences explain the low-resolution names. We speculate that researchers' perceived lack of time and lack of insight into the ramifications of this problem are the main explanations for the low-resolution names. We were surprised to find that more than a fifth of these sequences seem to have been deposited by mycologists rather than by researchers unfamiliar with the consequences of poorly annotated fungal sequences in molecular repositories. The proportion of these needlessly poorly annotated sequences does not decline over time, suggesting that this problem must not be left unchecked.
