4.0 Article

MetaTransformer: deep metagenomic sequencing read classification using self-attention models

Journal

NAR GENOMICS AND BIOINFORMATICS
Volume 5, Issue 3, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nargab/lqad082

Keywords

-

Ask authors/readers for more resources

Deep learning has had a significant impact on scientific research and this paper introduces a self-attention-based deep learning tool called MetaTransformer for metagenomic analysis. MetaTransformer outperforms previous methods in species and genus classification and achieves improved performance and reduced memory consumption through different embedding schemes.
Deep learning has emerged as a paradigm that revolutionizes numerous domains of scientific research. Transformers have been utilized in language modeling outperforming previous approaches. Therefore, the utilization of deep learning as a tool for analyzing the genomic sequences is promising, yielding convincing results in fields such as motif identification and variant calling. DeepMicrobes, a machine learning-based classifier, has recently been introduced for taxonomic prediction at species and genus level. However, it relies on complex models based on bidirectional long short-term memory cells resulting in slow runtimes and excessive memory requirements, hampering its effective usability. We present MetaTransformer, a self-attention-based deep learning metagenomic analysis tool. Our transformer-encoder-based models enable efficient parallelization while outperforming DeepMicrobes in terms of species and genus classification abilities. Furthermore, we investigate approaches to reduce memory consumption and boost performance using different embedding schemes. As a result, we are able to achieve 2x to 5x speedup for inference compared to DeepMicrobes while keeping a significantly smaller memory footprint. MetaTransformer can be trained in 9 hours for genus and 16 hours for species prediction. Our results demonstrate performance improvements due to self-attention models and the impact of embedding schemes in deep learning on metagenomic sequencing data.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.0
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available