Article

miRe2e: a full end-to-end deep model based on transformers for prediction of pre-miRNAs

Journal

BIOINFORMATICS
Volume 38, Issue 5, Pages 1191-1197

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btab823


Funding

  1. ANPCyT [3384, 2905]
  2. UNL [082, 115]


In this study, the first full end-to-end deep learning model for pre-miRNA prediction, miRe2e, was developed. The model is based on Transformers and can accept raw genome-wide data as input without any preprocessing or feature engineering. Experimental results showed that the model achieved 10 times better performance compared to state-of-the-art algorithms when tested using the human genome.
Motivation: MicroRNAs (miRNAs) are small RNA sequences with key roles in the post-transcriptional regulation of gene expression in different species. Accurate prediction of novel miRNAs is needed because of their importance in many biological processes and their associations with complex diseases in humans. Many machine learning approaches have been proposed for this purpose over the last decade, but they require handcrafted feature extraction to identify possible de novo miRNAs. More recently, deep learning (DL) has enabled automatic feature extraction, with models learning relevant representations by themselves. However, state-of-the-art deep models still require complex pre-processing of the input sequences and prediction of their secondary structure to reach acceptable performance.

Results: In this work, we present miRe2e, the first full end-to-end DL model for pre-miRNA prediction. The model is based on Transformers, a neural architecture that uses attention mechanisms to infer global dependencies between inputs and outputs. It can receive raw genome-wide data as input, without any pre-processing or feature engineering. After a training stage with known pre-miRNAs, hairpin and non-hairpin sequences, it can identify all the pre-miRNA sequences within a genome. The model has been validated through several experimental setups using the human genome and, compared with state-of-the-art algorithms, it obtained 10 times better performance.
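The abstract does not describe miRe2e's encoding, window length, or layer configuration, so the following is only a minimal sketch of the general idea under stated assumptions: raw sequence is one-hot encoded (an assumed scheme), cut into fixed-length windows by a hypothetical `sliding_windows` helper, and passed through a single scaled dot-product self-attention head with hand-supplied weight matrices. It shows how attention lets every nucleotide position weigh every other position in the window; it is not the authors' implementation.

```python
import math

# Assumed alphabet; T is mapped to U, and any other symbol (e.g. N) becomes an all-zero vector.
NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """One-hot encode a raw nucleotide string (assumed encoding, not taken from the paper)."""
    seq = seq.upper().replace("T", "U")
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]


def sliding_windows(genome, size, step):
    """Hypothetical helper: yield fixed-length windows so a whole genome can be scanned."""
    for start in range(0, len(genome) - size + 1, step):
        yield genome[start:start + size]


def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wk[0])
    scores = matmul(q, transpose(k))  # n x n: every position scored against every other
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights
```

For example, `self_attention(one_hot(window), wq, wk, wv)` can be applied to each window from `sliding_windows(genome, size, step)`; in a trained model the weight matrices would be learned and a classification head would score each window, both of which are omitted here.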
