TERL: classification of transposable elements by convolutional neural networks

BRIEFINGS IN BIOINFORMATICS (2021)

Journal

BRIEFINGS IN BIOINFORMATICS

Volume 22, Issue 3, Pages -

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bib/bbaa185

Keywords

transposable elements; sequence classification; deep learning; representation learning; convolutional neural networks

Categories

Biochemical Research Methods; Mathematical & Computational Biology

Funding

Coordination for the Improvement of Higher Education Personnel (CAPES) [88882.432275/2019-01]
National Council of Technological and Scientific Development (CNPq) [309642/2015-9, 431668/2016-7, 454505/2014-0, 422811/2016-5]
Fundacao Araucaria
SETI
UTFPR

Summary

The study introduces TERL, a method that preprocesses transposable element (TE) sequences, transforms them into two-dimensional data, and classifies them with deep convolutional neural networks. Across six experiments, TERL achieves high accuracy and runs substantially faster than the methods it is compared against.

Abstract

Transposable elements (TEs) are the most represented sequences occurring in eukaryotic genomes. Few methods classify these sequences at deeper levels, such as the superfamily level, which could provide useful and detailed information about them. Most methods that classify TE sequences use handcrafted features such as k-mers and homology-based search, which could be inefficient for classifying non-homologous sequences. Here we propose an approach, called transposable elements representation learner (TERL), that preprocesses and transforms one-dimensional sequences into two-dimensional space data (i.e., image-like data of the sequences) and applies them to deep convolutional neural networks. This classification method tries to learn the best representation of the input data to classify it correctly. We have conducted six experiments to test the performance of TERL against other methods. Our approach obtained macro mean accuracy and F1-score of 96.4% and 85.8% for superfamilies and 95.7% and 91.5% for orders on sequences from RepBase, respectively. For sequences from seven databases, we obtained macro mean accuracy and F1-score of 95.0% and 70.6% at the superfamily level and 89.3% and 73.9% at the order level, respectively. TERL surpassed the accuracy, recall and specificity obtained by other methods in the order-level classification of sequences from seven databases, and was by far the fastest method in every experiment. Therefore, TERL can learn how to predict any hierarchical level of the TE classification system and is about 20 times and three orders of magnitude faster than TEclass and PASTEC, respectively. Code: https://github.com/muriloHoracio/TERL.
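The abstract does not spell out how a one-dimensional sequence becomes image-like data, so the following is only a minimal sketch under stated assumptions: a one-hot encoding over the four nucleotides, zero padding to a fixed width, and a trailing channel axis of the kind 2D convolution layers usually expect. The names `ALPHABET`, `one_hot_encode` and `encode_batch` are hypothetical and are not taken from the TERL repository.

```python
import numpy as np

# Row order of the encoding; an assumption made for this sketch.
ALPHABET = "ACGT"


def one_hot_encode(sequence: str, width: int) -> np.ndarray:
    """Return a (4, width) matrix for a DNA sequence.

    Column j holds a 1 in the row of the j-th nucleotide. Unknown symbols
    (for example N) and padding columns stay all-zero. Sequences longer
    than `width` are truncated.
    """
    matrix = np.zeros((len(ALPHABET), width), dtype=np.float32)
    for col, base in enumerate(sequence.upper()[:width]):
        row = ALPHABET.find(base)
        if row >= 0:
            matrix[row, col] = 1.0
    return matrix


def encode_batch(sequences, width: int) -> np.ndarray:
    """Stack sequences into a (batch, 4, width, 1) array for a 2D CNN."""
    batch = np.stack([one_hot_encode(s, width) for s in sequences])
    return batch[..., np.newaxis]


batch = encode_batch(["ACGTN", "ggt"], width=6)
print(batch.shape)
```

Each sequence becomes a 4-row binary "image" whose width is the padded sequence length, which is one plausible reading of "two-dimensional space data"; the actual preprocessing used by TERL may differ.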
