☆ 4.7 Article

NeRNA: A negative data generation framework for machine learning applications of noncoding RNAs

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Journal

COMPUTERS IN BIOLOGY AND MEDICINE

Volume 159, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.compbiomed.2023.106861

Keywords

RNA; Noncoding RNA; Data generation; Machine learning

Categories

Biology; Computer Science, Interdisciplinary Applications; Engineering, Biomedical; Mathematical & Computational Biology

Summary

Many supervised machine learning methods have been developed to classify and identify novel noncoding RNA (ncRNA) sequences. However, there is a lack of standardized methodologies and databases for generating high-quality negative examples. In this work, a novel method called NeRNA is developed to generate negative sequences for ncRNA analysis. NeRNA is tested on four different ncRNA datasets and demonstrates high performance in miRNA prediction. It is released as an easy-to-use, updatable, and modifiable KNIME workflow for RNA sequence data analysis.

Abstract

Many supervised machine learning based noncoding RNA (ncRNA) analysis methods have been developed to classify and identify novel sequences. In such analyses, the positive learning datasets usually consist of known examples of ncRNAs, some of which may have weak or strong experimental validation. In contrast, there are neither databases listing confirmed negative sequences for a specific ncRNA class nor standardized methodologies for generating high-quality negative examples. To overcome this challenge, a novel negative data generation method, NeRNA (negative RNA), is developed in this work. NeRNA uses known examples of given ncRNA sequences and their calculated structures in an octal representation to create negative sequences in a manner similar to frameshift mutations but without deletion or insertion. NeRNA is tested individually with four different ncRNA datasets: microRNA (miRNA), transfer RNA (tRNA), long noncoding RNA (lncRNA), and circular RNA (circRNA). Furthermore, a species-specific case analysis is performed to demonstrate and compare the performance of NeRNA for miRNA prediction. The results of 1000-fold cross-validation on Decision Tree, Naive Bayes, and Random Forest classifiers, and on deep learning algorithms such as Multilayer Perceptron, Convolutional Neural Network, and Simple Feedforward Neural Networks, indicate that models obtained using NeRNA-generated datasets achieve substantially high prediction performance. NeRNA is released as an easy-to-use, updatable, and modifiable KNIME workflow that can be downloaded with example datasets and required extensions. In particular, NeRNA is designed to be a powerful tool for RNA sequence data analysis.
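The abstract describes the generation idea only in outline, so the following Python sketch is an illustration under stated assumptions rather than the NeRNA implementation (which is distributed as a KNIME workflow). It assumes that each position is encoded as one octal digit from its nucleotide (A, C, G, U) and its paired or unpaired state in a dot-bracket structure, and that a negative sequence is produced by cyclically shifting every digit and decoding back to nucleotides, so that the length is preserved with no insertion or deletion. The digit mapping, the shift rule, and all function names are hypothetical.

```python
# Hypothetical sketch of octal encoding plus a length-preserving shift.
# The digit mapping and the shift rule are assumptions for illustration;
# they are not taken from the NeRNA paper or its KNIME workflow.

NUCLEOTIDES = "ACGU"


def encode_octal(seq, structure):
    """Encode each (nucleotide, paired?) position as a digit in 0..7."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    digits = []
    for base, symbol in zip(seq.upper(), structure):
        paired = symbol in "()"  # dot-bracket: '.' means unpaired
        digits.append(NUCLEOTIDES.index(base) * 2 + int(paired))
    return digits


def shift_digits(digits, offset=1):
    """Cyclically shift every octal digit; no digit is added or removed."""
    return [(d + offset) % 8 for d in digits]


def decode_sequence(digits):
    """Map octal digits back to nucleotides, discarding the pairing bit."""
    return "".join(NUCLEOTIDES[d // 2] for d in digits)


def make_negative(seq, structure, offset=1):
    """Produce a same-length candidate negative sequence."""
    return decode_sequence(shift_digits(encode_octal(seq, structure), offset))


if __name__ == "__main__":
    positive = "GGGAAACCC"
    fold = "(((...)))"  # toy structure; a real one would come from a folding tool
    print(make_negative(positive, fold))
```

With an offset of 1 under this assumed mapping, paired positions change nucleotide while unpaired positions keep theirs, so the output differs from the input mainly in the stem regions. Other offsets or mappings would give different negatives; the abstract does not specify which rule NeRNA uses.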
