Article

LncMachine: a machine learning algorithm for long noncoding RNA annotation in plants

Journal

FUNCTIONAL & INTEGRATIVE GENOMICS
Volume 21, Issue 2, Pages 195-204

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s10142-021-00769-w

Keywords

LncRNA; Machine learning; Random Forest; Plants

Funding

  1. Junta de Andalucia (Andalusian Regional Government), Spain [P18-RT-992]
  2. FEDER
  3. TUBITAK
  4. Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the US Department of Energy (DOE)
  5. US Department of Agriculture (USDA) Agricultural Research Service (ARS)
  6. USDA-ARS [2030-21000-024-00-D]

The critical roles of lncRNAs in biological processes have attracted considerable attention, and advances in high-throughput sequencing have made high-quality data available for annotation. This study compares the prediction accuracies of machine learning algorithms and presents a crop-specific coding potential prediction tool, LncMachine, which achieves high accuracy when used with the Random Forest algorithm. Because LncMachine can deploy user-provided algorithms in real time, it can be applied to a wide range of studies.
Long noncoding RNAs (lncRNAs) have attracted considerable attention in recent years, following the elucidation of the critical roles they play in numerous important biological processes. Manual annotation of lncRNAs is restricted by known gene annotations and is prone to false predictions because the available data are incomplete. With the advent of high-throughput sequencing technologies, however, a large volume of high-quality data has become available for annotation, especially for plant species such as wheat. Here, we compared the prediction accuracies of several machine learning algorithms using 10-fold cross-validation. The study includes a comprehensive feature selection step to remove irrelevant and redundant features. We present a crop-specific, alignment-free coding potential prediction tool, LncMachine, which, when used with the Random Forest algorithm, achieves higher prediction accuracy than the currently popular tools CPC2, CPAT, and CNIT. LncMachine with Random Forest also performed well on human and mouse data, with an average accuracy of 92.67%. As input, LncMachine requires only a FASTA file or a tab-separated CSV file containing features. LncMachine can deploy several user-provided algorithms in real time and can therefore be applied readily to a wide range of studies.
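
To illustrate the two input types mentioned in the abstract, the following is a minimal sketch of converting FASTA records into a tab-separated feature table using only the Python standard library. The features shown (sequence length, GC content, and longest forward-strand ORF) and every function name are assumptions chosen for illustration; they are not LncMachine's actual feature set or code.

```python
import io

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COLUMNS = ["id", "length", "gc_content", "orf_length", "orf_coverage"]


def parse_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            header = line[1:].split()
            # Use the first whitespace-delimited token as the identifier.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def longest_orf_length(seq):
    """Length (nt, stop codon included) of the longest ATG..stop ORF
    across the three forward-strand reading frames."""
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                best = max(best, i + 3 - start)
                start = None
    return best


def extract_features(name, seq):
    """Compute a small, illustrative set of alignment-free features."""
    length = len(seq)
    gc = (seq.count("G") + seq.count("C")) / length if length else 0.0
    orf = longest_orf_length(seq)
    return {
        "id": name,
        "length": length,
        "gc_content": round(gc, 4),
        "orf_length": orf,
        "orf_coverage": round(orf / length, 4) if length else 0.0,
    }


def features_to_tsv(records):
    """Serialize feature dictionaries as tab-separated text with a header row."""
    lines = ["\t".join(COLUMNS)]
    for rec in records:
        lines.append("\t".join(str(rec[col]) for col in COLUMNS))
    return "\n".join(lines)


# Hypothetical two-record FASTA input, used only to exercise the functions.
example = io.StringIO(">tx1 demo\nATGAAATAG\nCC\n>tx2\nGGGCCC\n")
rows = [extract_features(name, seq) for name, seq in parse_fasta(example)]
print(features_to_tsv(rows))
```

A table of this shape could then be passed to any classifier, such as a Random Forest evaluated with 10-fold cross-validation as described above; the choice of features here is only a placeholder for whatever a feature selection step retains.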
