4.4 Article

A systematic evaluation of bioinformatics tools for identification of long noncoding RNAs

Journal

RNA
Volume 27, Issue 1, Pages 80-98

Publisher

COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT
DOI: 10.1261/rna.074724.120

Keywords

long noncoding RNA identification; tools comparison; simulated and biological data sets; joint prediction; non-model species

Funding

  1. Wuhan Branch, Supercomputing Center, Chinese Academy of Sciences, China
  2. National Natural Science Foundation of China [31571275]
  3. Strategic Priority Research Program of the Chinese Academy of Sciences [XDA08020201]
  4. National High-Technology Research and Development Program (863 Program) [2011AA100403]

An investigation comparing the performance of 41 analysis models for lncRNA identification revealed that no single model excelled under all test conditions. The efficiency of different models depended largely on the source of transcripts and the quality of assemblies. Recommendations were summarized for lncRNA identification under different situations, emphasizing the need for careful selection of appropriate tools based on actual data.
High-throughput RNA sequencing has unveiled the complexity of the transcriptome and significantly increased the number of recorded long noncoding RNAs (lncRNAs), which have been reported to participate in a variety of biological processes. Identification of lncRNAs is a key step in lncRNA analysis, and many bioinformatics tools have been developed for this purpose in recent years. While these tools allow us to identify lncRNAs more efficiently and accurately, they may produce inconsistent results, which makes choosing among them difficult. We compared the performance of 41 analysis models based on 14 software packages on different data sets, including high-quality and low-quality data from 33 species. In addition, the computational efficiency, robustness, and joint prediction of the models were explored. As practical guidance, key points for lncRNA identification under different situations were summarized. In this investigation, no single model was superior to the others under all test conditions. The performance of a model depended to a great extent on the source of the transcripts and the quality of the assemblies. As general references, FEELnc_all_cl, CPC, and CPAT_mouse work well in most species, while COME, CNCI, and lncScore are good choices for model organisms. Since these tools are sensitive to factors such as the species involved and the quality of the assembly, researchers must carefully select the appropriate tool based on the actual data. Alternatively, our tests suggest that joint prediction can outperform any single model if proper models are chosen. All scripts and data used in this research can be accessed at http://bioinfo.ihb.ac.cn/elit.
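The abstract does not say how the joint prediction was built, so the following is only a minimal sketch of one common way to combine several classifiers: a per-transcript vote. The function name `joint_predict`, the `min_votes` parameter, the transcript IDs, and every label in the example are hypothetical and invented for illustration. The model names are borrowed from the abstract purely as dictionary keys, and nothing here calls the real tools or reproduces the authors' scripts.

```python
def joint_predict(predictions, min_votes=None):
    """Combine per-model calls into one call per transcript by voting.

    predictions: {model_name: {transcript_id: "lncRNA" or "coding"}}
    min_votes:   number of models that must say "lncRNA" for the joint
                 call to be "lncRNA"; defaults to a strict majority.
    (Hypothetical helper; an assumption, not the method used in the paper.)
    """
    if not predictions:
        raise ValueError("at least one model is required")
    models = list(predictions)
    if min_votes is None:
        min_votes = len(models) // 2 + 1  # strict majority

    # Only vote on transcripts that every model has classified.
    shared = set.intersection(*(set(calls) for calls in predictions.values()))

    joint = {}
    for transcript_id in sorted(shared):
        votes = sum(
            1 for model in models
            if predictions[model][transcript_id] == "lncRNA"
        )
        joint[transcript_id] = "lncRNA" if votes >= min_votes else "coding"
    return joint


# Made-up calls for three transcripts; these are not results from the study.
example_calls = {
    "CPC":           {"tx1": "lncRNA", "tx2": "coding", "tx3": "lncRNA"},
    "CPAT_mouse":    {"tx1": "lncRNA", "tx2": "coding", "tx3": "coding"},
    "FEELnc_all_cl": {"tx1": "lncRNA", "tx2": "lncRNA", "tx3": "coding"},
}

majority = joint_predict(example_calls)                # at least 2 of 3 agree
union = joint_predict(example_calls, min_votes=1)      # any model says lncRNA
unanimous = joint_predict(example_calls, min_votes=3)  # all models agree
```

Lowering `min_votes` favors sensitivity (more transcripts are called lncRNA), while raising it favors specificity. Which setting and which models to combine would have to be decided on the actual data, in line with the abstract's caveat that joint prediction helps only if proper models are chosen.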
