4.6 Article

Multiobjective Genome-Wide RNA-Binding Event Identification From CLIP-Seq Data

Journal

IEEE TRANSACTIONS ON CYBERNETICS
Volume 51, Issue 12, Pages 5811-5824

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCYB.2019.2960515

Keywords

Radio frequency; Vegetation; RNA; Genomics; Bioinformatics; Proteins; Optimization; Crosslinking immunoprecipitation sequencing (CLIP-seq) data; multiobjective optimization; RNA-binding proteins (RBPs)

Funding

  1. Research Grants Council of Hong Kong [CityU 11203217, CityU 11200218]
  2. Hong Kong Institute for Data Science at City University of Hong Kong
  3. City University of Hong Kong [CityU 11202219]
  4. National Natural Science Foundation of China [61603087]
  5. Natural Science Foundation of Jilin Province [20190103006JH]


RNA-binding proteins play a crucial role in mRNA processing and post-transcriptional control of gene expression. CLIP-seq technologies have made genome-wide RNA-binding event data available at scale, which has driven the development of methods that identify protein-RNA interactions genome-wide and help explain protein functions in cellular processes. This study presents MFA, a computational method that combines multiobjective biogeography-based optimization with random forest. Across the reported experiments, MFA outperforms current methods at identifying protein-RNA interactions, and several follow-up analyses give insight into how it behaves.
RNA-binding proteins (RBPs) are master regulators of mRNA processing and vital players in the post-transcriptional control of gene expression. In recent years, crosslinking immunoprecipitation sequencing (CLIP-seq) technologies have made it possible to sequence massive amounts of genome-wide RNA-binding event data. The increasing availability of these data provides opportunities to identify protein-RNA interactions on a genome-wide scale. Genome-wide RNA-binding event detection methods have been developed to advance the understanding of protein functions within cellular processes. Unfortunately, those methods often suffer from practical limitations, such as high costs, intensive computation, high dimensionality, numerical instability, and data sparsity. We present a computational method, the multiobjective forest algorithm (MFA), to identify protein-RNA interactions from CLIP-seq data by combining multiobjective biogeography-based optimization (BBO) with random forest (RF). Since most of the tree-structured classifiers in RF are unnecessarily bulky and incur extra time and memory costs, multiobjective BBO is designed to prune the unsuitable tree-structured classifiers dynamically. Moreover, to direct the evolution dynamics of MFA, two objective functions are formulated to balance model generality and complexity for robust performance. To validate MFA, we evaluate its performance on 31 large-scale CLIP-seq datasets. The experimental results demonstrate that MFA achieves superior performance over the current state-of-the-art methods. Mechanistic insights are also revealed and discussed to explore the multifaceted aspects of MFA through data source importance analysis, matrix rank estimations, seeding component perturbations, and multiobjective optimization methodology comparisons.
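
The pruning idea in the abstract can be illustrated with a small sketch. It is not the authors' MFA implementation and it does not implement BBO: the abstract does not spell out the two objective functions, so the sketch assumes they are the validation error of the pruned forest (generality) and the number of trees kept (complexity). The function names, the binary keep/drop mask, and the toy predictions are all hypothetical, and exhaustive enumeration of masks stands in for the evolutionary search.

```python
from itertools import product


def objectives(mask, tree_preds, labels):
    """Return (validation error, number of trees kept) for one pruning mask.

    Both objectives are assumptions made for illustration; lower is better.
    """
    kept = [preds for keep, preds in zip(mask, tree_preds) if keep]
    if not kept:
        return (1.0, 0)  # an empty forest is treated as always wrong
    errors = 0
    for i, y in enumerate(labels):
        votes = sum(preds[i] for preds in kept)
        pred = 1 if 2 * votes > len(kept) else 0  # majority vote, ties go to class 0
        errors += pred != y
    return (errors / len(labels), len(kept))


def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_front(masks, tree_preds, labels):
    """Keep the masks whose objective vectors are not dominated by any other mask."""
    scored = [(m, objectives(m, tree_preds, labels)) for m in masks]
    return [(m, f) for m, f in scored if not any(dominates(g, f) for _, g in scored)]


# Toy validation set: each row holds one hypothetical tree's 0/1 predictions.
labels = [1, 0, 1, 1, 0, 0]
tree_preds = [
    [0, 0, 1, 1, 0, 0],  # wrong on sample 0
    [1, 1, 1, 1, 0, 0],  # wrong on sample 1
    [1, 0, 0, 1, 0, 0],  # wrong on sample 2
    [1, 0, 1, 0, 1, 0],  # wrong on samples 3 and 4
]

# Enumerate every keep/drop mask; a real search would sample this space instead.
masks = list(product([0, 1], repeat=len(tree_preds)))
front = pareto_front(masks, tree_preds, labels)
for mask, (error, size) in front:
    print(mask, round(error, 3), size)
```

In this toy setting no single tree is perfect, but every three-tree subset is, so the full four-tree forest is dominated by smaller ensembles with the same error. That is the trade-off between generality and complexity that a multiobjective search is meant to expose.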
