Journal
IEEE TRANSACTIONS ON CYBERNETICS
Volume 51, Issue 12, Pages 5811-5824
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCYB.2019.2960515
Keywords
Radio frequency; Vegetation; RNA; Genomics; Bioinformatics; Proteins; Optimization; Crosslinking immunoprecipitation sequencing (CLIP-seq) data; multiobjective optimization; RNA-binding proteins (RBPs)
Funding
- Research Grants Council of the Hong Kong [CityU 11203217, CityU 11200218]
- Hong Kong Institute for Data Science at City University of Hong Kong
- City University of Hong Kong [CityU 11202219]
- National Natural Science Foundation of China [61603087]
- Natural Science Foundation of Jilin Province [20190103006JH]
RNA-binding proteins play a crucial role in mRNA processing and post-transcriptional control of gene expression. CLIP-seq technologies have enabled the sequencing of genome-wide RNA-binding event data, which has driven the development of methods that identify protein-RNA interactions on a genome-wide scale and help clarify protein functions in cellular processes. This study presents MFA, a computational method that combines multiobjective biogeography-based optimization with random forest. MFA outperforms current methods in identifying protein-RNA interactions, and several follow-up analyses provide insight into its behavior.
RNA-binding proteins (RBPs) are the master regulators of mRNA processing and vital players in the post-transcriptional control of gene expression. In recent years, crosslinking immunoprecipitation sequencing (CLIP-seq) technologies have enabled us to sequence massive amounts of genome-wide RNA-binding event data. The increasing availability of such data provides opportunities to identify protein-RNA interactions on a genome-wide scale. Genome-wide RNA-binding event detection methods have been developed to further our understanding of the proteins' functions within cellular processes. Unfortunately, those methods often suffer from practical restrictions, such as high costs, intensive computation, high dimensionality, numerical instability, and data sparsity. We present a computational method [multiobjective forest algorithm (MFA)] to identify protein-RNA interactions from CLIP-seq data by synergizing multiobjective biogeography-based optimization (BBO) with random forest (RF). Since most of the tree-structured classifiers in RF are unnecessary and incur extra time cost and memory consumption, multiobjective BBO is designed to prune the unsuitable tree-structured classifiers dynamically. Moreover, to direct the evolution dynamics of MFA, two objective functions are formulated to balance model generality and complexity for robust performance. To validate MFA, we compare its performance across 31 large-scale CLIP-seq datasets. The experimental results demonstrate that MFA obtains superior performance over the current state-of-the-art methods. Mechanistic insights are also revealed and discussed to explore the multifaceted aspects of MFA through data source importance analysis, matrix rank estimations, seeding component perturbations, and comparisons of multiobjective optimization methodologies.
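The pruning idea described in the abstract can be illustrated with a minimal sketch. It assumes a forest has already been trained and that each tree's 0/1 predictions on a held-out validation set are available (the hypothetical `votes` input). Each candidate sub-forest is encoded as a binary mask over trees, and two assumed objectives are minimized: majority-vote validation error (a proxy for generality) and the fraction of trees kept (a proxy for complexity). The migration step follows the textbook linear BBO model (rank-based immigration and emigration rates), with bit-flip mutation and single-habitat elitism; ranking habitats by domination count is a simplification. This is not the authors' MFA implementation, whose exact objective functions and operators are not given here; `objectives`, `dominates`, and `bbo_prune` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical input format: votes[t][i] is tree t's 0/1 prediction for validation sample i.


def objectives(mask: Sequence[int], votes: Sequence[Sequence[int]],
               labels: Sequence[int]) -> Tuple[float, float]:
    """Assumed objectives, both minimized:
    f1 = majority-vote error of the kept trees (generality proxy),
    f2 = fraction of trees kept (complexity proxy)."""
    kept = [t for t, bit in enumerate(mask) if bit]
    if not kept:
        return 1.0, 0.0  # an empty forest cannot predict; treat it as maximal error
    errors = 0
    for i, y in enumerate(labels):
        ones = sum(votes[t][i] for t in kept)
        pred = 1 if 2 * ones > len(kept) else 0  # ties go to class 0
        errors += pred != y
    return errors / len(labels), len(kept) / len(mask)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization: a is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def bbo_prune(votes, labels, pop_size=20, generations=50, p_mut=0.02, seed=0):
    """Sketch of multiobjective BBO over tree masks; returns the non-dominated
    (mask, (f1, f2)) pairs of the final population."""
    rng = random.Random(seed)
    n_trees = len(votes)
    pop: List[List[int]] = [[rng.randint(0, 1) for _ in range(n_trees)]
                            for _ in range(pop_size)]
    for _ in range(generations):
        scores = [objectives(m, votes, labels) for m in pop]
        # Rank habitats: fewer dominating habitats is better; break ties by error.
        dom_count = [sum(dominates(scores[j], scores[i]) for j in range(pop_size))
                     for i in range(pop_size)]
        order = sorted(range(pop_size), key=lambda i: (dom_count[i], scores[i][0]))
        pop = [pop[i] for i in order]
        # Linear migration model: better rank -> higher emigration mu, lower immigration lam.
        mu = [(pop_size - k) / pop_size for k in range(pop_size)]
        lam = [1.0 - m for m in mu]
        new_pop = [pop[0][:]]  # elitism: the top-ranked habitat survives unchanged
        for k in range(1, pop_size):
            child = pop[k][:]
            for d in range(n_trees):
                if rng.random() < lam[k]:
                    # Immigrate this feature from a habitat chosen in proportion to mu.
                    src = rng.choices(range(pop_size), weights=mu)[0]
                    child[d] = pop[src][d]
                if rng.random() < p_mut:
                    child[d] = 1 - child[d]  # bit-flip mutation: add or drop a tree
            new_pop.append(child)
        pop = new_pop
    scores = [objectives(m, votes, labels) for m in pop]
    return [(pop[i], scores[i]) for i in range(pop_size)
            if not any(dominates(scores[j], scores[i]) for j in range(pop_size))]
```

For example, `bbo_prune(votes, labels)` returns the trade-off front between validation error and forest size; one could then keep the smallest sub-forest whose error is acceptable. Encoding trees as mask bits keeps the search independent of how the trees were trained.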