Article

A deep learning framework for identifying essential proteins based on multiple biological information

BMC BIOINFORMATICS (2022)

Journal

BMC BIOINFORMATICS

Volume 23, Issue 1, Pages -

Publisher

BMC

DOI: 10.1186/s12859-022-04868-8

Keywords

Essential protein; Deep learning; Protein-protein interaction network; Subcellular localization; Gene expression

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Mathematical & Computational Biology

Funding

key scientific and technological breakthroughs in Anhui Province Innovation of excellent wheat germplasm resources, discovery of important new genes and application in wheat molecular design breeding [2021d06050003]
Three Renewal and One Creation Innovation Platform Fund-Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information (Anhui Development and Reform Innovation) [[2020]555]
Open Fund of State Key Laboratory of Tea Plant Biology and Utilization [SKLTOF20150103]

Smart Summary

This study proposes a deep learning framework that predicts essential proteins by integrating features from protein-protein interaction networks, subcellular localization, and gene expression profiles. Experimental results show that the model outperforms traditional centrality methods and machine learning methods, and that its strategy for processing multiple types of biological information is preferable to the alternatives compared.

Abstract

Background: Essential proteins perform vital functions in cellular processes and are indispensable for the survival and reproduction of an organism. Traditional centrality methods perform poorly on complex protein-protein interaction (PPI) networks, and machine learning approaches based on high-throughput data do not exploit the temporal and spatial dimensions of biological information.

Results: We propose a deep learning framework that predicts essential proteins by integrating features obtained from the PPI network, subcellular localization, and gene expression profiles. The node2vec method learns continuous feature representations for proteins in the PPI network, capturing the diversity of connectivity patterns in the network. Depthwise separable convolution is applied to gene expression profiles to extract features and capture trends in gene expression over time under different experimental conditions. Subcellular localization information is mapped into a long one-dimensional vector to capture its characteristics. In addition, a sampling method mitigates the impact of class imbalance during training. In experiments on Saccharomyces cerevisiae data, our model outperforms traditional centrality methods and machine learning methods. Comparative experiments further show that our way of processing the various types of biological information is preferable.

Conclusions: The proposed deep learning framework effectively identifies essential proteins by integrating multiple types of biological data. A broader selection of subcellular localization information significantly improves prediction results, and depthwise separable convolution applied to gene expression profiles enhances performance.
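Two of the ideas named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, kernel values, and mixing weights are hypothetical, and random undersampling is only an assumption, since the abstract says "a sampling method" without naming it. The first part shows a depthwise separable convolution over a toy expression profile (one filter per channel along the time axis, followed by a 1x1 pointwise mix across channels); the second balances essential and non-essential labels by undersampling the majority class.

```python
import random


def depthwise_conv1d(x, kernels):
    """Filter each channel with its own kernel (valid cross-correlation, no padding).

    x: list of channels, each a list of values over time.
    kernels: one kernel (list of floats) per channel.
    """
    if len(x) != len(kernels):
        raise ValueError("need exactly one kernel per input channel")
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        out.append([
            sum(channel[t + j] * kernel[j] for j in range(k))
            for t in range(len(channel) - k + 1)
        ])
    return out


def pointwise_conv1d(x, weights):
    """1x1 convolution: weights[o][c] mixes input channel c into output channel o."""
    length = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(row)) for t in range(length)]
        for row in weights
    ]


def depthwise_separable_conv1d(x, kernels, weights):
    """Depthwise step (per-channel, along time) followed by pointwise step (across channels)."""
    return pointwise_conv1d(depthwise_conv1d(x, kernels), weights)


def undersample_majority(labels, seed=0):
    """Return sorted indices of all minority samples plus an equal-sized random
    subset of majority samples (assumed strategy, binary 0/1 labels)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return sorted(minority + rng.sample(majority, len(minority)))


# Toy profile: 2 hypothetical channels (e.g. two conditions), 4 time points each.
profile = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
kernels = [[1.0, -1.0], [0.5, 0.5]]   # one temporal filter per channel
weights = [[1.0, 2.0], [0.0, 1.0]]    # 2 output channels mixing 2 input channels
features = depthwise_separable_conv1d(profile, kernels, weights)

# Toy labels: 1 = essential, 0 = non-essential (imbalanced 2 vs 8).
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
balanced_idx = undersample_majority(labels)
```

Splitting the convolution this way keeps the temporal filtering of each channel separate from the mixing across channels, which uses fewer parameters than a full convolution over all channels at once. In practice a deep learning library layer would replace these loops.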
