Article

A feature extraction free approach for protein interactome inference from co-elution data

Journal

BRIEFINGS IN BIOINFORMATICS
Volume -, Issue -, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbad229

Keywords

co-fractionation coupled with mass spectrometry; convolutional neural networks; protein interactome; feature extraction free; data imbalance

Protein complexes play a crucial role in cellular processes. High-throughput techniques like co-fractionation coupled with mass spectrometry (CF-MS) have revolutionized protein complex studies, but distinguishing true interactions from false positives remains challenging. To address this, researchers have developed computational methods that utilize CF-MS data for probabilistic protein-protein interaction (PPI) network construction. However, existing methods suffer from bias and overfitting due to imbalanced data distribution and reliance on handcrafted features. In this study, a balanced end-to-end learning architecture called SPIFFED is introduced, which integrates raw CF-MS data feature representation and interactome prediction using a convolutional neural network. SPIFFED outperforms current state-of-the-art methods in PPI prediction and allows users to infer high-confidence protein complexes using clustering software. The source code for SPIFFED is freely available at: https://github.com/bio-it-station/SPIFFED.
Protein complexes are key functional units in cellular processes. High-throughput techniques, such as co-fractionation coupled with mass spectrometry (CF-MS), have advanced protein complex studies by enabling global interactome inference. However, defining true interactions from complex fractionation characteristics is not simple, because CF-MS is prone to false positives: non-interacting proteins can co-elute by chance. Several computational methods have been designed to analyze CF-MS data and construct probabilistic protein-protein interaction (PPI) networks. Current methods usually first infer PPIs from handcrafted CF-MS features and then use clustering algorithms to form potential protein complexes. While powerful, these methods have two weaknesses: handcrafted features based on domain knowledge might introduce bias, and the severely imbalanced PPI data make the models prone to overfitting. To address these issues, we present a balanced end-to-end learning architecture, Software for Prediction of Interactome with Feature-extraction Free Elution Data (SPIFFED), which integrates feature representation from raw CF-MS data and interactome prediction in a convolutional neural network. SPIFFED outperforms the state-of-the-art methods in predicting PPIs under conventional imbalanced training. When trained with balanced data, SPIFFED showed greatly improved sensitivity for true PPIs. Moreover, the ensemble SPIFFED model provides different voting schemes to integrate predicted PPIs from multiple CF-MS datasets. Combined with clustering software (e.g. ClusterONE), SPIFFED allows users to infer high-confidence protein complexes according to the CF-MS experimental design. The source code of SPIFFED is freely available at: https://github.com/bio-it-station/SPIFFED.
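
Two ideas in the abstract, balanced training data and voting across multiple CF-MS experiments, can be illustrated with a minimal Python sketch. This is not the SPIFFED implementation: the function names `balance_pairs` and `vote`, the choice of random undersampling of negatives, and the vote-count threshold are all assumptions made for illustration, since the abstract does not specify how balancing or voting is performed.

```python
import random
from collections import Counter


def balance_pairs(positives, negatives, seed=0):
    """Return labeled pairs with equal numbers of positives and negatives.

    Assumption: balancing is done by randomly undersampling the
    (usually much larger) negative set; other strategies are possible.
    """
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    sampled_pos = rng.sample(positives, n)
    sampled_neg = rng.sample(negatives, n)
    return [(p, 1) for p in sampled_pos] + [(q, 0) for q in sampled_neg]


def vote(predictions, min_votes):
    """Keep protein pairs predicted in at least `min_votes` experiments.

    `predictions` holds one set of predicted pairs per CF-MS experiment.
    Pairs are treated as unordered, so ("A", "B") equals ("B", "A").
    Hypothetical scheme: min_votes=1 is a union, min_votes=len(predictions)
    is an intersection, and values in between act as a majority-style vote.
    """
    counts = Counter()
    for preds in predictions:
        # Normalize within each experiment so a pair counts once per experiment.
        counts.update({tuple(sorted(pair)) for pair in preds})
    return {pair for pair, c in counts.items() if c >= min_votes}


if __name__ == "__main__":
    experiments = [
        {("A", "B"), ("B", "C")},
        {("B", "A")},
        {("A", "B"), ("C", "D")},
    ]
    print(sorted(vote(experiments, min_votes=2)))
```

A stricter threshold trades coverage for confidence, which is one way the choice of voting scheme could depend on the experimental design.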
