4.7 Article

Large-Scale Prediction of Human Protein-Protein Interactions from Amino Acid Sequence Based on Latent Topic Features

Journal

JOURNAL OF PROTEOME RESEARCH
Volume 9, Issue 10, Pages 4992-5001

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/pr100618t

Keywords

protein-protein interaction; latent dirichlet allocation; latent semantic space; latent topic feature; random forest

Funding

  1. National Natural Science Foundation of China [60704047]
  2. Science and Technology Commission of Shanghai Municipality [08ZR1410600, 08JC1410600]
  3. Shanghai Municipal Education Commission [10ZZ17]

Ask authors/readers for more resources

Protein-protein interaction (PPI) is at the core of the entire interactomic system of any living organism. Although there are many human protein-protein interaction links being experimentally determined, the number is still relatively very few compared to the estimation that there are similar to 300 000 protein-protein interactions in human beings. Hence, it is still urgent and challenging to develop automated computational methods to accurately and efficiently predict protein-protein interactions. In this paper, we propose a novel hierarchical LDA-RF (latent dirichlet allocation-random forest) model to predict human protein-protein interactions from protein primary sequences directly, which is featured by a high success rate and strong ability for handling large-scale data sets by digging the hidden internal structures buried into the noisy amino acid sequences in low dimensional latent semantic space. First, the local sequential features represented by conjoint triads are constructed from sequences. Then the generative LDA model is used to project the original feature space into the latent semantic space to obtain low dimensional latent topic features, which reflect the hidden structures between proteins. Finally, the powerful random forest model is used to predict the probability for interaction of two proteins. Our results show that the proposed latent topic feature is very promising for PPI prediction and could also become a powerful strategy to deal with many other bioinformatics problems. As a web server, LDA-RF is freely available at http://www.csbio.sjtu.edu.cn/bioinf/LR_PPI for academic use.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available