An Empirical Study of Features Fusion Techniques for Protein-Protein Interaction Prediction

Journal

CURRENT BIOINFORMATICS
Volume 11, Issue 1, Pages 4-12

Publisher

BENTHAM SCIENCE PUBL LTD
DOI: 10.2174/1574893611666151119221435

Keywords

Features fusion; features selection; Random Forests; protein-protein interaction

Funding

  1. Natural Science Foundation of China [61370010, 61472333, 81101115]
  2. Natural Science Foundation of Fujian Province of China [2014J01253, 2011J01371]
  3. Shanghai Key Laboratory of Intelligent Information Processing, China [IIPL-2014-004]

With the recent development of bioinformatics, the importance of understanding protein function has been widely acknowledged. Most proteins perform their functions by interacting with other proteins, so exploring protein-protein interactions (PPIs) is an urgent task. Predicting PPIs remains a difficult problem: although a variety of computational methods have been proposed to identify PPIs, most of them are complex and have low accuracy. Traditional methods extract features in two steps: first, they extract features from each of the two proteins of a PPI; second, they treat the two feature sets as strings and apply a concatenation operator. Concatenation amounts to an addition operation on strings. It introduces redundant features, and its result depends on the order in which the two proteins are concatenated. Motivated by this, in this paper we study features fusion and features selection. The presented framework consists of three stages. In the first stage, we obtain the negative data set from an off-the-shelf database; we do not address the reliability of the negative data sets used in previous studies. In the second stage, the n-gram frequency method is used to preprocess the PPI sequences. In the third stage, the features are spliced into the final feature, and features selection is then applied to find the optimal feature. Finally, an effective parameter for the Random Forest classifier is selected. Experiments carried out on a real data set showed that our features fusion method outperformed traditional methods in protein-protein interaction prediction. These encouraging results may be helpful for future research on protein function.
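The order dependence of concatenation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fusion rule, so `symmetric_fuse` (element-wise sum plus absolute difference) is a hypothetical order-independent alternative chosen only for illustration. The 2-gram setting, the function names, and the two short sequences are likewise assumptions.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_frequencies(sequence, n=2):
    """Return normalized n-gram frequencies in a fixed vocabulary order."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = sum(counts[g] for g in vocab)
    return [counts[g] / total if total else 0.0 for g in vocab]


def concatenate(fa, fb):
    """Traditional fusion: the result depends on which protein comes first."""
    return fa + fb


def symmetric_fuse(fa, fb):
    """Hypothetical order-independent fusion (an assumption, not the paper's rule)."""
    sums = [a + b for a, b in zip(fa, fb)]
    diffs = [abs(a - b) for a, b in zip(fa, fb)]
    return sums + diffs


# Made-up toy sequences, used only to exercise the functions.
protein_a = "MKTAYIAKQR"
protein_b = "GSHMLEDPVA"
fa = ngram_frequencies(protein_a)
fb = ngram_frequencies(protein_b)

order_matters = concatenate(fa, fb) != concatenate(fb, fa)
order_free = symmetric_fuse(fa, fb) == symmetric_fuse(fb, fa)
```

With concatenation, the pair (A, B) and the pair (B, A) yield different vectors even though they describe the same interaction, which is the order problem the abstract raises. A symmetric rule avoids it, and the fused vectors could then be passed to feature selection and a Random Forest classifier as the abstract outlines.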
