Article

Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis

Journal

MOLECULAR BIOSYSTEMS
Volume 10, Issue 8, Pages 2229-2235

Publisher

ROYAL SOC CHEMISTRY
DOI: 10.1039/c4mb00316k

Funding

  1. National Nature Scientific Foundation of China [61202256, 61301260, 61100092]
  2. Nature Scientific Foundation of Hebei Province [C2013209105]
  3. Fundamental Research Funds for the Central Universities [ZYGX2012J113, ZYGX2013J102]

Abstract

Bacteriophage virion proteins play important roles in determining the fate of host bacterial cells. Accurate identification of these proteins is important for understanding their functions and clarifying the mechanism by which bacterial cells are lysed. In this study, a new sequence-based method was developed to identify phage virion proteins. Protein sequences were first encoded as g-gap dipeptide compositions. Analysis of variance (ANOVA) combined with incremental feature selection (IFS) was then used to search for the optimal feature set. In jackknife cross-validation, the optimal set of 160 features produced the maximum accuracy of 85.02%. Feature analysis showed that the correlation between two amino acids with one gap was more important than other correlations for phage virion protein prediction, and that some of the 1-gap dipeptides contributed most to the prediction. This analysis provides novel insights into the function of phage virion proteins. On the basis of the proposed method, an online web server, PVPred, was established and is freely accessible at http://lin.uestc.edu.cn/server/PVPred. We believe that PVPred will become a powerful tool for studying phage virion proteins and for guiding related experimental validation.
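The pipeline summarized in the abstract (g-gap dipeptide encoding, ANOVA ranking, incremental feature selection) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that "g-gap" means two residues with g residues between them (so g = 0 is an ordinary dipeptide), that frequencies are normalized by the number of counted pairs, and that the ANOVA score is the standard one-way F statistic. All function names are hypothetical, and the `evaluate` callback stands in for whatever classifier and cross-validation scheme is used to score a feature subset.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order (assumed feature layout).
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(seq, g):
    """Return 400 frequencies of residue pairs with g residues between them."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # assumption: skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def anova_f_score(groups):
    """One-way ANOVA F statistic for a single feature.

    `groups` holds one list of feature values per class.
    """
    k = len(groups)
    n = sum(len(grp) for grp in groups)
    means = [sum(grp) / len(grp) for grp in groups]
    grand_mean = sum(sum(grp) for grp in groups) / n
    between = sum(
        len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, means)
    ) / (k - 1)
    within = sum(
        (x - m) ** 2 for grp, m in zip(groups, means) for x in grp
    ) / (n - k)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(pos, neg):
    """Return feature indices sorted by descending F score.

    `pos` and `neg` are lists of equal-length feature vectors.
    """
    n_features = len(pos[0])
    scores = [
        anova_f_score([[v[j] for v in pos], [v[j] for v in neg]])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def incremental_feature_selection(ranked, evaluate):
    """Grow the feature set one ranked feature at a time and keep the best.

    `evaluate` is a user-supplied (hypothetical) callback that returns a
    score, such as cross-validated accuracy, for a list of feature indices.
    """
    best_size, best_score = 0, float("-inf")
    for size in range(1, len(ranked) + 1):
        score = evaluate(ranked[:size])
        if score > best_score:
            best_size, best_score = size, score
    return ranked[:best_size], best_score
```

In this sketch, a sequence would be encoded with `g_gap_dipeptide_composition(seq, g)` for each gap value of interest, the resulting vectors for virion and non-virion proteins passed to `rank_features`, and the ranking handed to `incremental_feature_selection` together with a scoring function. The jackknife cross-validation described in the abstract would live inside that scoring function.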
