Article

PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations

Journal

International Journal of Molecular Sciences

Publisher

MDPI
DOI: 10.3390/ijms22042120

Keywords

pupylation; feature encoding; chi-squared; machine learning

Funding

  1. Japan Society for the Promotion of Science (JSPS) [19H04208, 19F19377]
  2. Ministry of Economy, Trade and Industry, Japan (METI)
  3. Japan Agency for Medical Research and Development (AMED)
  4. Grants-in-Aid for Scientific Research [19F19377] Funding Source: KAKEN

PUP-Fuse is a new model for pupylation site prediction that integrates multiple sequence representations and machine learning algorithms, reaching a Matthews correlation coefficient of 0.768 in 10-fold cross-validation.
Pupylation is a reversible post-translational modification of proteins that plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites; however, traditional experimental methods are laborious and time-consuming. Computational algorithms that can predict potential pupylation sites from sequence features are therefore highly needed. In this research, we developed a new prediction model, PUP-Fuse, that predicts pupylation sites by integrating multiple sequence representations. We explored five types of feature-encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. PUP-Fuse achieved a Matthews correlation coefficient (MCC) of 0.768 in a 10-fold cross-validation test and also outperformed existing predictors in an independent test. The PUP-Fuse web server, together with the curated datasets, is freely available.
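The abstract names two quantitative steps: fusing the scores of several ML models with a linear regression model, and reporting MCC, defined as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal, self-contained sketch of both ideas, not the PUP-Fuse implementation. It assumes three hypothetical base-classifier scores per candidate site, binary labels (1 = pupylation site, 0 = non-site), an ordinary least-squares fit with a tiny ridge term added only for numerical stability, and a 0.5 decision threshold. The function names, the example data and the threshold are illustrative assumptions; the paper's actual encodings, classifiers and fusion details may differ.

```python
import math
from typing import List, Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = site, 0 = non-site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_fusion(scores: Sequence[Sequence[float]], targets: Sequence[float],
                      ridge: float = 1e-6) -> List[float]:
    """Least-squares fit of target ~ w0 + w1*s1 + ... + wk*sk via the normal equations.

    The tiny ridge term on the diagonal only keeps the system well-conditioned.
    """
    rows = [[1.0] + list(s) for s in scores]  # prepend an intercept column
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(d)]
    return _solve(xtx, xty)


def fuse(weights: Sequence[float], s: Sequence[float]) -> float:
    """Combine one sample's per-model scores with the fitted weights."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], s))


# Hypothetical scores from three base classifiers for six candidate sites.
scores = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3], [0.8, 0.9, 0.6],
          [0.1, 0.3, 0.2], [0.7, 0.6, 0.9], [0.3, 0.2, 0.1]]
labels = [1, 0, 1, 0, 1, 0]

weights = fit_linear_fusion(scores, labels)
fused = [fuse(weights, s) for s in scores]
preds = [1 if f >= 0.5 else 0 for f in fused]  # the 0.5 threshold is an assumption
print("weights:", [round(w, 3) for w in weights])
print("MCC:", mcc(labels, preds))
```

For brevity the toy example fits and scores on the same six samples. In a cross-validated setting such as the 10-fold test reported above, the fusion weights would be fitted on the training folds only and MCC computed on the held-out fold.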
