Article

MpsLDA-ProSVM: Predicting multi-label protein subcellular localization by wMLDAe dimensionality reduction and ProSVM classifier

Publisher

ELSEVIER
DOI: 10.1016/j.chemolab.2020.104216

Keywords

Multi-label protein subcellular localization; Multi-information fusion; wMLDAe dimensionality reduction; ProSVM classifier

Funding

  1. National Natural Science Foundation of China [61863010]
  2. Key Research and Development Program of Shandong Province of China [2019GGX101001]
  3. Natural Science Foundation of Shandong Province of China [ZR2018MC007]
  4. Key Laboratory Open Foundation of Hainan Province [JSKX202001]

The paper introduces a new prediction model, MpsLDA-ProSVM, which accurately predicts the specific subcellular localization of multi-label proteins in cells. By combining several coding algorithms with a weighted multi-label linear discriminant analysis framework, the model achieves high accuracy on the virus, plant, Gram-positive bacteria, and Gram-negative bacteria datasets.
Multi-label proteins play a significant role in life processes such as cell growth, development, and reproduction. Exploring protein subcellular localization (SCL) is a direct way to better understand the function of multi-label proteins in cells. This paper presents a new prediction model, MpsLDA-ProSVM, which predicts the SCL of multi-label proteins. First, we use four coding algorithms, namely pseudo position-specific scoring matrix (PsePSSM), gene ontology (GO), conjoint triad (CT) and pseudo amino acid composition (PseAAC), to extract feature information from protein sequences. Then, for the first time, we use a weighted multi-label linear discriminant analysis framework based on entropy weight form (wMLDAe) to refine and purify the features. Finally, we input the optimal feature subset into the multi-label learning with label-specific features (LIFT) and multi-label k-nearest neighbor (ML-KNN) algorithms to obtain a synthetic ranking of relevant labels, and then use the Prediction and Relevance Ordering based SVM (ProSVM) classifier to predict the SCLs. Under leave-one-out cross-validation (LOOCV), the overall actual accuracies on the virus, plant, Gram-positive bacteria and Gram-negative bacteria datasets are 98.06%, 98.97%, 99.81% and 98.49%, respectively, which are 0.56%-9.16%, 1.07%-30.87%, 0.21%-6.91% and 3.99%-8.59% higher than those of other advanced methods. These comparisons show that MpsLDA-ProSVM can effectively predict the specific location of multi-label proteins in cells.
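
Of the four encodings listed above, the conjoint triad (CT) is the simplest to illustrate concretely. The sketch below is not the authors' implementation. It assumes the seven-class amino-acid grouping commonly used for CT (Shen et al., 2007), counts all 7^3 = 343 class triads with a sliding window of three residues, and applies the normalization usually quoted for that descriptor, d_i = (f_i - min f) / max f. Skipping triads that contain non-standard residues (such as 'X' or 'U') is our own assumption, and the names `conjoint_triad`, `CT_CLASSES` and `TRIAD_INDEX` are hypothetical.

```python
from itertools import product

# Seven amino-acid classes commonly used by the conjoint triad (CT) descriptor
# (grouping by dipole and side-chain volume). Assumed, not taken from the paper.
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]

# Map each residue letter to its class index 0..6.
RESIDUE_TO_CLASS = {aa: idx for idx, group in enumerate(CT_CLASSES) for aa in group}

# All 7**3 = 343 class triads in a fixed order, so vectors are comparable.
TRIADS = list(product(range(len(CT_CLASSES)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence: str) -> list:
    """Return a 343-dimensional CT feature vector for one protein sequence.

    Hypothetical helper: triads containing a non-standard residue are
    skipped, which is an assumption rather than something the paper states.
    """
    counts = [0] * len(TRIADS)
    seq = sequence.upper()
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        if all(aa in RESIDUE_TO_CLASS for aa in window):
            triad = tuple(RESIDUE_TO_CLASS[aa] for aa in window)
            counts[TRIAD_INDEX[triad]] += 1
    max_count = max(counts)
    if max_count == 0:
        # Sequence shorter than 3 residues, or no valid triads at all.
        return [0.0] * len(TRIADS)
    min_count = min(counts)
    # Normalization d_i = (f_i - min f) / max f, as commonly quoted for CT.
    return [(c - min_count) / max_count for c in counts]


if __name__ == "__main__":
    vec = conjoint_triad("MKTAYIAKQR")
    print(len(vec))  # 343
    print(sum(vec))  # 8.0: eight distinct triads, each counted once
```

Whatever the sequence length, each protein maps to a fixed 343-dimensional vector, which is presumably what lets CT features be fused (for example by concatenation) with the PsePSSM, GO and PseAAC vectors before the wMLDAe step.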
