4.7 Review

Large-scale comparative review and assessment of computational methods for anti-cancer peptide identification

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 22, Issue 4, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbaa312

Keywords

anti-cancer peptides; bioinformatics; prediction; sequence analysis; ensemble learning; performance assessment

Funding

  1. National Natural Science Foundation of China [61972322]
  2. National Health and Medical Research Council of Australia (NHMRC) [1092262]
  3. Australian Research Council (ARC) [LP110200333, DP120104460]
  4. National Institute of Allergy and Infectious Diseases of the National Institutes of Health [R01 AI111965]
  5. Monash University
  6. Collaborative Research Program of Institute for Chemical Research, Kyoto University [2018-28]

Anti-cancer peptides (ACPs) are potential therapeutics for cancer with the ability to target cancer cells specifically. Various machine learning methods have been developed for in silico identification of ACPs, leading to extensive research on their therapeutic mechanisms. Summarizing the advantages and disadvantages of existing methods and providing suggestions for improvement is necessary to advance the accurate identification of ACPs.
Anti-cancer peptides (ACPs) are potential therapeutics for cancer. They have been studied extensively because of their ability to target cancer cells without directly affecting healthy cells, and many peptide-based drugs are currently being evaluated in preclinical and clinical trials. Accurate identification of ACPs has received considerable attention in recent years, and a number of machine learning-based methods for in silico identification of ACPs have been developed. These methods have, to some extent, advanced research on the mechanisms by which ACPs act against cancer. The methods differ widely in their training/testing datasets, machine learning algorithms, feature encoding schemes, feature selection methods and evaluation strategies. It is therefore desirable to summarize the advantages and disadvantages of the existing methods and to provide useful insights and suggestions for developing and improving computational tools that characterize and identify ACPs. With this in mind, we first comprehensively investigate 16 state-of-the-art predictors for ACPs in terms of their core algorithms, feature encoding schemes, performance evaluation metrics and webserver/software usability. We then conduct a comprehensive performance assessment to evaluate the robustness and scalability of the existing predictors using a well-prepared benchmark dataset, and we provide potential strategies for improving model performance. Moreover, we propose a novel ensemble learning framework, termed ACPredStackL, for the accurate identification of ACPs. ACPredStackL is based on the stacking ensemble strategy and combines SVM, Naive Bayesian, LightGBM and KNN. Empirical benchmarking experiments against the state-of-the-art methods demonstrate that ACPredStackL achieves comparable performance for predicting ACPs.
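The stacking idea mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the ACPredStackL implementation: it assumes amino acid composition as the feature encoding, uses only two simple base learners (k-nearest neighbours and Gaussian naive Bayes, leaving out SVM and LightGBM), assumes a logistic-regression meta-learner (the abstract does not name one), and trains on made-up toy sequences that are not real ACP data. All function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(seq):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    counts = Counter(seq.upper())
    n = max(len(seq), 1)
    return [counts[a] / n for a in AMINO_ACIDS]

def knn_proba(train_X, train_y, x, k=3):
    """Base learner 1: share of positive labels among the k nearest neighbours."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return sum(label for _, label in nearest) / len(nearest)

def gnb_proba(train_X, train_y, x, eps=1e-3):
    """Base learner 2: Gaussian naive Bayes posterior for the positive class.
    Assumes both classes occur in train_y; eps keeps variances non-zero."""
    log_post = {}
    for label in (0, 1):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        lp = math.log(len(rows) / len(train_X))
        for j, col in enumerate(zip(*rows)):
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col) + eps
            lp += -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
        log_post[label] = lp
    top = max(log_post.values())
    e0, e1 = math.exp(log_post[0] - top), math.exp(log_post[1] - top)
    return e1 / (e0 + e1)

BASE_LEARNERS = (knn_proba, gnb_proba)

def out_of_fold(X, y, n_folds=3):
    """Meta-features: each sample is scored by base learners that never saw it."""
    meta = [None] * len(X)
    for f in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != f]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        for i in range(f, len(X), n_folds):
            meta[i] = [h(tr_X, tr_y, X[i]) for h in BASE_LEARNERS]
    return meta

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, t in zip(Z, y):
            err = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
            b -= lr * err
            w = [wi - lr * err * zi for wi, zi in zip(w, z)]
    return w, b

def stack_predict(X, y, w, b, seq):
    """Score a new sequence: base learners use all training data, then the meta-learner."""
    z = [h(X, y, aac(seq)) for h in BASE_LEARNERS]
    return sigmoid(b + sum(wi * zi for wi, zi in zip(w, z)))

# Made-up toy sequences (label 1 = "ACP-like", 0 = "non-ACP"); not real data.
SEQS = [("KKLLKKLLKK", 1), ("DDEESSGGDD", 0), ("KLAKLAKKLA", 1), ("EDSGTEDSGT", 0),
        ("KKWKKLLKKL", 1), ("SSGGEEDDTT", 0), ("KLLKKAGKLL", 1), ("GDEGSDEGSD", 0),
        ("AKKLLKKLAK", 1), ("TTEEDDGGSS", 0), ("KWKLLKKLLK", 1), ("DEDEGSGSTT", 0)]
X = [aac(s) for s, _ in SEQS]
y = [label for _, label in SEQS]

w, b = fit_logistic(out_of_fold(X, y), y)
print(stack_predict(X, y, w, b, "KKLAKLLKKA"))  # lysine-rich query
print(stack_predict(X, y, w, b, "DDSGEETGDD"))  # acidic query
```

The key design point is that the meta-learner is trained on out-of-fold base predictions rather than on predictions for samples the base learners were fitted on, which limits leakage of training labels into the second layer.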
