Article

STALLION: a stacking-based ensemble learning framework for prokaryotic lysine acetylation site prediction

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 23, Issue 1

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbab376

Keywords

lysine acetylation sites; bioinformatics; stacking strategy; machine learning; feature optimization; performance assessment

Funding

  1. National Research Foundation of Korea (NRF) - Korean government (MSIT) [2021R1A2C1014338, 2019R1I1A1A01062260, 2020R1A4A4079722]
  2. National Research Foundation of Korea [2019R1I1A1A01062260, 2020R1A4A4079722, CG042502, 2021R1A2C1014338] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)


Protein post-translational modification (PTM) is a crucial regulatory mechanism. Identifying protein lysine acetylation (Kace) sites is challenging; this study proposes a novel predictor, STALLION, that accurately identifies Kace sites in prokaryotic species. The predictor combines multiple models and feature encodings and outperformed existing predictors in benchmarking experiments.
Protein post-translational modification (PTM) is an important regulatory mechanism that plays a key role in both normal and disease states. Acetylation on lysine residues is one of the most potent PTMs owing to its critical role in cellular metabolism and regulatory processes. Identifying protein lysine acetylation (Kace) sites is a challenging task in bioinformatics. To date, several machine learning-based methods for the in silico identification of Kace sites have been developed; of these, a few are prokaryotic species-specific. Despite their attractive advantages and performance, these methods have certain limitations. Therefore, this study proposes a novel predictor, STALLION (STacking-based Predictor for ProkAryotic Lysine AcetyLatION), containing six prokaryotic species-specific models to identify Kace sites accurately. To extract crucial patterns around Kace sites, we employed 11 different encodings representing three different characteristics. Subsequently, a systematic and rigorous feature selection approach was employed to identify the optimal feature set independently for each of five tree-based ensemble algorithms, and the corresponding baseline models were built for each species. Finally, the predicted values from the baseline models were used as inputs to train an appropriate classifier using the stacking strategy, yielding STALLION. Comparative benchmarking experiments showed that STALLION significantly outperformed existing predictors on independent tests. To facilitate direct access to the STALLION models, a user-friendly online predictor was implemented and is available at: http://thegleelab.org/STALLION.
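The stacking step described above can be illustrated with a minimal, self-contained Python sketch. It is not the STALLION implementation. The two base learners (a nearest-centroid scorer and a logistic regression) are hypothetical stand-ins for the five tree-based ensemble algorithms; logistic regression as the meta-classifier is an assumption, since the abstract only says "an appropriate classifier"; and the toy two-dimensional feature vectors stand in for the encoded sequence features. Building the meta-features from out-of-fold predictions, so the meta-classifier never sees a score from a base model trained on that same sample, is a common stacking practice assumed here rather than a detail stated in the abstract.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical stacking sketch; NOT the STALLION code.
Vector = List[float]
Scorer = Callable[[Vector], float]  # maps a feature vector to P(positive)
Learner = Callable[[Sequence[Vector], Sequence[int]], Scorer]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def centroid_learner(X: Sequence[Vector], y: Sequence[int]) -> Scorer:
    """Stand-in base learner: scores by relative distance to class centroids."""
    def mean(rows: List[Vector]) -> Vector:
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = mean([x for x, t in zip(X, y) if t == 1])
    neg = mean([x for x, t in zip(X, y) if t == 0])

    def score(x: Vector) -> float:
        d_pos, d_neg = math.dist(x, pos), math.dist(x, neg)
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total  # near 1 when close to pos
    return score


def logistic_learner(X: Sequence[Vector], y: Sequence[int],
                     lr: float = 0.5, epochs: int = 500) -> Scorer:
    """Stand-in learner: logistic regression fitted by plain SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def out_of_fold(learners: List[Learner], X, y, k: int = 5) -> List[Vector]:
    """Meta-features: each sample is scored by base models that never saw it."""
    n = len(X)
    meta = [[0.0] * len(learners) for _ in range(n)]
    for f in range(k):
        held = list(range(f, n, k))  # interleaved fold assignment
        train = [i for i in range(n) if i % k != f]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        for j, learn in enumerate(learners):
            model = learn(Xtr, ytr)
            for i in held:
                meta[i][j] = model(X[i])
    return meta


def fit_stacking(learners: List[Learner], meta_learner: Learner,
                 X, y, k: int = 5) -> Scorer:
    # Train the meta-classifier on out-of-fold base-model scores.
    meta_model = meta_learner(out_of_fold(learners, X, y, k), y)
    base_models = [learn(X, y) for learn in learners]  # refit on all data
    return lambda x: meta_model([m(x) for m in base_models])


# Toy usage: two well-separated classes in a 2-D feature space.
X = [[0.1 * i, 0.2] for i in range(10)] + [[1.0 + 0.1 * i, 0.9] for i in range(10)]
y = [0] * 10 + [1] * 10
predict = fit_stacking([centroid_learner, logistic_learner], logistic_learner, X, y)
print(round(predict([0.2, 0.2]), 3), round(predict([1.5, 0.9]), 3))
```

The interleaved fold assignment keeps both classes in every training split of this balanced toy set; with real, typically imbalanced Kace data, a stratified split would be the safer choice.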
