Article

Prediction of protein crotonylation sites through LightGBM classifier based on SMOTE and elastic net

Journal

ANALYTICAL BIOCHEMISTRY
Volume 609

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.ab.2020.113903

Keywords

Crotonylation sites; Multi-feature fusion; SMOTE; Elastic net; LightGBM

Funding

  1. National Natural Science Foundation of China [61863010]
  2. Key Research and Development Program of Shandong Province of China [2019GGX101001]
  3. Natural Science Foundation of Shandong Province of China [ZR2018MC007, ZR2019MEE066]

Lysine crotonylation is an important protein post-translational modification that plays a key role in chromosome organization and nucleic acid metabolism. Identifying crotonylation sites is important for understanding the function and mechanism of proteins. Traditional experimental methods are time-consuming and expensive, and cannot identify crotonylation sites quickly and accurately. Therefore, this paper proposes a novel crotonylation site prediction method called LightGBM-CroSite. First, binary encoding (BE), position weight amino acid composition (PWAA), encoding based on grouped weight (EBGW), k nearest neighbors (KNN), and pseudo-position specific scoring matrix (PsePSSM) are used to extract features from protein sequences and obtain the original feature space. Second, the elastic net is used to remove redundant information and select the optimal feature subset. Third, the synthetic minority oversampling technique (SMOTE) is used to balance the samples. Finally, the balanced feature vectors are input into LightGBM to predict the crotonylation sites. According to the jackknife test, the accuracy (ACC), Matthews correlation coefficient (MCC), and area under the ROC curve (AUC) are 98.99%, 0.9798, and 0.9996, respectively. Compared with other state-of-the-art methods, our method achieves better performance on crotonylation site prediction. The source code and all datasets are available at https://github.com/QUST/LightGBM-CroSite/.
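
Three of the steps named in the abstract can be illustrated with a minimal sketch: binary encoding of a peptide window, the SMOTE interpolation step, and the ACC and MCC metrics. This is not the authors' implementation. The function names, the residue ordering, the all-zero vector for padding characters, and the neighbour count `k` are all assumptions made for illustration.

```python
import math
import random

# 20 standard residues; this ordering is an assumption, not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encode(window):
    """One-hot encode each residue of a peptide window (20 values per position).

    Characters outside AMINO_ACIDS (e.g. padding) become all-zero vectors,
    which is an illustrative choice.
    """
    vec = []
    for residue in window:
        one_hot = [0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:
            one_hot[idx] = 1
        vec.extend(one_hot)
    return vec


def smote_sample(minority, k=2, rng=None):
    """Return one synthetic minority sample (needs at least two minority points).

    Picks a random minority point, then one of its k nearest minority
    neighbours, and interpolates at a random position between the two.
    """
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    target = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [b + gap * (t - b) for b, t in zip(base, target)]


def acc_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

In a full pipeline, the encoded vectors would go through feature selection and oversampling before reaching the classifier. The sketch leaves out the elastic net and LightGBM stages, which would normally come from existing libraries.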
