Article

Sequence-based prediction model of protein crystallization propensity using machine learning and two-level feature selection

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 24, Issue 5, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbad319

Keywords

crystallization; feature selection; machine learning; protein sequence; prediction model; support vector machine


This study created a new pipeline to predict protein crystallization propensity from protein sequence. The pipeline combines feature selection, dimensionality reduction, and algorithm training, and achieved higher accuracy than existing pipelines on three different datasets, offering a new approach to the challenge of predicting multistage protein crystallization in computational biology.
Protein crystallization is crucial for biology, but the steps involved are complex and demanding in terms of both external factors and internal structure. To save experimental cost and time, the tendency of a protein to crystallize can first be estimated and screened by modeling. This study therefore created a new pipeline that uses protein sequence to predict crystallization propensity at the protein material production stage, the purification stage, and the crystal production stage. The pipeline proposes a new feature selection method that combines chi-square (χ²) and recursive feature elimination, yielding 12 selected features. Linear discriminant analysis is then applied for dimensionality reduction, and finally a support vector machine with hyperparameter tuning and 10-fold cross-validation is used to train the model and test the results. The pipeline has been tested on three different datasets, and its accuracy rates are higher than those of existing pipelines. In conclusion, the model provides a new way to predict multistage protein crystallization propensity, which is a major challenge in computational biology.
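The stages named in the abstract (χ² plus recursive feature elimination, linear discriminant analysis, then a tuned support vector machine under 10-fold cross-validation) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the two selection steps are combined, so the sketch assumes a χ² pre-filter followed by recursive feature elimination down to 12 features. The function names, the pre-filter size of 30, the min-max scaling, the linear SVM used inside the elimination step, the parameter grid, and the synthetic data standing in for sequence-derived features are all hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC, LinearSVC


def build_pipeline(n_chi2=30, n_final=12):
    """Two-level feature selection, LDA reduction, SVM classifier.

    n_chi2 is an assumed pre-filter size; n_final=12 follows the abstract.
    """
    return Pipeline([
        # chi2 requires non-negative inputs, so rescale to [0, 1] first.
        ("scale", MinMaxScaler()),
        # Level 1 (assumed ordering): keep the n_chi2 best features by chi-square.
        ("chi2", SelectKBest(chi2, k=n_chi2)),
        # Level 2: recursive feature elimination with a linear SVM as ranker.
        ("rfe", RFE(LinearSVC(dual=False, max_iter=5000),
                    n_features_to_select=n_final)),
        # A two-class problem leaves LDA with a single discriminant axis.
        ("lda", LinearDiscriminantAnalysis(n_components=1)),
        ("svm", SVC()),
    ])


def fit_sketch(random_state=0):
    """Tune the SVM with a small grid under 10-fold cross-validation."""
    # Synthetic stand-in for sequence-derived features (not real protein data).
    X, y = make_classification(n_samples=200, n_features=50,
                               n_informative=10, random_state=random_state)
    # Hypothetical grid; the abstract does not list the tuned values.
    param_grid = {"svm__C": [0.1, 1.0, 10.0], "svm__gamma": ["scale", 0.1]}
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=random_state)
    search = GridSearchCV(build_pipeline(), param_grid, cv=cv,
                          scoring="accuracy")
    search.fit(X, y)
    return search


if __name__ == "__main__":
    result = fit_sketch()
    print("best parameters:", result.best_params_)
    print("mean 10-fold accuracy:", result.best_score_)
```

Placing every step inside one `Pipeline` means feature selection and dimensionality reduction are refit on the training folds only, which avoids leaking held-out samples into the selected features.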
