Article

Statistical Learning from Single-Molecule Experiments: Support Vector Machines and Expectation-Maximization Approaches to Understanding Protein Unfolding Data

Journal

JOURNAL OF PHYSICAL CHEMISTRY B
Volume 125, Issue 22, Pages 5794-5808

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jpcb.1c02334

Funding

  1. NIH [R01HL148227]
  2. NSF [MCB-2027530]

Single-molecule force spectroscopy is a powerful tool for exploring dynamic processes involving proteins, but interpreting experimental data remains a challenge. This study tested Support Vector Machines and Expectation Maximization approaches for statistical learning from dynamic force experiments using molecular modeling output as training sets. An EM-based method was designed to directly analyze experimental data without the need for data classification, showing good performance even with small sample sizes and overlapping force ranges for unfolding transitions.
Single-molecule force spectroscopy has become a powerful tool for exploring dynamic processes that involve proteins; yet meaningful interpretation of the experimental data remains challenging. Owing to the low signal-to-noise ratio, experimental force-extension spectra contain force signals due to nonspecific interactions, tip or substrate detachment, and protein desorption. In addition, the unraveling of complex protein structures produces unfolding transitions of different types. Here, we test the performance of Support Vector Machines (SVM) and Expectation Maximization (EM) approaches in statistical learning from dynamic force experiments. When the output from molecular modeling in silico (or from other studies) is used as a training set, SVM and EM can be applied to interpret the unfolding force data. The maximal margin or maximum likelihood classifier can be used to separate experimental test observations into unfolding transitions of different types, and EM optimization can then be used to resolve the statistics of the unfolding forces: weights, average forces, and standard deviations. We also designed an EM-based approach that can be applied directly to the experimental data, without data classification and without division into training and test observations. This approach performs well even when the sample size is small and when the unfolding transitions are characterized by overlapping force ranges.
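
To make the EM step concrete, the following is a minimal sketch of how weights, average forces, and standard deviations could be estimated from a pooled set of unfolding forces. It is not the authors' implementation: it assumes each transition type contributes a Gaussian force distribution (the paper may use a different model), and the function name `em_gaussian_mixture`, the synthetic forces, and the initial guesses are all hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_gaussian_mixture(forces, means, stds, weights, n_iter=200):
    """Fit a 1D Gaussian mixture by EM (assumed model, illustrative only).

    Returns (weights, means, stds), one entry per transition type.
    """
    k, n = len(means), len(forces)
    means, stds, weights = list(means), list(stds), list(weights)
    for _ in range(n_iter):
        # E-step: posterior probability that each force belongs to each type.
        resp = []
        for x in forces:
            p = [weights[j] * normal_pdf(x, means[j], stds[j]) for j in range(k)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and standard deviation per type.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, forces)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, forces)) / nj
            stds[j] = max(math.sqrt(var), 1e-6)
    return weights, means, stds


# Synthetic example (hypothetical values, in pN): two transition types
# with overlapping force ranges, pooled without class labels.
rng = random.Random(0)
forces = [rng.gauss(100.0, 15.0) for _ in range(120)]
forces += [rng.gauss(150.0, 20.0) for _ in range(80)]

weights, means, stds = em_gaussian_mixture(
    forces, means=[80.0, 170.0], stds=[25.0, 25.0], weights=[0.5, 0.5]
)
for w, m, s in zip(weights, means, stds):
    print(f"weight={w:.2f}  mean force={m:.1f}  std={s:.1f}")
```

In this sketch the initial guesses stand in for the prior information that molecular modeling or a trained classifier could supply; with poorly chosen starting values EM may converge to a different local optimum.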
