Article

Statistical Learning from Single-Molecule Experiments: Support Vector Machines and Expectation-Maximization Approaches to Understanding Protein Unfolding Data

Journal

JOURNAL OF PHYSICAL CHEMISTRY B
Volume 125, Issue 22, Pages 5794-5808

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jpcb.1c02334

Funding

  1. NIH [R01HL148227]
  2. NSF [MCB-2027530]

Single-molecule force spectroscopy is a powerful tool for exploring dynamic processes involving proteins, but interpreting experimental data remains a challenge. This study tested Support Vector Machines and Expectation Maximization approaches for statistical learning from dynamic force experiments using molecular modeling output as training sets. An EM-based method was designed to directly analyze experimental data without the need for data classification, showing good performance even with small sample sizes and overlapping force ranges for unfolding transitions.
Single-molecule force spectroscopy has become a powerful tool for exploring dynamic processes that involve proteins; yet meaningful interpretation of the experimental data remains challenging. Owing to a low signal-to-noise ratio, experimental force-extension spectra contain force signals due to nonspecific interactions, tip or substrate detachment, and protein desorption. In addition, the unraveling of complex protein structures produces unfolding transitions of different types. Here, we test the performance of Support Vector Machines (SVM) and Expectation Maximization (EM) approaches in statistical learning from dynamic force experiments. When the output from molecular modeling in silico (or from other studies) is used as a training set, SVM and EM can be applied to understand the unfolding force data. The maximal margin or maximum likelihood classifier can be used to separate experimental test observations into unfolding transitions of different types, and EM optimization can then be used to resolve the statistics of unfolding forces: weights, average forces, and standard deviations. We also designed an EM-based approach that can be applied directly to the experimental data, without data classification or division into training and test observations. This approach performs well even when the sample size is small and when the unfolding transitions are characterized by overlapping force ranges.
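
To make the EM step concrete, the following is a minimal sketch of how weights, average forces, and standard deviations might be resolved from a pooled set of unfolding forces. It is not the authors' implementation. It assumes that each transition type contributes normally distributed forces (a Gaussian mixture), which the abstract does not state. The function names, the synthetic force values, and the use of piconewtons are all hypothetical and chosen only for illustration.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_gaussian_mixture(forces, k=2, n_iter=200, min_std=1e-3):
    """Fit a k-component 1D Gaussian mixture to unfolding forces with EM.

    Returns (weights, means, stds), one entry per assumed transition type.
    """
    data = sorted(forces)
    n = len(data)
    # Initialise by splitting the sorted forces into k equal-sized chunks.
    chunks = [data[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [1.0 / k] * k
    means = [sum(c) / len(c) for c in chunks]
    spread = (data[-1] - data[0]) / (2.0 * k)
    stds = [max(spread, min_std)] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s)
                 for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and std of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), min_std)
    return weights, means, stds


# Hypothetical data: two transition types with overlapping force ranges (pN).
rng = random.Random(0)
forces = ([rng.gauss(100.0, 15.0) for _ in range(120)]
          + [rng.gauss(160.0, 20.0) for _ in range(80)])

weights, means, stds = em_gaussian_mixture(forces, k=2)
for w, m, s in zip(weights, means, stds):
    print(f"weight={w:.2f}  mean force={m:.1f} pN  std={s:.1f} pN")
```

Note that this sketch never assigns an observation to a single transition type: each force contributes to every component in proportion to its responsibility, which is one way an EM fit can proceed without a prior classification step. The number of components `k` is supplied by hand here; how it would be chosen for real spectra is outside the scope of this illustration.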
