Article

Machine Learning Methods for X-Ray Scattering Data Analysis from Biomacromolecular Solutions

Journal

BIOPHYSICAL JOURNAL
Volume 114, Issue 11, Pages 2485-2492

Publisher

CELL PRESS
DOI: 10.1016/j.bpj.2018.04.018

Funding

  1. iNEXT [653706]
  2. Horizon 2020 program of the European Union
  3. Bundesministerium für Bildung und Forschung (grant TTSAS) [05K2016]
  4. Human Frontier Science Program [RGP0017/2012]

Small-angle x-ray scattering (SAXS) of biological macromolecules in solutions is a widely employed method in structural biology. SAXS patterns include information about the overall shape and low-resolution structure of dissolved particles. Here, we describe how to transform experimental SAXS patterns to feature vectors and how a simple k-nearest neighbor approach is able to retrieve information on overall particle shape and maximal diameter (D-max) as well as molecular mass directly from experimental scattering data. Based on this transformation, we develop a rapid multiclass shape-classification ranging from compact, extended, and flat categories to hollow and random-chain-like objects. This classification may be employed, e.g., as a decision block in automated data analysis pipelines. Further, we map protein structures from the Protein Data Bank into the classification space and, in a second step, use this mapping as a data source to obtain accurate estimates for the structural parameters (D-max, molecular mass) of the macromolecule under study based on the experimental scattering pattern alone, without inverse Fourier transform for D-max. All methods presented are implemented in a Fortran binary DATCLASS, part of the ATSAS data analysis suite, available on Linux, Mac, and Windows and free for academic use.
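
The k-nearest neighbor idea in the abstract can be illustrated with a minimal Python sketch. It is not the DATCLASS implementation: the two-component feature vectors, the toy reference library, the choice of Euclidean distance, the value k = 3, and the function names (`knn_neighbors`, `classify_shape`, `estimate_parameter`) are all assumptions made for illustration, and every number in the library is invented. The sketch only shows the general pattern of voting on a shape class and averaging a structural parameter over the nearest reference entries.

```python
import math
from collections import Counter


def knn_neighbors(query, library, k):
    """Return the k library entries closest to `query` (Euclidean distance)."""
    ranked = sorted(library, key=lambda entry: math.dist(query, entry["features"]))
    return ranked[:k]


def classify_shape(query, library, k=3):
    """Majority vote over the shape labels of the k nearest entries."""
    neighbors = knn_neighbors(query, library, k)
    votes = Counter(entry["shape"] for entry in neighbors)
    return votes.most_common(1)[0][0]


def estimate_parameter(query, library, name, k=3):
    """Unweighted mean of a numeric parameter over the k nearest entries."""
    neighbors = knn_neighbors(query, library, k)
    return sum(entry[name] for entry in neighbors) / len(neighbors)


# Hypothetical reference library: feature vectors with known shape class,
# maximal diameter ("dmax") and molecular mass ("mass"). All values invented.
LIBRARY = [
    {"features": (0.10, 0.20), "shape": "compact", "dmax": 50.0, "mass": 20.0},
    {"features": (0.12, 0.18), "shape": "compact", "dmax": 54.0, "mass": 24.0},
    {"features": (0.11, 0.22), "shape": "compact", "dmax": 52.0, "mass": 22.0},
    {"features": (0.80, 0.90), "shape": "extended", "dmax": 150.0, "mass": 60.0},
    {"features": (0.82, 0.88), "shape": "extended", "dmax": 160.0, "mass": 66.0},
    {"features": (0.79, 0.91), "shape": "extended", "dmax": 155.0, "mass": 63.0},
]

if __name__ == "__main__":
    query = (0.11, 0.20)  # hypothetical feature vector of a measured pattern
    print(classify_shape(query, LIBRARY))
    print(estimate_parameter(query, LIBRARY, "dmax"))
    print(estimate_parameter(query, LIBRARY, "mass"))
```

In this sketch the same neighbor search serves both tasks: the class label comes from a vote, while D-max and molecular mass come from averaging the neighbors' known values, which mirrors the two-step use of the classification space described above only in outline.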
