4.7 Article

Support vector machine: Classifying and predicting mutagenicity of complex mixtures based on pollution profiles

Journal

TOXICOLOGY
Volume 313, Issue 2-3, Pages 151-159

Publisher

ELSEVIER IRELAND LTD
DOI: 10.1016/j.tox.2013.01.016

Keywords

Support vector machine; Complex mixture; Pollution profile; Mutagenicity

Funding

  1. National Key Technology R&D Program in the 11th Five Year Plan [2006BAI19B02]
  2. National Natural Science Foundation of China [30972438, 30771770, 81273035, 81202165]
  3. Key Project of National High-tech R&D Program of China (863 Program) [2008AA062501, 2013AA065204]
  4. Shanghai Municipal Health Bureau Leading Academic Discipline Project [08GWD14]
  5. Dawn Program of Shanghai Education Commission [07SG01]
  6. Non-profit Foundation of National Health Ministry in the 12th Five Year Plan [201302004, 2012BAJ25805]

Powerful, robust in silico approaches offer great promise for classifying and predicting the biological effects of complex mixtures and for identifying the constituents of greatest concern. Support vector machine (SVM) methods can handle high-dimensional data with small sample sizes and can examine multiple interrelationships among samples. In this work, we applied SVM methods to examine the pollution profiles and mutagenicity of 60 water samples obtained from 6 cities in China during 2006-2011. Pollutant profiles of the water extracts were characterized by gas chromatography-mass spectrometry (GC/MS), and mutagenicity was examined by Ames assays. We encoded the GC/MS peaks of each mixture as feature vectors and used 48 samples as the training set, reserving 12 samples as the test set. SVM classification and regression models were constructed from the whole pollution profiles, and compounds were ranked by their correlation with mutagenicity. Both classification and prediction performance were evaluated. The SVM model based on whole pollution profiles performed worse (sensitivity 69.5-70.7%, specificity 70.6-73.2%, accuracy 69.9-72.1%, correlation coefficient 0.55-0.59) than models based on the compounds most strongly associated with mutagenicity. An SVM model with the top 10 compounds had the highest performance (sensitivity 89.8-90.3%, specificity 90.1-92.1%, accuracy 90.1-91.3%, correlation coefficient 0.80-0.82), with negligible decreases in performance between the training and test sets. SVM can be a powerful, robust classifier of the relationship between pollutants and mutagenicity in complex real-world mixtures. The top 14 compounds contribute most to mutagenicity and warrant further study to identify these constituents. (C) 2013 Elsevier Ireland Ltd. All rights reserved.
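The abstract states that compounds were ranked by their correlation with mutagenicity and that a model restricted to the top-ranked compounds performed best. The Python sketch below illustrates that feature-selection step under stated assumptions, not as the authors' method: each sample is encoded as a vector of GC/MS peak areas (0.0 when a compound was not detected), mutagenicity is a numeric Ames response (a 0/1 label also works and yields the point-biserial correlation), and ranking uses the absolute Pearson correlation. The exact encoding and ranking statistic are not given in the abstract; the function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_compounds(
    profiles: List[List[float]], mutagenicity: Sequence[float], compounds: List[str]
) -> List[Tuple[str, float]]:
    """Rank compounds by |r| between their peak areas and mutagenicity, strongest first.

    profiles: one row per water sample, one column per compound (hypothetical encoding:
    GC/MS peak area, 0.0 when the compound was not detected).
    """
    columns = list(zip(*profiles))  # transpose: one tuple of peak areas per compound
    scored = [(name, pearson(col, mutagenicity)) for name, col in zip(compounds, columns)]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


def select_top_k(
    profiles: List[List[float]], ranked: List[Tuple[str, float]], compounds: List[str], k: int
) -> List[List[float]]:
    """Reduce each sample's feature vector to the k highest-ranked compounds."""
    position = {name: j for j, name in enumerate(compounds)}
    keep = [position[name] for name, _ in ranked[:k]]
    return [[row[j] for j in keep] for row in profiles]


# Hypothetical toy data: 4 samples x 3 compounds (c2 is constant, so it gets r = 0.0).
compounds = ["c1", "c2", "c3"]
profiles = [[1.0, 5.0, 0.0], [2.0, 5.0, 0.3], [3.0, 5.0, 0.1], [4.0, 5.0, 0.2]]
mutagenicity = [1.1, 2.0, 2.9, 4.2]

ranked = rank_compounds(profiles, mutagenicity, compounds)
reduced = select_top_k(profiles, ranked, compounds, k=2)
```

With a 48/12 split like the study's, the ranking in a setup like this would be computed on the training samples only, so the held-out samples do not influence which compounds are kept. The reduced matrix could then be passed to an SVM implementation such as scikit-learn's `sklearn.svm.SVC` (classification) or `sklearn.svm.SVR` (regression).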
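The abstract reports sensitivity, specificity, accuracy, and a correlation coefficient without defining them. A minimal sketch of the standard confusion-matrix definitions follows, assuming binary labels (1 = mutagenic, 0 = non-mutagenic). The abstract does not say which correlation coefficient was used; this sketch assumes the Matthews correlation coefficient (MCC), a common choice for SVM classifiers, which is unitless and ranges from -1 to 1. The labels in the example are invented for illustration and are not the study's data.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels (1 = mutagenic, 0 = non-mutagenic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # mutagenic, predicted mutagenic
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-mutagenic, predicted non-mutagenic
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed mutagenic sample
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(pairs)
    # Assumed correlation measure: Matthews correlation coefficient (0.0 if undefined).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy, "mcc": mcc}


# Hypothetical 12-sample test set (same size as the study's held-out set, invented labels).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Here tp = 5, tn = 5, fp = 1, and fn = 1, so sensitivity, specificity, and accuracy are each 5/6 (about 83.3% when expressed as a percentage, as in the abstract), and MCC is 24/36, about 0.67.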
