Article

Cocoa origin classifiability through LC-MS data: A statistical approach for large and long-term datasets

Journal

FOOD RESEARCH INTERNATIONAL
Volume 140

Publisher

ELSEVIER
DOI: 10.1016/j.foodres.2020.109983

Keywords

Theobroma cacao; LC-MS; Principal component analysis (PCA); Linear discriminant analysis (LDA); Origin classification; Feature selection

Funding

  1. COMETA project - Barry Callebaut AG

Classifying food samples by country of origin is essential in the food industry, and LC-MS provides the detailed chemical profiles needed for it. In large, long-term datasets, however, variation from experimental conditions, batches, and instruments makes the task difficult: PCA gives only limited separation by origin, and LDA results depend strongly and non-linearly on which compounds are included. Selecting compounds whose intensities follow a Gaussian distribution across samples markedly improves origin clustering with LDA.
Classification of food samples by their countries of origin is an important task in the food industry for quality assurance and the development of fine-flavor products. Liquid chromatography-mass spectrometry (LC-MS) provides a fast technique for obtaining in-depth information about the chemical composition of foods. However, in a large dataset gathered over a period of a few years, multiple, incoherent, and hard-to-avoid sources of variation (e.g., experimental conditions, transportation, batch effects, and instrumental effects) pose technical challenges that make origin classification a difficult problem. Here, we use a large dataset gathered over a period of four years, containing 297 LC-MS profiles of cocoa sourced from 10 countries, to demonstrate these challenges with two popular multivariate analysis methods: principal component analysis (PCA) and linear discriminant analysis (LDA). We show that PCA provides only limited separation by bean origin, while LDA suffers from a strong non-linear dependence on the set of compounds used. Further, we show for LDA that a compound selection criterion based on a Gaussian distribution of intensities across samples dramatically enhances origin clustering of samples, which suggests that marker compounds can be studied in such a disparate dataset through this approach. In essence, we develop and demonstrate a new approach that maximizes the utility of multivariate analysis in a highly complex dataset while avoiding overfitting.
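The abstract does not state the exact form of the Gaussian selection criterion, so the following is only a minimal sketch of one way such a filter could look, not the authors' method. It assumes that a compound counts as "Gaussian-like" when the skewness and excess kurtosis of its intensities across samples are both close to zero. The function names, the thresholds (`max_abs_skew`, `max_abs_kurt`), and the synthetic data are all hypothetical.

```python
import math
import statistics


def skew_and_excess_kurtosis(values):
    """Return sample skewness and excess kurtosis (both 0 for an ideal Gaussian)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        # A constant column carries no information; treat it as non-Gaussian.
        return float("inf"), float("inf")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def select_gaussian_compounds(intensities, max_abs_skew=0.5, max_abs_kurt=1.0):
    """Return indices of compounds whose intensities look roughly Gaussian.

    intensities: list of samples, each a list of compound intensities.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    n_compounds = len(intensities[0])
    kept = []
    for j in range(n_compounds):
        column = [row[j] for row in intensities]
        skew, kurt = skew_and_excess_kurtosis(column)
        if abs(skew) <= max_abs_skew and abs(kurt) <= max_abs_kurt:
            kept.append(j)
    return kept


# Synthetic stand-in for an intensity table: 297 samples, 2 compounds.
# Compound 0 follows normal quantiles; compound 1 is log-normal (heavily skewed).
n_samples = 297
std_normal = statistics.NormalDist(0.0, 1.0)
z_scores = [std_normal.inv_cdf((i + 0.5) / n_samples) for i in range(n_samples)]
data = [[100.0 + 10.0 * z, math.exp(z)] for z in z_scores]

selected = select_gaussian_compounds(data)
print(selected)
```

In this sketch only the symmetric compound passes the filter. The retained columns would then be the input to LDA, for example scikit-learn's `LinearDiscriminantAnalysis`, with the origin countries as class labels.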
