Article

Stable isotope and trace element analyses with non-linear machine-learning data analysis improved coffee origin classification and marker selection

Journal

JOURNAL OF THE SCIENCE OF FOOD AND AGRICULTURE
Volume 103, Issue 9, Pages 4704-4718

Publisher

WILEY
DOI: 10.1002/jsfa.12546

Keywords

coffee; origin traceability; machine learning; stable isotopes; trace elements; marker selection

This study combined stable isotope and trace element analyses with non-linear machine learning data analysis to classify the geographical origin of green coffee beans. Using PLS-DA, origin was predicted well at the country level and showed promise at the regional level, but was predicted poorly at the continental and Central American regional levels. Non-linear machine learning techniques improved accuracy at those levels and identified more origin markers, which were also more relevant.
BACKGROUND: This study investigated the geographical origin classification of green coffee beans from continental to country and regional levels. An innovative approach combined stable isotope and trace element analyses with non-linear machine learning data analysis to improve coffee origin classification and marker selection. Specialty green coffee beans sourced from three continents, eight countries, and 22 regions were analyzed by measuring five isotope ratios (δ13C, δ15N, δ18O, δ2H, and δ34S) and 41 trace elements. Partial least squares discriminant analysis (PLS-DA) was applied to the integrated dataset for origin classification.

RESULTS: Origins were predicted well at the country level and showed promise at the regional level, with discriminating marker selection at all levels. However, PLS-DA predicted origin poorly at the continental and Central American regional levels. Non-linear machine learning techniques improved predictions and enabled the identification of a higher number of origin markers, and those that were identified were more relevant. The best predictive accuracy was found using ensemble decision trees, random forest and extreme gradient boost, with accuracies of up to 0.94 and 0.89 for continental and Central American regional models, respectively.

CONCLUSION: The potential for advanced machine learning models to improve origin classification and the identification of relevant origin markers was demonstrated. The decision-tree-based models were superior with their embedded variable identification features and visual interpretation.

© 2023 The Authors. Journal of The Science of Food and Agriculture published by John Wiley & Sons Ltd on behalf of Society of Chemical Industry.
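
As a rough illustration of the embedded variable identification that the abstract attributes to decision-tree ensembles, the sketch below fits a random forest to synthetic data and ranks features by importance. It is not the authors' pipeline: the data are randomly generated, the class labels and feature names are hypothetical, and the choice of which two columns carry signal is arbitrary. Only the feature count (5 isotope ratios plus 41 trace elements) and the three-class continental setting are taken from the abstract. It assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; these are NOT the study's measurements.
rng = np.random.default_rng(0)
n_per_class, n_classes = 40, 3  # three hypothetical "continent" classes

# Hypothetical feature names: 5 isotope ratios + 41 trace elements = 46.
feature_names = ["d13C", "d15N", "d18O", "d2H", "d34S"] + [
    f"element_{i}" for i in range(41)
]

X = rng.normal(size=(n_per_class * n_classes, len(feature_names)))
y = np.repeat(np.arange(n_classes), n_per_class)

# Arbitrary assumption: only two columns carry origin information.
X[:, 2] += 2.0 * y           # "d18O" shifts steadily with class index
X[:, 3] += 3.0 * (y == 1)    # "d2H" is elevated for the middle class only

model = RandomForestClassifier(n_estimators=200, random_state=0)

# Cross-validated accuracy (stratified 5-fold for classifiers).
scores = cross_val_score(model, X, y, cv=5)

# Refit on all samples to read the embedded feature importances.
model.fit(X, y)
importances = model.feature_importances_
ranking = np.argsort(importances)[::-1]
top_markers = [feature_names[i] for i in ranking[:5]]

print("mean CV accuracy:", round(scores.mean(), 3))
print("top candidate markers:", top_markers)
```

In this toy setting the informative columns should rise to the top of the ranking, which mirrors the idea of marker selection falling out of the fitted model rather than requiring a separate step. Accuracies printed here say nothing about real coffee samples.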
