Advanced predictive methods for wine age prediction: Part I - A comparison study of single-block regression approaches based on variable selection, penalized regression, latent variables and tree-based ensemble methods

TALANTA (2017)

Journal

TALANTA

Volume 171, pages 341-350

Publisher

ELSEVIER SCIENCE BV

DOI: 10.1016/j.talanta.2016.10.062

Keywords

Regression methods; Wine age prediction; Madeira wine; Sparsity; Collinearity

Category

Chemistry, Analytical

Funding

Promover a Producao Cientifica e Desenvolvimento Tecnologico e a Constituicao de Redes Tematicas [016658, PTDC/QEQ-EPS/1323/2014, POCI-01-0145-FEDER-016658, 3599-PPCDT]
European Union's FEDER
Fundação para a Ciência e a Tecnologia [PTDC/QEQ-EPS/1323/2014] Funding Source: FCT

Abstract

In this paper we test and compare advanced predictive approaches for estimating wine age in the production of a high-quality fortified wine, Madeira wine. We consider four data sets: volatile composition, polyphenols, organic acids and UV-vis spectra. Each data set contains chemical information of a different nature and has a different structure, namely a different dimensionality, level of collinearity and degree of sparsity. These differences may call for different modelling approaches in order to better exploit each data set's information content, namely its predictive potential for wine age. This is because different regression methods make different prior assumptions about the predictors, the response variable(s) and the data-generating mechanism, and these assumptions may or may not hold for the case study under analysis. To cover a wide range of modelling domains, we include methods from four very distinct classes of approaches that cover most applications found in practice: linear regression with variable selection, penalized regression, latent variable regression and tree-based ensemble methods. We also developed a rigorous comparison framework, based on a double Monte Carlo cross-validation scheme, to assess the relative performance of the various methods. Upon comparison, models built using the polyphenols and volatile composition data sets led to better wine age predictions, showing lower errors under testing conditions. Furthermore, the results for the polyphenols data set suggest a sparser structure that can be further explored in order to reduce the number of measured variables.
In terms of regression methods, tree-based methods, and boosted regression trees in particular, gave the best results for the polyphenols, volatile and organic acids data sets, suggesting the possible presence of a nonlinear relationship between predictors and response. For the UV-vis data set, penalized regression methods (ridge regression, LASSO and elastic nets) gave the best results, even though methods such as partial least squares (PLS) or principal component regression (PCR) are often the practitioners' preferred choice.
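
The abstract does not spell out the double Monte Carlo cross-validation scheme, so the following is only a minimal sketch of one common reading of it: an outer loop of random train/test splits estimates the prediction error, and an inner loop of random splits inside each outer training set tunes the hyperparameter. Ridge regression stands in for the penalized methods, and the data are synthetic. The function names, split fraction, number of repetitions and candidate penalties are all illustrative assumptions, not values from the paper.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centred data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

def rmse(X, y, model):
    """Root mean squared prediction error of a fitted (coef, intercept) pair."""
    coef, intercept = model
    return float(np.sqrt(np.mean((X @ coef + intercept - y) ** 2)))

def random_split(n, test_fraction, rng):
    """One Monte Carlo split of range(n) into (train, held-out) index arrays."""
    idx = rng.permutation(n)
    n_test = max(1, int(round(test_fraction * n)))
    return idx[n_test:], idx[:n_test]

def double_mc_cv(X, y, lambdas, n_outer=20, n_inner=10, test_fraction=0.2, seed=0):
    """Outer loop estimates test error; inner loop tunes the ridge penalty."""
    rng = np.random.default_rng(seed)
    outer_errors = []
    for _ in range(n_outer):
        train, test = random_split(len(y), test_fraction, rng)
        # Inner loop: only the outer training samples are used for tuning,
        # so the outer test samples never influence the chosen penalty.
        inner_scores = np.zeros(len(lambdas))
        for _ in range(n_inner):
            fit, val = random_split(len(train), test_fraction, rng)
            fit_idx, val_idx = train[fit], train[val]
            for j, lam in enumerate(lambdas):
                model = ridge_fit(X[fit_idx], y[fit_idx], lam)
                inner_scores[j] += rmse(X[val_idx], y[val_idx], model)
        best_lam = lambdas[int(np.argmin(inner_scores))]
        # Refit on the full outer training set with the selected penalty.
        model = ridge_fit(X[train], y[train], best_lam)
        outer_errors.append(rmse(X[test], y[test], model))
    return np.array(outer_errors)

# Synthetic stand-in for a chemical data set (assumption: 60 samples, 8 predictors).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(60, 8))
age = 10 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + data_rng.normal(scale=0.5, size=60)

errors = double_mc_cv(X, age, lambdas=[0.01, 0.1, 1.0, 10.0])
print(f"test RMSE: mean={errors.mean():.3f}, std={errors.std():.3f}")
```

Running the same outer splits for each candidate method (variable selection, latent variable or tree-based models) would give paired error distributions that can be compared method by method, which is the kind of relative assessment the abstract describes.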
