4.7 Article

Advanced predictive methods for wine age prediction: Part I - A comparison study of single-block regression approaches based on variable selection, penalized regression, latent variables and tree-based ensemble methods

Journal

TALANTA
Volume 171, Issue -, Pages 341-350

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/j.talanta.2016.10.062

Keywords

Regression methods; Wine age prediction; Madeira wine; Sparsity; Collinearity

Funding

  1. Promover a Producao Cientifica e Desenvolvimento Tecnologico e a Constituicao de Redes Tematicas [016658, PTDC/QEQ-EPS/1323/2014, POCI-01-0145-FEDER-016658, 3599-PPCDT]
  2. European Union's FEDER
  3. Fundação para a Ciência e a Tecnologia [PTDC/QEQ-EPS/1323/2014] Funding Source: FCT

In this paper we test and compare advanced predictive approaches for estimating wine age in the context of the production of a high-quality fortified wine, Madeira wine. We consider four different data sets: volatiles, polyphenols, organic acids and UV-vis spectra. Each of these data sets contains chemical information of a different nature and presents a different data structure, namely a different dimensionality, level of collinearity and degree of sparsity. These differences may call for different modelling approaches in order to better exploit each data set's information content, namely its predictive potential for wine age. This is because different regression methods make different prior assumptions regarding the predictors, the response variable(s) and the data-generating mechanism, which may or may not fit the case study under analysis. To cover a wide range of modelling domains, we have included methods belonging to four very distinct classes of approaches that cover most applications found in practice: linear regression with variable selection, penalized regression, latent variable regression and tree-based ensemble methods. We have also developed a rigorous comparison framework based on a double Monte Carlo cross-validation scheme to assess the relative performance of the various methods. Upon comparison, models built using the polyphenols and volatile composition data sets led to better wine age predictions, showing lower errors under testing conditions. Furthermore, the results obtained for the polyphenols data set suggest a sparser structure that can be further explored in order to reduce the number of measured variables.
In terms of regression methods, tree-based methods, and boosted regression trees in particular, presented the best results for the polyphenols, volatile and organic acids data sets, suggesting the possible presence of a nonlinear relationship between predictors and response. For the UV-vis data set, penalized regression methods (ridge regression, LASSO and elastic net) presented the best results, although methods such as partial least squares (PLS) or principal component regression (PCR) are often the practitioners' preferred choice.
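
The abstract names a double Monte Carlo cross-validation scheme but does not give its details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an outer loop of random train/test splits for estimating the prediction error and an inner loop of random splits of the training set for tuning a hyperparameter. Ridge regression stands in for one of the penalized methods. The function names, the split fraction, the repetition counts, the penalty grid and the synthetic data are all illustrative assumptions.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ridge(X, y, lam):
    """Ridge regression on mean-centered data; returns a prediction function."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations with the penalty added to the diagonal: (X'X + lam*I) beta = X'y
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    beta = solve(A, b)
    return lambda row: y_mean + sum(
        bj * (xj - mj) for bj, xj, mj in zip(beta, row, x_mean))


def rmse(model, X, y):
    """Root mean squared error of a prediction function on (X, y)."""
    return math.sqrt(sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(y))


def mc_split(idx, holdout_frac, rng):
    """One Monte Carlo split: a random partition of idx into (kept, held out)."""
    shuffled = idx[:]
    rng.shuffle(shuffled)
    n_out = max(1, int(round(holdout_frac * len(shuffled))))
    return shuffled[n_out:], shuffled[:n_out]


def double_mc_cv(X, y, grid, n_outer=20, n_inner=10, holdout_frac=0.2, seed=0):
    """Outer loop estimates test error; inner loop tunes the ridge penalty."""
    rng = random.Random(seed)
    take = lambda idx: ([X[i] for i in idx], [y[i] for i in idx])
    outer_errors = []
    for _ in range(n_outer):
        train, test = mc_split(list(range(len(y))), holdout_frac, rng)
        # Inner splits use the training samples only, so the test set stays unseen.
        inner = [mc_split(train, holdout_frac, rng) for _ in range(n_inner)]

        def inner_error(lam):
            return sum(rmse(fit_ridge(*take(cal), lam), *take(val))
                       for cal, val in inner) / n_inner

        best_lam = min(grid, key=inner_error)
        # Refit on the full training set with the tuned penalty, then test once.
        outer_errors.append(rmse(fit_ridge(*take(train), best_lam), *take(test)))
    return outer_errors


# Synthetic stand-in data (hypothetical, NOT the wine data): 40 samples, 3 predictors.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
y = [5 + 2 * r[0] - r[1] + data_rng.gauss(0, 0.1) for r in X]
errors = double_mc_cv(X, y, grid=[0.01, 0.1, 1.0, 10.0])
print("mean test RMSE:", sum(errors) / len(errors))
```

To compare several methods under such a scheme, one would typically reuse the same outer splits for every method so that the resulting error distributions are paired. Whether the paper does this is not stated in the abstract.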
