☆ 4.7 Article

Marriage between variable selection and prediction methods to model plant disease risk

EUROPEAN JOURNAL OF AGRONOMY (2023)

Journal

EUROPEAN JOURNAL OF AGRONOMY

Volume 151, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.eja.2023.126995

Keywords

Logistic regression; Random forest; Feature selection; Prediction models; Multicollinearity; Pathosystems

Category

Agronomy

Summary

This study aimed to evaluate different combinations of variable selection methods with linear and non-linear predictors to fit climate-based disease models and predict the occurrence of diseases in pathosystems. The results showed that the feature selection methods had no impact on the accuracy of predictions in the random forest algorithm, while the stepwise regression combined with VIF and p-value criteria outperformed other methods in fitting the logistic linear regression model.
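For orientation, the two quantities this summary relies on can be written out using their standard textbook definitions. The notation below (y for disease occurrence, x_j for the climatic predictors, beta_j for the coefficients, R_j^2 for the coefficient of determination) is an assumption made here for illustration and is not taken from the article.

```latex
% Logistic regression: probability of disease occurrence (y = 1)
% given p climatic predictors x_1, ..., x_p
P(y = 1 \mid x) = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_j\right)\right)}

% Variance inflation factor of predictor j, where R_j^2 is obtained by
% regressing x_j on all the other predictors
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

A VIF close to 1 indicates that a predictor is nearly uncorrelated with the others, while large values signal multicollinearity.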

Abstract

Predicting the risk of a disease in a pathosystem from a set of climatic variables usually requires handling a large number of input variables, many of which are irrelevant and/or redundant. Building linear predictive models entails not only dimensionality issues but also the negative impact of multicollinearity. Several feature selection methods have proved efficient in both linear and non-linear models, regardless of those issues. However, in a machine learning (ML) context, these feature selection methods must be evaluated when embedded in the model fitting algorithm to obtain the greatest accuracy. The aim of this work was to assess different combinations of variable selection methods with linear and non-linear predictors to fit climate-based models that predict the occurrence of a disease in a pathosystem. Four selection methods were compared: stepwise, which is frequently used in linear models, combined with VIF and p-value statistical criteria (Step+VIF+Pv), and three methods commonly used in ML: filter (F), genetic algorithm (GA), and Boruta (B). The disease risk predictors were constructed with a logistic linear regression model (LR) and the random forest (RF) algorithm, using all the available variables and the subgroups of variables selected by each feature selection method. Data from three pathosystems were processed: two involving Begomovirus, one in common bean (Phaseolus vulgaris L.) and the other in soybean (Glycine max), and a third involving Mal de Rio Cuarto virus in maize (Zea mays L.). The data sets differed in sample size and number of variables. The accuracy of RF prediction did not vary among feature selection methods. When Step+VIF+Pv was used to reduce the model, it outperformed the other feature selection methods in fitting LR. These results suggest that appropriately pairing variable selection and prediction methods would improve the modeling of plant disease risk.
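The abstract does not spell out how the VIF criterion is applied, so the following is only a minimal sketch of one common form of VIF screening, not the authors' procedure. It assumes a threshold of 5, synthetic data, and hypothetical variable names (`temp_mean`, `temp_max`, `rainfall`). It uses the identity that each VIF equals the corresponding diagonal element of the inverse correlation matrix, and repeatedly drops the predictor with the highest VIF until all remaining values are at or below the threshold.

```python
import random
import statistics


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long columns."""
    n = len(columns[0])
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    k = len(columns)
    corr = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            cov = sum((a - means[i]) * (b - means[j])
                      for a, b in zip(columns[i], columns[j])) / n
            corr[i][j] = cov / (sds[i] * sds[j])
    return corr


def invert(matrix):
    """Gauss-Jordan matrix inversion with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def vif_screen(data, threshold=5.0):
    """Drop the highest-VIF variable until every VIF is <= threshold.

    The threshold of 5 is an assumption, not a value from the article.
    """
    kept = dict(data)
    while len(kept) > 1:
        names = list(kept)
        scores = vif([kept[name] for name in names])
        worst = max(range(len(names)), key=scores.__getitem__)
        if scores[worst] <= threshold:
            break
        del kept[names[worst]]
    return list(kept)


# Synthetic climate data (hypothetical): temp_max is almost a shifted copy
# of temp_mean, so the two are strongly collinear; rainfall is independent.
rng = random.Random(0)
temp_mean = [rng.gauss(20, 4) for _ in range(200)]
data = {
    "temp_mean": temp_mean,
    "temp_max": [t + 5 + rng.gauss(0, 0.5) for t in temp_mean],
    "rainfall": [rng.gauss(50, 15) for _ in range(200)],
}
selected = vif_screen(data)
print(selected)
```

In a Step+VIF+Pv style workflow, a screen like this would typically be combined with stepwise selection and a p-value check on the fitted logistic regression coefficients; those steps are omitted here because the abstract gives no details about them.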
