Article

Modelling wheat yield with antecedent information, satellite and climate data using machine learning methods in Mexico

Journal

Agricultural and Forest Meteorology
Volume 300

Publisher

Elsevier
DOI: 10.1016/j.agrformet.2020.108317

Keywords

Climate data; Food security; Machine learning; Satellite data; Wheat yield


This study developed a machine learning approach to predict wheat yield in Mexico by combining satellite data, climate data, and antecedent yield information. Under random sampling, non-linear models performed better in the FS = 0.5 scenario, while linear models were less sensitive to feature reduction. The highest prediction accuracy was obtained with the random forest (rf) method.
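This page does not define what the FS values mean. One plausible reading, assumed here and not confirmed by the abstract, is a correlation cutoff: predictors whose pairwise absolute Pearson correlation exceeds the FS value are dropped, so FS = 0.5 removes more features than FS = 0.9. The sketch below is a simplified greedy filter (caret's `findCorrelation` uses a different rule to decide which member of a correlated pair to drop), and the feature names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation undefined for a constant feature; treat as uncorrelated
    return cov / (sx * sy)


def filter_correlated(features, cutoff):
    """Hypothetical FS = cutoff scenario: greedy correlation-based feature filter.

    features: dict of feature name -> values (one value per municipality-year).
    Features are visited in insertion order; a feature is kept only if its
    absolute correlation with every already-kept feature is <= cutoff.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical satellite (NDVI) and climate (maximum temperature) predictors.
features = {
    "ndvi_mar": [0.10, 0.20, 0.30, 0.40],
    "ndvi_apr": [0.15, 0.25, 0.35, 0.45],  # perfectly correlated with ndvi_mar
    "tmax_mar": [30.0, 28.0, 31.0, 27.0],  # |r| with ndvi_mar is about 0.42
}
print(filter_correlated(features, 0.5))  # ['ndvi_mar', 'tmax_mar']
print(filter_correlated(features, 0.4))  # ['ndvi_mar']
```

Lower cutoffs discard more redundant predictors, which is consistent with the abstract's observation that feature reduction affects model families differently.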
Wheat is one of the most important cereal crops in the world, and its demand is expected to increase by about 60% by 2050. Appropriate and reliable yield forecasts are therefore fundamental to ensuring price stability and food security around the globe. In this study, we developed a Machine Learning (ML) approach that combines satellite and climate data with antecedent wheat yield information (YieldBaseLine) from 2004 to 2018, at the municipal level, in Mexico. We compared the performance of four linear algorithms (generalized linear model -glm-, ridge regression -ridge-, lasso, partial least squares -pls-) and four non-linear algorithms (k-nearest neighbours -kknn-, support vector machine with a radial kernel -svmR-, extreme gradient boosting -xgbTree- and random forest -rf-) for predicting yield before harvest time. Additionally, we evaluated their performance under five feature selection scenarios (No FS, FS = 0.9, FS = 0.75, FS = 0.5 and YieldBaseLine). The models were independently tested using two approaches: random sampling and selective sampling. In the random sampling, the non-linear models generally performed better under the FS = 0.5 scenario, whereas the linear models were less sensitive to feature reduction. The results also demonstrated the capacity of the YieldBaseLine predictor, combined with satellite and climate data, to account for the inter-annual and spatial variability in the study area. The highest prediction accuracy was obtained by the rf method (No FS), with R² = 0.84. To further test the models' operability in a simulated real-case scenario, we held out the records from the last year (2018) as the test set. The best performing model was again rf (R² = 0.81). This study proposes a robust methodology for modelling crop yield at a large scale, and it may be used for operational purposes. It may therefore be of interest to decision-makers, lawmakers, producers, authorities and the wheat industry. In addition, it can help to establish appropriate food security and trading policies.
A similar approach can be applied to other regions or crops.
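The two testing approaches can be sketched as follows, assuming each record is one municipality-year with a `year` field and that R² is the coefficient of determination (the abstract does not state which R² definition was used). `random_split` and `year_holdout` are hypothetical helper names; the year hold-out trains only on years before the test year, mirroring the 2018 hold-out described in the abstract.

```python
import random


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition of R²)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def random_split(records, test_fraction=0.2, seed=0):
    """Random sampling: shuffle all records and hold out a fraction as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def year_holdout(records, test_year=2018):
    """Selective sampling (as assumed here): train on earlier years, test on one year."""
    train = [r for r in records if r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test


# Hypothetical municipality-year records.
records = [{"municipality": m, "year": y, "yield": 5.0}
           for m in ("A", "B") for y in (2016, 2017, 2018)]
train, test = year_holdout(records)
print(len(train), len(test))  # 4 2
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0 (no better than the mean)
```

Holding out a whole year keeps every 2018 record out of training, so it is a stricter test of operational use than a random split, where records from the same year can appear in both sets.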

