☆ 4.7 Article

Big data, small explanatory and predictive power: Lessons from random forest modeling of on-farm yield variability and implications for data-driven agronomy

FIELD CROPS RESEARCH (2023)

期刊

FIELD CROPS RESEARCH

卷 302, 期 -, 页码 -

出版社

ELSEVIER

DOI: 10.1016/j.fcr.2023.109063

关键词

Sustainable intensification; Model accuracy; Model precision; Linear mixed models; Machine learning

类别

Agronomy

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

The performance of statistical and machine learning methods in explaining and predicting crop yield variability was assessed in this study. The results showed that big data from farmers' fields can to some extent explain on-farm yield variability, but not predict it across time and space.

Context: Collection and analysis of large volumes of on-farm production data are widely seen as key to understanding yield variability among farmers and improving resource-use efficiency. Objective: The aim of this study was to assess the performance of statistical and machine learning methods to explain and predict crop yield across thousands of farmers' fields in contrasting farming systems worldwide. Methods: A large database of 10,940 field-year combinations from three countries in different stages of agricultural intensification was analyzed. Random effects models were used to partition crop yield variability and random forest models were used to explain and predict crop yield within a cross-validation scheme with data resampling over space and time. Results: Yield variability in relative terms was smallest for wheat and barley in the Netherlands and for wheat in Ethiopia, intermediate for rice in the Philippines, and greatest for maize in Ethiopia. Random forest models comprising a total of 87 variables explained a maximum of 65 % of cereal yield variability in the Netherlands and less than 45 % of cereal yield variability in Ethiopia and in the Philippines. Crop management related variables were important to explain and predict cereal yields in Ethiopia, while predictive (i.e., known before the growing season) climatic variables and explanatory (i.e., known during or after the growing season) climatic variables were most important to explain and predict cereal yield variability in the Philippines and in the Netherlands, respectively. Finally, model cross-validation for regions or years not seen during model training reduced the R2 considerably for most crop x country combinations, while for wheat in the Netherlands this was model dependent. Conclusion: Big data from farmers' fields is useful to explain on-farm yield variability to some extent, but not to predict it across time and space. Significance: The results call for moderate expectations towards big data and machine learning in agronomic studies, particularly for smallholder farms in the tropics where model performance was poorest independently of the variables considered and the cross-validation scheme used.

Big data, small explanatory and predictive power: Lessons from random forest modeling of on-farm yield variability and implications for data-driven agronomy

期刊

FIELD CROPS RESEARCH

出版社

ELSEVIER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Big data, small explanatory and predictive power: Lessons from random forest modeling of on-farm yield variability and implications for data-driven agronomy

期刊

FIELD CROPS RESEARCH

出版社

ELSEVIER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文