Article

Random Forests for Spatially Dependent Data

Journal

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Volume 118, Issue 541, Pages 665-683

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/01621459.2021.1950003

Keywords

Gaussian processes; Generalized least squares; Random forests; Spatial

Spatial linear mixed models are widely used for geospatial data analysis, but they assume a linear covariate effect. Random forests, a popular method for estimating nonlinear functions, have often been applied to spatial data while ignoring the spatial correlation. We propose a novel extension, RF-GLS, that incorporates spatial correlation modeled with a Gaussian process into random forests, and show that it outperforms standard random forests in both estimation and prediction.
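The abstract does not write the model out. The display below is a standard formulation of a spatial mixed model with a GP random effect, together with the OLS/GLS contrast the abstract invokes. The notation (m, w, C, θ, τ², Σ, Q, ρ, ν) is chosen here for illustration and may differ from the paper's.

```latex
\begin{aligned}
y(s_i) &= m\bigl(x(s_i)\bigr) + w(s_i) + \epsilon(s_i), \qquad i = 1,\dots,n,\\
w(\cdot) &\sim \mathrm{GP}\bigl(0,\, C(\cdot,\cdot \mid \theta)\bigr), \qquad
\epsilon(s_i) \overset{\text{iid}}{\sim} N(0, \tau^2),\\
\operatorname{Cov}(Y) &= \Sigma = C(\theta) + \tau^2 I_n, \qquad Q = \Sigma^{-1}.\\[6pt]
\text{Linear case } m(x) = x^\top\beta:\quad
\hat\beta_{\mathrm{OLS}} &= \arg\min_\beta\, (Y - X\beta)^\top (Y - X\beta) = (X^\top X)^{-1} X^\top Y,\\
\hat\beta_{\mathrm{GLS}} &= \arg\min_\beta\, (Y - X\beta)^\top Q\, (Y - X\beta) = (X^\top Q X)^{-1} X^\top Q\, Y.\\[6pt]
\text{Mat\'ern covariance:}\quad
C(d) &= \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \Bigl(\frac{\sqrt{2\nu}\, d}{\rho}\Bigr)^{\nu}
K_\nu\Bigl(\frac{\sqrt{2\nu}\, d}{\rho}\Bigr),
\qquad \nu = \tfrac12 \;\Rightarrow\; C(d) = \sigma^2 e^{-d/\rho}.
\end{aligned}
```

Setting Q = I_n turns the GLS criterion into the OLS one, which is the sense in which an OLS-based method is a special case of its GLS counterpart.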
Spatial linear mixed models, consisting of a linear covariate effect and a Gaussian process (GP)-distributed spatial random effect, are widely used for the analysis of geospatial data. We consider the setting where the covariate effect is nonlinear. Random forests (RF) are popular for estimating nonlinear functions, but applications of RF to spatial data have often ignored the spatial correlation. We show that this adversely affects the performance of RF. We propose RF-GLS, a novel and well-principled extension of RF, for estimating nonlinear covariate effects in spatial mixed models where the spatial correlation is modeled using a GP. RF-GLS extends RF in the same way that generalized least squares (GLS) fundamentally extends ordinary least squares (OLS) to accommodate dependence in linear models. RF becomes a special case of RF-GLS and is substantially outperformed by RF-GLS for both estimation and prediction across extensive numerical experiments with spatially correlated data. RF-GLS can also be used for functional estimation with other types of dependent data, such as time series. We prove consistency of RF-GLS for beta-mixing dependent error processes, which include the popular spatial Matérn GP. As a byproduct, we also establish, to our knowledge, the first consistency result for RF under dependence. We establish results of independent importance, including a general consistency result for GLS optimizers over data-driven function classes and a uniform law of large numbers under beta-mixing dependence with weaker assumptions. These new tools may be useful for the asymptotic analysis of other GLS-style estimators in nonparametric regression with dependent data.
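To make the OLS-to-GLS analogy concrete for tree splitting, here is a minimal Python sketch. It is not the authors' implementation and assumes 1-D spatial locations, an exponential covariance (the Matérn case with smoothness 1/2) whose parameters are known, and a single split of a single tree. Each candidate split defines a leaf-membership design matrix Z; the leaf values are the GLS estimate b = (Z^T Q Z)^{-1} Z^T Q y with Q = Σ^{-1}, and the split is scored by how much it lowers the GLS loss (y - Zb)^T Q (y - Zb). The function names (exponential_cov, gls_loss, membership, best_split) and parameters (sigma2, phi, tau2) are hypothetical, and the paper's exact split criterion, covariance estimation, and forest construction are not reproduced here.

```python
import numpy as np


def exponential_cov(coords, sigma2=1.0, phi=1.0, tau2=0.1):
    """Exponential covariance sigma2 * exp(-phi * d) (Matérn, nu = 1/2) plus a nugget tau2.

    coords: 1-D array of locations. Names and defaults are illustrative assumptions.
    """
    d = np.abs(coords[:, None] - coords[None, :])  # pairwise distances
    return sigma2 * np.exp(-phi * d) + tau2 * np.eye(len(coords))


def gls_loss(y, Z, Q):
    """Return the GLS loss (y - Z b)^T Q (y - Z b) and the GLS estimate b."""
    ZtQ = Z.T @ Q
    b = np.linalg.solve(ZtQ @ Z, ZtQ @ y)  # b = (Z^T Q Z)^{-1} Z^T Q y
    r = y - Z @ b
    return float(r @ Q @ r), b


def membership(x, threshold):
    """Hypothetical helper: n x 2 leaf-membership matrix for the split x <= threshold."""
    left = (x <= threshold).astype(float)
    return np.column_stack([left, 1.0 - left])


def best_split(x, y, Q):
    """Threshold on x that most reduces the GLS loss relative to a single leaf."""
    n = len(y)
    root_loss, _ = gls_loss(y, np.ones((n, 1)), Q)
    best_t, best_gain = None, -np.inf
    for t in np.unique(x)[:-1]:  # skip the largest value so both leaves are non-empty
        loss, _ = gls_loss(y, membership(x, t), Q)
        gain = (root_loss - loss) / n
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Usage: a step function of x plus spatially correlated noise on 1-D locations.
rng = np.random.default_rng(0)
n = 60
s = np.sort(rng.uniform(0, 10, n))  # spatial locations
x = rng.uniform(0, 1, n)            # covariate
Sigma = exponential_cov(s, sigma2=1.0, phi=0.5, tau2=0.1)
y = np.where(x <= 0.5, 0.0, 3.0) + np.linalg.cholesky(Sigma) @ rng.standard_normal(n)

t_gls, gain_gls = best_split(x, y, np.linalg.inv(Sigma))  # GLS-style split
t_ols, gain_ols = best_split(x, y, np.eye(n))             # Q = I: CART-style split
print(t_gls, t_ols)
```

Passing Q = I makes the leaf estimates plain leaf means and the gain the usual CART sum-of-squares reduction, which mirrors the abstract's statement that RF is a special case of RF-GLS. The sketch uses np.linalg.solve rather than inverting Z^T Q Z explicitly; forming Q = Σ^{-1} as a dense inverse costs O(n^3) and is only practical for small n.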
