Article

Leveraging Google Earth Engine (GEE) and machine learning algorithms to incorporate in situ measurement from different times for rangelands monitoring

Journal

REMOTE SENSING OF ENVIRONMENT
Volume 236

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.rse.2019.111521

Keywords

Google earth engine; Big data; Machine learning; Domain adaptation; Transfer learning; Feature selection; Rangeland monitoring

Funding

  1. NASA [NNX17AG50G]


Mapping and monitoring of indicators of soil cover, vegetation structure, and various native and non-native species is a critical aspect of rangeland management. With advances in satellite imagery as well as cloud storage and computing, the capability now exists to conduct planetary-scale analysis, including mapping of rangeland indicators. Combined with recent investments in the collection of large amounts of in situ data in the western U.S., new approaches using machine learning can enable prediction of surface conditions at times and places when no in situ data are available. However, little analysis has yet been done on how the temporal relevancy of training data influences model performance. Here, we have leveraged the Google Earth Engine (GEE) platform and a machine learning algorithm (Random Forest, after comparison with other candidates) to identify the potential impact of different sampling times (across months and years) on estimation of rangeland indicators from the Bureau of Land Management's (BLM) Assessment, Inventory, and Monitoring (AIM) and Landscape Monitoring Framework (LMF) programs. Our results indicate that temporally relevant training data improve predictions, though the training data need not be from the exact same month and year for a prediction to be temporally relevant. Moreover, inclusion of training data from the time when predictions are desired leads to lower prediction error, but the addition of training data from other times does not increase overall model error. Using all of the available training data can lead to biases toward the mean for times when indicator values are especially high or low. However, for mapping purposes, limiting training data to just the time when predictions are desired can lead to poor predictions of values outside the spatial range of the training data for that period. We conclude that the best Random Forest prediction maps will use training data from all possible times with the understanding that estimates at the extremes will be biased.
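
The comparison described in the abstract (training on data from the prediction time only, from other times only, or from all times) can be sketched roughly as below. This is an illustrative sketch and not the authors' workflow: the records are synthetic rather than AIM/LMF plots, the year means are invented, and a simple mean predictor (the hypothetical `fit_mean`) stands in for Random Forest. Because this placeholder has no spectral predictors, it exaggerates the pull toward the pooled mean; it is meant only to show the bookkeeping of the three training sets and the direction of the bias in an unusually high year.

```python
import math
import random
from statistics import mean


def rmse(pred, obs):
    """Root-mean-square error between paired predictions and observations."""
    return math.sqrt(mean((p - o) ** 2 for p, o in zip(pred, obs)))


def make_records(rng, year_means, n_per_year=200, sd=5.0):
    """Synthetic (year, indicator value) pairs. Assumption: NOT real field data."""
    return [(year, rng.gauss(mu, sd))
            for year, mu in year_means.items()
            for _ in range(n_per_year)]


def fit_mean(values):
    """Placeholder model: predicts the training mean everywhere.
    Assumption: stands in for Random Forest purely for illustration."""
    return mean(values)


rng = random.Random(0)
# Hypothetical yearly means; 2016 plays the role of an unusually high year.
year_means = {2014: 30.0, 2015: 32.0, 2016: 50.0}
records = make_records(rng, year_means)

target_year = 2016
target_values = [v for y, v in records if y == target_year]
half = len(target_values) // 2
train_same, test = target_values[:half], target_values[half:]
train_other = [v for y, v in records if y != target_year]

training_sets = {
    "same_time": train_same,                # only the prediction period
    "other_times": train_other,             # only other periods
    "all_times": train_same + train_other,  # everything pooled
}

results = {}
for name, train in training_sets.items():
    pred = fit_mean(train)
    results[name] = {
        "rmse": rmse([pred] * len(test), test),
        "bias": pred - mean(test),  # negative means underprediction
    }

for name, stats in results.items():
    print(f"{name:12s} rmse={stats['rmse']:.2f} bias={stats['bias']:.2f}")
```

In this toy setup the pooled model underpredicts the high year, which mirrors the bias toward the mean noted above. A model with real predictors would be expected to recover much of that difference, so the magnitudes here should not be read as results.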
