A spatiotemporal indexing approach for efficient processing of big array-based climate data with MapReduce

INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE (2017)

Journal

INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE

Volume 31, Issue 1, Pages 17-35

Publisher

TAYLOR & FRANCIS LTD

DOI: 10.1080/13658816.2015.1131830

Keywords

Spatiotemporal index; big climate data; array-based; Hadoop MapReduce; HDFS; NASA MERRA; climate change

Funding

NSF [PLR-1349259, IIP-1338925, CNS-1117300, ICER-1343759]
NASA [NNG12PP37I, NNG14HH38I]
National Science Foundation, Directorate for Computer and Information Science and Engineering, Division of Computer and Network Systems [1338925]

Abstract

Climate observations and model simulations are producing vast amounts of array-based spatiotemporal data. Efficient processing of these data is essential for assessing global challenges such as climate change, natural disasters, and diseases. This is challenging not only because of the large data volume, but also because of the intrinsic high-dimensional nature of geoscience data. To tackle this challenge, we propose a spatiotemporal indexing approach to efficiently manage and process big climate data with MapReduce in a highly scalable environment. Using this approach, big climate data are stored directly in the Hadoop Distributed File System (HDFS) in their original, native file format. A spatiotemporal index is built to bridge the logical array-based data model and the physical data layout, which enables fast data retrieval when performing spatiotemporal queries. Based on the index, a data-partitioning algorithm is applied to enable MapReduce to achieve high data locality while also balancing the workload. The proposed indexing approach is evaluated using the National Aeronautics and Space Administration (NASA) Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate reanalysis dataset. The experimental results show that the index can significantly accelerate querying and processing (~10x speedup compared to the baseline test using the same computing cluster), while keeping the index-to-data ratio small (0.0328%). The applicability of the indexing approach is demonstrated by a climate anomaly detection application deployed on a NASA Hadoop cluster. The approach can also support efficient processing of general array-based spatiotemporal data in various geoscience domains without special configuration on a Hadoop cluster.
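The abstract names two mechanisms: an index that maps logical array coordinates to physical byte ranges in HDFS, and a partitioner that groups data by the nodes storing it. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. The names `ChunkLocation`, `SpatiotemporalIndex`, and `locality_aware_splits`, the (variable, time step, chunk row, chunk column) key, the uniform chunk shape, and the greedy least-loaded assignment rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical logical key: (variable, time step, chunk row, chunk column).
ChunkKey = Tuple[str, int, int, int]


@dataclass(frozen=True)
class ChunkLocation:
    """Assumed physical location of one array chunk stored in HDFS."""
    file_path: str          # native file (e.g. NetCDF/HDF) kept as-is in HDFS
    byte_offset: int        # where the chunk starts inside the file
    byte_length: int        # stored size of the chunk in bytes
    hosts: Tuple[str, ...]  # DataNodes holding a replica of the chunk's block


class SpatiotemporalIndex:
    """Maps logical chunk keys to physical byte ranges (hypothetical sketch)."""

    def __init__(self, chunk_shape: Tuple[int, int]):
        self.chunk_shape = chunk_shape  # grid cells per chunk: (rows, cols)
        self.entries: Dict[ChunkKey, ChunkLocation] = {}

    def add(self, key: ChunkKey, loc: ChunkLocation) -> None:
        self.entries[key] = loc

    def query(self, variable: str, t_range: Tuple[int, int],
              row_range: Tuple[int, int], col_range: Tuple[int, int]
              ) -> List[Tuple[ChunkKey, ChunkLocation]]:
        """Return indexed chunks overlapping an inclusive grid-cell query box."""
        rows, cols = self.chunk_shape
        hits = []
        for t in range(t_range[0], t_range[1] + 1):
            for cr in range(row_range[0] // rows, row_range[1] // rows + 1):
                for cc in range(col_range[0] // cols, col_range[1] // cols + 1):
                    key = (variable, t, cr, cc)
                    if key in self.entries:
                        hits.append((key, self.entries[key]))
        return hits


def locality_aware_splits(hits: List[Tuple[ChunkKey, ChunkLocation]]
                          ) -> Dict[str, List[ChunkKey]]:
    """Assign each chunk to one of its own replica hosts (data locality),
    choosing the host with the fewest bytes assigned so far (load balance)."""
    load: Dict[str, int] = defaultdict(int)
    splits: Dict[str, List[ChunkKey]] = defaultdict(list)
    # Largest chunks first; ties keep query order because sorting is stable.
    for key, loc in sorted(hits, key=lambda h: h[1].byte_length, reverse=True):
        host = min(loc.hosts, key=lambda h: (load[h], h))  # name breaks ties
        load[host] += loc.byte_length
        splits[host].append(key)
    return dict(splits)


# Toy example: one variable, 2 time steps, a 20x20 grid cut into 2x2 chunks.
idx = SpatiotemporalIndex(chunk_shape=(10, 10))
offset = 0
for t in range(2):
    for cr in range(2):
        for cc in range(2):
            hosts = ("node1", "node2") if (cr + cc) % 2 == 0 else ("node2", "node3")
            idx.add(("T", t, cr, cc), ChunkLocation("/merra/T.nc4", offset, 100, hosts))
            offset += 100

hits = idx.query("T", t_range=(0, 1), row_range=(0, 9), col_range=(5, 15))
splits = locality_aware_splits(hits)
print(len(hits), splits)
```

Here the query box touches only chunk row 0 and chunk columns 0 and 1, so four chunks are read instead of all eight, and every chunk is scheduled on a node that already stores it. Sorting by size before the least-loaded assignment is a common greedy balancing heuristic; the abstract does not say whether the paper's partitioning algorithm works this way.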
