Multi-Temperate Logical Data Warehouse Design for Large-Scale Healthcare Data

Journal

BIG DATA RESEARCH
Volume 25, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.bdr.2021.100255

Keywords

Data warehouse design; OLAP workloads; Healthcare data management; Data partitioning algorithms; Logical data warehouses; Columnar databases

Advances in modern hardware architectures and database technology have led to increased adoption of logical data warehouses (LDWs) as complements to traditional physical data warehousing (PDW) approaches. LDWs integrate and transform data at run-time; in premium hardware environments, their design focuses on replicating high-value data to the physical core to maximize spatial locality. This study gathers an OLAP workload from a large healthcare organization to support and evaluate LDW design algorithms for multi-temperate storage systems.
Modern hardware architectures and advances in database technology are driving increased adoption of logical data warehouses (LDWs) that complement traditional physical data warehousing (PDW) approaches. In contrast to PDW design methodologies that emphasize physical consolidation of all data of interest on a single (perhaps distributed) computing platform, along with early-binding approaches that pre-materialize transformations and changes to the source data, LDW techniques allow for the integration and transformation of data at run-time and typically physically move or modify much less data in advance. In an environment with premium hardware such as multi-temperate storage, the successful design of LDWs depends on replication of high-value data to their physical core to maximize spatial locality. Identifying and collocating high-value data is a non-trivial task that has not been adequately explored in the context of LDWs in multi-temperate storage systems. In this paper, we gather queries to construct an OLAP workload for use in supporting and evaluating LDW design algorithms for a large healthcare organization. We introduce new algorithms to address the preprocessing of the workload, identification of data clusters to support OLAP queries, and assignment of clusters to appropriate (hot, warm, and cold) storage tiers, allowing the LDW to deliver results more efficiently by covering a higher percentage of its query workload using the fastest storage devices. Any use case involving copying data from sources to tiered storage targets for analytic querying could benefit from the techniques and solutions presented here. © 2021 Elsevier Inc. All rights reserved.
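
The abstract does not specify how clusters are assigned to tiers, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes a simple greedy heuristic: rank clusters by workload queries served per gigabyte, fill the hot tier first, then warm, and send the rest to cold. The `Cluster` type, the `assign_tiers` and `coverage` functions, the capacities, and all sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """A group of co-accessed data (hypothetical structure for illustration)."""
    name: str
    size_gb: float   # assumed to be strictly positive
    query_hits: int  # workload queries answered entirely from this cluster


def assign_tiers(clusters, hot_capacity_gb, warm_capacity_gb):
    """Greedily place clusters into hot, warm, or cold storage.

    Clusters are ranked by queries served per GB (an assumed value measure),
    then placed in the fastest tier that still has room.
    """
    remaining = {"hot": hot_capacity_gb, "warm": warm_capacity_gb}
    placement = {}
    ranked = sorted(clusters, key=lambda c: c.query_hits / c.size_gb, reverse=True)
    for cluster in ranked:
        for tier in ("hot", "warm"):
            if cluster.size_gb <= remaining[tier]:
                remaining[tier] -= cluster.size_gb
                placement[cluster.name] = tier
                break
        else:
            # Neither premium tier has room; cold storage is treated as unbounded.
            placement[cluster.name] = "cold"
    return placement


def coverage(clusters, placement, tier="hot"):
    """Fraction of workload queries served by clusters placed in `tier`."""
    total = sum(c.query_hits for c in clusters)
    covered = sum(c.query_hits for c in clusters if placement[c.name] == tier)
    return covered / total if total else 0.0


# Made-up sample data, for illustration only.
clusters = [
    Cluster("encounters", size_gb=40, query_hits=600),
    Cluster("labs", size_gb=100, query_hits=300),
    Cluster("claims", size_gb=60, query_hits=90),
    Cluster("archive", size_gb=500, query_hits=10),
]
placement = assign_tiers(clusters, hot_capacity_gb=50, warm_capacity_gb=150)
hot_coverage = coverage(clusters, placement, tier="hot")
```

With this sample data, `encounters` lands in hot storage, `labs` in warm, and the remaining two clusters in cold, so the hot tier covers 60% of the query workload. A greedy ranking like this is easy to reason about but is not guaranteed to be optimal; it stands in here only to make the hot/warm/cold assignment and the coverage metric concrete.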
