4.7 Article

Intelligent approach to automated star-schema construction using a knowledge base

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 182

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2021.115226

Keywords

Data warehouse; Intelligent system; Multidimensional model; Ontology; Semantic approach; Star schema

Funding

  1. Computer Science and Information Technology Department, Science Faculty, Naresuan University [R2564E059, R2564E060]
  2. Health Systems Research Institute [63-017]
  3. Program Management Unit for Human Resources & Institutional Development, Research, and Innovation [B16F630071]
  4. Thailand Science Research Innovation (TSRI) [CU_FRB640001_01_30_1]

The study introduces a knowledge-based model and framework that can automatically generate star schemas, addressing the challenges in data warehouse construction. By predicting attribute names and data types, it achieves the automated generation of star schemas, outperforming baseline methods.
Most data-warehouse construction processes are performed manually by experts, which is laborious, time-consuming, and prone to error. Furthermore, special knowledge is required to design complex multidimensional models, such as a star schema. This predicament has motivated computer scientists to propose automation techniques to generate such models. For this reason, we present a new strategy that incorporates knowledge-based models into a framework, named the Semantic-based Star-schema Designer, that assists the automation of star schema construction. Our models provide the reasoning capabilities needed for star schema design, including the ability to disambiguate heterogeneous terms, detect appropriate data types and attribute sizes, and organize data hierarchies to support online analytical processing. We also propose strategies to overcome the uncertainty arising when attribute names are not available in the data source: the names of unknown attributes are predicted using an arithmetic coding technique. Our system also generates star schemas from semi-structured data (e.g., comma-separated-value files and spreadsheets), which do not provide primary keys, foreign keys, or relationship cardinalities between tables. Using homegrown algorithms, our framework constructs star schemas and their relationship information without human intervention. Experiments demonstrate that our technique predicts column names and data types well enough to generate star schemas more effectively than baseline approaches.
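
To make the idea of detecting data types and attribute sizes from a key-less CSV source more concrete, here is a minimal rule-based sketch. It is not the knowledge-based method of the paper, whose details are not given in this abstract. The function names (`infer_column_type`, `infer_schema`), the SQL-like type labels, the `%Y-%m-%d` date format, and the sample data are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime


def infer_column_type(values):
    """Guess a SQL-like type for raw string values (hypothetical heuristic)."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "VARCHAR(1)"

    def all_parse(parser):
        # True only if every non-empty value is accepted by the parser.
        try:
            for v in non_empty:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "INTEGER"
    if all_parse(float):
        return "DECIMAL"
    # Assumption: dates are written as YYYY-MM-DD.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "DATE"
    # Fall back to text sized to the longest observed value.
    return f"VARCHAR({max(len(v) for v in non_empty)})"


def infer_schema(csv_text):
    """Map each header of a CSV string to an inferred type."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    return {
        name: infer_column_type([row[i] for row in data])
        for i, name in enumerate(header)
    }


# Illustrative sample data, not taken from the paper.
sample = (
    "order_id,amount,order_date,city\n"
    "1,19.99,2021-01-05,Bangkok\n"
    "2,5.50,2021-01-06,Phitsanulok\n"
)
schema = infer_schema(sample)
print(schema)
```

A heuristic like this only covers the type-and-size step. Predicting missing column names, disambiguating heterogeneous terms, and deriving hierarchies and table relationships would need the additional knowledge the abstract describes.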
