Article

Intelligent approach to automated star-schema construction using a knowledge base

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 182

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2021.115226

Keywords

Data warehouse; Intelligent system; Multidimensional model; Ontology; Semantic approach; Star schema

Funding

  1. Computer Science and Information Technology Department, Science Faculty, Naresuan University [R2564E059, R2564E060]
  2. Health Systems Research Institute [63-017]
  3. Program Management Unit for Human Resources & Institutional Development, Research, and Innovation [B16F630071]
  4. Thailand Science Research and Innovation (TSRI) [CU_FRB640001_01_30_1]

The study introduces a knowledge-based model and framework that automatically generate star schemas, addressing the challenges of data-warehouse construction. By predicting attribute names and data types, it generates star schemas automatically and outperforms baseline methods.

Most data-warehouse construction processes are performed manually by experts, which is laborious, time-consuming, and error-prone. Furthermore, designing complex multidimensional models, such as a star schema, requires specialized knowledge. This predicament has motivated computer scientists to propose techniques for generating such models automatically. We therefore present a new strategy that incorporates knowledge-based models into a framework, named the Semantic-based Star-schema Designer, that helps automate star-schema construction. Our models provide the reasoning capabilities that star-schema design requires, including disambiguating heterogeneous terms, detecting appropriate data types and attribute sizes, and organizing data hierarchies to support online analytical processing (OLAP). We also propose strategies to handle the uncertainty that arises when attribute names are not available in the data source: the names of such unknown attributes are predicted using an arithmetic coding technique. Our system also generates star schemas from semi-structured data (e.g., comma-separated-value files and spreadsheets), which do not provide primary keys, foreign keys, or relationship cardinalities between tables. Using homegrown algorithms, our framework constructs star schemas, together with the relationships between their tables, without human intervention. Experiments demonstrate that our technique predicts column names and data types that enable more effective star-schema generation than baseline approaches.
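
The abstract says the framework detects data types and attribute sizes directly from semi-structured sources such as CSV files, which carry no schema of their own. The paper's actual detection rules are not described here, so the following is only a minimal Python sketch of that kind of column profiling, assuming a simple "most specific type first" heuristic: INTEGER or BIGINT, then DECIMAL sized from the observed digits, then ISO 8601 DATE, then VARCHAR sized to the longest value. The helper names `infer_column_type` and `profile_csv`, and the `column_N` placeholder names used for header-less files, are hypothetical and are not part of the Semantic-based Star-schema Designer.

```python
import csv
import io
import re
from datetime import date

# Plain integers and plain decimals only; exponents, NaN and infinities are
# deliberately left to fall through to text.
INT_RE = re.compile(r"[+-]?\d+")
DEC_RE = re.compile(r"[+-]?\d*\.\d+")


def _is_iso_date(value: str) -> bool:
    """True if the value parses as an ISO 8601 date such as 2021-03-01."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def _integer_type(values: list[str]) -> str:
    """Pick the narrowest SQL integer type that holds every value."""
    nums = [int(v) for v in values]
    if all(-2**31 <= n < 2**31 for n in nums):
        return "INTEGER"
    if all(-2**63 <= n < 2**63 for n in nums):
        return "BIGINT"
    return f"DECIMAL({max(len(str(abs(n))) for n in nums)},0)"


def _decimal_type(values: list[str]) -> str:
    """Size DECIMAL(precision, scale) from the widest integer and fraction parts."""
    int_digits, scale = 1, 0
    for v in values:
        whole, _, frac = v.lstrip("+-").partition(".")
        int_digits = max(int_digits, len(whole))
        scale = max(scale, len(frac))
    return f"DECIMAL({int_digits + scale},{scale})"


def infer_column_type(values: list[str]) -> str:
    """Return the most specific SQL type (with size) that fits every non-empty value."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "VARCHAR(1)"  # nothing to infer from; minimal text fallback
    if all(INT_RE.fullmatch(v) for v in non_empty):
        return _integer_type(non_empty)
    if all(INT_RE.fullmatch(v) or DEC_RE.fullmatch(v) for v in non_empty):
        return _decimal_type(non_empty)
    if all(_is_iso_date(v) for v in non_empty):
        return "DATE"
    return f"VARCHAR({max(len(v) for v in non_empty)})"


def profile_csv(text: str, has_header: bool = True) -> dict[str, str]:
    """Map each CSV column name to an inferred SQL type."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return {}
    if has_header:
        names, rows = rows[0], rows[1:]
    else:
        # Hypothetical placeholder names; the paper instead predicts real names.
        names = [f"column_{i + 1}" for i in range(len(rows[0]))]
    return {
        name: infer_column_type([row[i] if i < len(row) else "" for row in rows])
        for i, name in enumerate(names)
    }
```

For example, `profile_csv("product_id,price,sold_on,region\n1,19.99,2021-03-01,North\n2,5.5,2021-03-02,South-East\n")` yields INTEGER, DECIMAL(4,2), DATE and VARCHAR(10) for the four columns. Trying the narrowest type first keeps keys and measures numeric, which matters when columns are later split into fact and dimension tables; locale-specific number and date formats, and codes with leading zeros (which this sketch would type as INTEGER), would need extra rules.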
