Proceedings Paper

Analysis-oriented Metadata for Data Lakes

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3472163.3472273

Keywords

Data Lake; Metadata Model; Analysis-oriented Metadata

This paper aims to make data lakes easily accessible and reusable by proposing an analysis-oriented metadata model. The model covers descriptive information about datasets and their attributes, as well as all metadata related to the machine learning analyses performed on these datasets. A data lake metadata management application built on the model lets users search for and reuse existing data, processes, and analyses by querying metadata stored in a NoSQL data store within the data lake.
Data lakes are supposed to enable analysts to perform more efficient and effective data analysis by crossing multiple existing data sources, processes and analyses. However, this cannot be achieved when a data lake lacks a metadata governance system that progressively capitalizes on all the analysis experiments performed. The objective of this paper is to obtain an easily accessible, reusable data lake that capitalizes on all user experiences. To meet this need, we propose an analysis-oriented metadata model for data lakes. This model includes the descriptive information of datasets and their attributes, as well as all metadata related to the machine learning analyses performed on these datasets. To illustrate our metadata solution, we implemented a data lake metadata management application. This application allows users to find and use existing data, processes and analyses by searching relevant metadata stored in a NoSQL data store within the data lake. To demonstrate how easily metadata can be discovered with the application, we present two use cases with real data: dataset similarity detection and machine learning guidance.
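
To make the idea of analysis-oriented metadata more concrete, the following is a minimal Python sketch, not the model or application described in the paper. Every class, field and function name here is hypothetical. An in-memory list stands in for the NoSQL store, and a Jaccard overlap of attribute names stands in for dataset similarity detection; the abstract does not say which similarity measure the authors use.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeMeta:
    """Descriptive metadata for one attribute (hypothetical fields)."""
    name: str
    dtype: str


@dataclass
class AnalysisMeta:
    """Metadata for one machine learning analysis (hypothetical fields)."""
    task: str        # e.g. "classification"
    algorithm: str   # e.g. "random forest"
    target: str      # name of the predicted attribute


@dataclass
class DatasetMeta:
    """Descriptive metadata for a dataset, plus the analyses run on it."""
    name: str
    description: str
    attributes: list = field(default_factory=list)
    analyses: list = field(default_factory=list)


def search(catalog, keyword):
    """Return datasets whose name or description contains the keyword."""
    kw = keyword.lower()
    return [d for d in catalog
            if kw in d.name.lower() or kw in d.description.lower()]


def attribute_similarity(a, b):
    """Jaccard overlap of attribute names (an assumed stand-in measure)."""
    names_a = {x.name.lower() for x in a.attributes}
    names_b = {x.name.lower() for x in b.attributes}
    union = names_a | names_b
    return len(names_a & names_b) / len(union) if union else 0.0


def suggest_analyses(catalog, dataset, threshold=0.5):
    """Collect analyses already run on sufficiently similar datasets."""
    return [analysis
            for other in catalog
            if other is not dataset
            and attribute_similarity(dataset, other) >= threshold
            for analysis in other.analyses]


# Illustrative, made-up catalog entries.
old = DatasetMeta(
    name="patients_2019",
    description="Hospital admissions",
    attributes=[AttributeMeta("age", "int"), AttributeMeta("sex", "str"),
                AttributeMeta("bmi", "float")],
    analyses=[AnalysisMeta("classification", "random forest", "readmitted")],
)
new = DatasetMeta(
    name="patients_2020",
    description="Hospital admissions, new cohort",
    attributes=[AttributeMeta("age", "int"), AttributeMeta("sex", "str"),
                AttributeMeta("glucose", "float")],
)
catalog = [old, new]
```

With this catalog, `search(catalog, "cohort")` returns only the 2020 dataset, and `suggest_analyses(catalog, new)` surfaces the analysis recorded for the 2019 dataset because the two share two of four distinct attribute names. This loosely mirrors the two use cases named in the abstract, under the assumptions stated above.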
