☆ 4.2 Article

Enabling efficient process mining on large data sets: realizing an in-database process mining operator

DISTRIBUTED AND PARALLEL DATABASES (2020)

Journal

DISTRIBUTED AND PARALLEL DATABASES

Volume 38, Issue 1, Pages 227-253

Publisher

SPRINGER

DOI: 10.1007/s10619-019-07270-1

Keywords

Process mining; Relational algebra; Formal methods; Database management system

Categories

Computer Science, Information Systems; Computer Science, Theory & Methods

Abstract

Process mining can be used to analyze business processes based on logs of their execution. These execution logs are often obtained by querying a database and storing the results in a file. The mining itself is then done on the file, so the data processing power of the database cannot be used after the log is extracted. Enabling process mining directly on a database therefore provides additional flexibility and efficiency. To help facilitate this, this paper formally defines a database operator that extracts the 'directly follows' relation, one of the relations at the heart of process mining, from an operational database. It defines the operator using the well-known relational algebra and formally proves equivalence properties of the operator that are useful for query optimization. Subsequently, it presents time-complexity properties of the operator. Finally, it presents an implementation of the operator as part of the H2 DBMS and demonstrates that this implementation extracts the 'directly follows' relation from a database with an arbitrary database structure within a fraction of a second, several orders of magnitude faster than is currently possible.
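To make the 'directly follows' relation concrete, the sketch below computes it with an ordinary SQL query from Python's built-in `sqlite3` module. This is not the operator defined in the paper, nor its H2 implementation; it is a plain nested-query formulation for illustration only. The table name `event_log`, its columns `case_id`, `activity`, and `ts`, and the sample rows are all assumptions introduced here.

```python
import sqlite3


def directly_follows(conn):
    """Return the set of (a, b) pairs where b directly follows a in some case.

    Assumes a hypothetical table event_log(case_id, activity, ts) in which
    ts orders the events within each case.
    """
    query = """
        SELECT DISTINCT e1.activity, e2.activity
        FROM event_log AS e1
        JOIN event_log AS e2
          ON e1.case_id = e2.case_id AND e1.ts < e2.ts
        WHERE NOT EXISTS (
            -- no third event of the same case lies strictly between e1 and e2
            SELECT 1 FROM event_log AS e3
            WHERE e3.case_id = e1.case_id
              AND e3.ts > e1.ts AND e3.ts < e2.ts
        )
    """
    return set(conn.execute(query).fetchall())


def build_example():
    """Create an in-memory database with a small, made-up event log."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_log (case_id INTEGER, activity TEXT, ts INTEGER)"
    )
    rows = [
        (1, "register", 1), (1, "check", 2), (1, "pay", 3),
        (2, "register", 1), (2, "pay", 2),
    ]
    conn.executemany("INSERT INTO event_log VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    for pair in sorted(directly_follows(build_example())):
        print(pair)
```

The self-join with a `NOT EXISTS` subquery compares every pair of events within a case, which is the kind of cost that a dedicated in-database operator, as the abstract describes, is intended to avoid.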
