☆ 4.6 Review

Machine learning methods for generating high dimensional discrete datasets

WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY (2022)

Journal

WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Volume 12, Issue 2, Pages -

Publisher

WILEY PERIODICALS, INC

DOI: 10.1002/widm.1450

Keywords

constraints-based models; data generation; generative adversarial networks; generative models; inverse frequent itemset mining; synthetic dataset; variational autoencoder

Categories

Computer Science, Artificial Intelligence; Computer Science, Theory & Methods

Funding

European Commission [952026]
Ministero dell'Istruzione, dell'Universita e della Ricerca [ARS01_00587]
National Science Foundation [1820685]
Direct For Education and Human Resources, Division Of Graduate Education [1820685] (Funding Source: National Science Foundation)

Summary

The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. This survey explores two possible approaches for synthesizing datasets that reflect the patterns of real ones, and compares their pros and cons.

Abstract

The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. A possible solution is to synthesize datasets that reflect patterns of real ones using a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z; then, these patterns are used to reconstruct a new dataset X' that preserves the main characteristics of X. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former is devised using inverse frequent itemset mining (IFM) techniques and consists of generating a dataset that satisfies given support constraints on the itemsets of an input set, which are typically the frequent ones. By contrast, the latter draws on recent developments in probabilistic generative modeling (PGM), which model generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for the case of discrete data and discussing their pros and cons.

This article is categorized under: Fundamental Concepts of Data and Knowledge > Big Data Mining; Technologies > Machine Learning; Algorithmic Development > Structure Discovery
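As a rough illustration of the two-step approach described in the abstract (analyze X to obtain Z, then sample X' from Z), the sketch below uses the simplest parametric distribution one could pick: an independent Bernoulli probability per item. This is an assumption made for illustration only, not a method from the survey, which discusses neural-network-based generators and IFM solvers. The toy dataset and the helper names `support`, `fit_item_frequencies`, and `sample_dataset` are hypothetical.

```python
import random


def support(dataset, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in dataset) / len(dataset)


def fit_item_frequencies(dataset, items):
    """Step 1 (X -> Z): summarize X by the support of each single item."""
    return {i: support(dataset, [i]) for i in items}


def sample_dataset(freqs, n, rng):
    """Step 2 (Z -> X'): draw n transactions, including each item
    independently with its fitted probability (an illustrative assumption)."""
    return [
        frozenset(i for i, p in freqs.items() if rng.random() < p)
        for _ in range(n)
    ]


# Toy transactional dataset X (hypothetical data).
items = ["a", "b", "c"]
X = [
    frozenset(t)
    for t in (
        ["a", "b"], ["a", "b"], ["a", "b", "c"], ["c"],
        ["a"], ["b", "c"], ["a", "b"], ["c"],
    )
]

Z = fit_item_frequencies(X, items)   # patterns derived from X
rng = random.Random(0)               # fixed seed for reproducibility
X_new = sample_dataset(Z, 5000, rng) # synthetic dataset X'

# Compare supports in the real and the synthetic data.
for itemset in (["a"], ["b"], ["c"], ["a", "b"]):
    print(itemset, support(X, itemset), round(support(X_new, itemset), 3))
```

In this toy data the single-item supports of X' should track those of X closely, while the support of the pair {a, b} drifts from 0.5 toward the product 0.625 × 0.625 ≈ 0.39, because the independence assumption discards co-occurrence information. That gap is the kind of discrepancy that explicit support constraints on itemsets, or a more expressive generative model, are meant to close.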
