Article

Validating functional redundancy with mixed generative adversarial networks

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 264

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2023.110342

Keywords

Functional redundancy; Data imputation; Generative adversarial networks; Mixed data types; Data management; Functional dependency

Data redundancy is a significant problem in data-intensive applications. This study introduces a new concept, functional redundancy, which overcomes the limitations of functional dependencies, especially on continuous data. Efficient algorithms based on generative adversarial networks are designed to validate any functional redundancy without depending heavily on the number of attributes or tuples. Experiments on real-world and synthetic datasets show that the approach outperforms representative baselines.
Data redundancy has been one of the most important problems in data-intensive applications such as data mining and machine learning. Removing data redundancy brings many benefits: efficient data updating, effective data storage, and error-free query processing. Although data redundancy has been studied for four decades, existing work mostly focuses on syntactic formulations such as normal forms and functional dependencies, which lead to intractable discovery problems. In this work, we propose a new concept, functional redundancy, that overcomes the limitations of functional dependencies, especially on continuous data. We design and develop efficient algorithms based on generative adversarial networks that validate any functional redundancy without depending heavily on the number of attributes and tuples, as validating functional dependencies does. The core idea is to use the imputation power of generative adversarial networks to model any semantic dependencies between attributes. Extensive experiments on real-world and synthetic datasets show that our approach outperforms representative baselines, applies to first-order and high-order dependencies, and extends to different types of data. © 2023 Elsevier B.V. All rights reserved.
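
The core idea in the abstract, validating a dependency by how well one attribute can be imputed from others, can be illustrated with a minimal sketch. This is not the paper's method. A k-nearest-neighbour imputer stands in for the generative adversarial network. The names `fd_holds`, `knn_impute`, and `redundancy_score`, the score formula, and the synthetic data are all assumptions made for illustration. The sketch also shows why an exact functional dependency check says little on continuous data: when every left-hand value is distinct, the check passes trivially.

```python
import random
from statistics import mean

def fd_holds(rows, lhs, rhs):
    """Exact functional dependency check: equal lhs values must imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

def knn_impute(train, query, lhs, rhs, k=3):
    """Stand-in imputer (NOT a GAN): average rhs over the k rows nearest in lhs."""
    nearest = sorted(train, key=lambda r: sum((r[i] - query[i]) ** 2 for i in lhs))[:k]
    return mean(r[rhs] for r in nearest)

def redundancy_score(rows, lhs, rhs, holdout=0.3, seed=0):
    """Hypothetical score: 1 - (imputation error / mean-baseline error) on held-out rows.

    Values near 1 suggest rhs is largely recoverable from lhs;
    values near or below 0 suggest it is not.
    """
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)
    cut = int(len(rows) * (1 - holdout))
    train, test = rows[:cut], rows[cut:]
    baseline = mean(r[rhs] for r in train)
    err_model = mean(abs(knn_impute(train, r, lhs, rhs) - r[rhs]) for r in test)
    err_base = mean(abs(baseline - r[rhs]) for r in test)
    return 1.0 - err_model / err_base

# Synthetic data (assumed): column 1 is about 2 * column 0 plus small noise,
# column 2 is independent of column 0.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + rng.gauss(0, 0.1), rng.uniform(0, 10)))

score_dependent = redundancy_score(data, lhs=[0], rhs=1)
score_independent = redundancy_score(data, lhs=[0], rhs=2)
fd_on_noisy = fd_holds(data, lhs=[0], rhs=1)
fd_on_independent = fd_holds(data, lhs=[0], rhs=2)

print("imputation score, dependent column:  ", round(score_dependent, 3))
print("imputation score, independent column:", round(score_independent, 3))
print("exact FD check, dependent column:    ", fd_on_noisy)
print("exact FD check, independent column:  ", fd_on_independent)
```

In this sketch the exact check cannot tell the two columns apart, while the imputation-based score separates them. A GAN-based imputer would replace `knn_impute` in the same role, and would presumably also handle mixed data types, which this stand-in does not.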
