Proceedings Paper

A Statistical Perspective on Discovering Functional Dependencies in Noisy Data

Publisher

Association for Computing Machinery (ACM)
DOI: 10.1145/3318464.3389749

Keywords

Functional Dependencies; Structure Learning

Funding

  1. Amazon under an ARA Award
  2. NSF [IIS-1755676]
  3. DARPA [ASKE HR00111990013]

We study the problem of discovering functional dependencies (FDs) from a noisy data set. We adopt a statistical perspective and draw connections between FD discovery and structure learning in probabilistic graphical models. We show that discovering FDs from a noisy data set is equivalent to learning the structure of a model over binary random variables, where each random variable corresponds to a functional of the data set attributes. We build upon this observation to introduce FDX, a conceptually simple framework in which learning functional dependencies corresponds to solving a sparse regression problem. We show that FDX can recover true functional dependencies across a diverse array of real-world and synthetic data sets, even in the presence of noisy or missing data. We find that FDX scales to large data instances with millions of tuples and hundreds of attributes while yielding an average F1 improvement of 2x over state-of-the-art FD discovery methods.
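The idea of casting FD discovery as sparse regression over binary variables can be illustrated with a small sketch. This is not the FDX implementation. It is a simplified toy under several assumptions: the binary variables are taken to be agreement indicators over sampled tuple pairs (1 if two tuples share a value on an attribute), the sparse regression is a plain L1-penalised least squares solved by proximal gradient, and all function names (`agreement_indicators`, `lasso_ista`, `fd_weights`, `make_toy_data`) and parameter values are hypothetical. The toy data has three attributes where C is a noisy function of A and B is independent.

```python
import numpy as np

def agreement_indicators(data, rng, n_pairs=5000):
    """Sample tuple pairs; entry is 1.0 where both tuples agree on an attribute."""
    n = data.shape[0]
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j  # drop pairs of a tuple with itself
    return (data[i[keep]] == data[j[keep]]).astype(float)

def lasso_ista(X, y, lam=0.02, n_iter=500):
    """Minimise (1/2n)||Xw - y||^2 + lam*||w||_1 by proximal gradient (ISTA)."""
    n = X.shape[0]
    w = np.zeros(X.shape[1])
    # Step size 1/L, where L is the Lipschitz constant of the smooth part.
    step = n / (np.linalg.norm(X, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        w = w - step * (X.T @ (X @ w - y) / n)
        # Soft-thresholding is the proximal operator of the L1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w

def fd_weights(data, target, rng, lam=0.02):
    """Regress the target's agreement indicator on those of the other attributes."""
    Z = agreement_indicators(data, rng)
    Z = Z - Z.mean(axis=0)  # centre so no intercept is needed
    others = [c for c in range(data.shape[1]) if c != target]
    w = lasso_ista(Z[:, others], Z[:, target], lam=lam)
    return dict(zip(others, w))

def make_toy_data(n=2000, noise=0.05, seed=0):
    """Columns: A (0..5), B (0..5, independent), C = A // 2 with some noisy cells."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 6, size=n)
    b = rng.integers(0, 6, size=n)
    c = a // 2
    corrupt = rng.random(n) < noise
    c = np.where(corrupt, rng.integers(0, 3, size=n), c)
    return np.column_stack([a, b, c])

data = make_toy_data()
weights = fd_weights(data, target=2, rng=np.random.default_rng(1))
print(weights)  # weight on column 0 (A) should dominate the weight on column 1 (B)
```

In this sketch a large weight on A and a near-zero weight on B suggests the dependency A -> C despite the corrupted cells. A single regression of this kind does not resolve the direction of a dependency, so it should be read only as an illustration of the sparsity idea.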
