4.7 Article

Data-Driven Representations for Testing Independence: Modeling, Analysis and Connection With Mutual Information Estimation

Journal

IEEE TRANSACTIONS ON SIGNAL PROCESSING
Volume 70, Pages 158-173

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TSP.2021.3135689

Keywords

Testing; Mutual information; Estimation; Task analysis; Complexity theory; Random variables; Partitioning algorithms; Independence testing; non-parametric learning; learning representations; data-driven partitions; tree-structure partitions; mutual information; consistency; finite-length analysis

Funding

  1. CONICYT-Chile
  2. Fondecyt [1210315]
  3. ANID-PFCHA/MagísterNacional [2019-22191445]
  4. Advanced Center for Electrical and Electronic Engineering [FB0008]

This work presents a method for testing the independence of two continuous and finite-dimensional random variables using a data-driven partition. By approximating the sufficient statistics of an oracle test, a learning criterion is provided for designing the partition. The method achieves a consistent and distribution-free test of independence over the family of probabilities with a density.
This work addresses testing the independence of two continuous, finite-dimensional random variables through the design of a data-driven partition. The empirical log-likelihood statistic is adopted to approximate the sufficient statistics of an oracle test against independence (one that knows the two hypotheses). It is shown that approximating the sufficient statistics of the oracle test yields a learning criterion for designing a data-driven partition, and that this criterion connects with the problem of mutual information estimation. Applying these ideas to a data-dependent tree-structured partition (TSP), we derive conditions on the TSP's parameters that achieve a strongly consistent, distribution-free test of independence over the family of probabilities equipped with a density. Complementing this result, we present finite-length results showing that our TSP scheme can detect the scenario of independence structurally with the data-driven partition, together with new sampling complexity bounds for this detection. Finally, experimental analyses provide evidence of our scheme's advantage for testing independence over some strategies that do not use data-driven representations.
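As a rough illustration of the idea described above, the sketch below computes a plug-in mutual information estimate over a data-driven partition and rejects independence when the estimate exceeds a threshold. It is not the paper's tree-structured partition: it assumes scalar variables and swaps in a simpler equal-frequency product partition. The function names, the number of bins `k`, and the `threshold` value are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def quantile_bins(values, k):
    """Assign each sample to one of k equal-frequency bins using its rank.

    The bin edges depend on the data, so the partition is data-driven.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def plugin_mutual_information(x, y, k):
    """Plug-in mutual information estimate (in nats) over a k-by-k partition.

    Sums p_xy * log(p_xy / (p_x * p_y)) over the non-empty cells, where all
    probabilities are empirical frequencies.
    """
    n = len(x)
    bx, by = quantile_bins(x, k), quantile_bins(y, k)
    joint = Counter(zip(bx, by))
    cx, cy = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log(c * n / (cx[a] * cy[b]))
        for (a, b), c in joint.items()
    )


def reject_independence(x, y, k, threshold):
    """Hypothetical decision rule: reject independence if the estimate is large."""
    return plugin_mutual_information(x, y, k) > threshold


# Fully dependent samples: the estimate equals log(k).
xs = list(range(100))
print(plugin_mutual_information(xs, xs, 4))

# Samples laid out on a full grid: the estimate is zero.
x_grid = [i // 10 for i in range(100)]
y_grid = [i % 10 for i in range(100)]
print(plugin_mutual_information(x_grid, y_grid, 2))
```

In this simplified version the threshold is fixed by hand; the paper instead derives conditions on the partition's parameters under which the test is consistent, which this sketch does not reproduce.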
