Journal
KNOWLEDGE AND INFORMATION SYSTEMS
Volume -, Issue -, Pages -
Publisher
SPRINGER LONDON LTD
DOI: 10.1007/s10115-023-01964
Keywords
Conditional independence; Hypothesis testing; Representation learning; Generative models; Normalizing flows; Mixed data
This study introduces LCIT (Latent representation-based Conditional Independence Test), a novel method for testing conditional independence based on representation learning. LCIT first learns to infer latent representations of the target variables X and Y that contain no information about the conditioning variable Z, and then examines the latent variables for any significant remaining dependence using a conventional correlation test. LCIT consistently outperforms several state-of-the-art baselines and adapts well to nonlinear, high-dimensional, and mixed data settings on a diverse collection of synthetic and real data sets.
Detecting conditional independencies plays a key role in several statistical and machine learning tasks, especially in causal discovery algorithms, yet it remains a highly challenging problem due to high dimensionality and the complex relationships present in data. In this study, we introduce LCIT (Latent representation-based Conditional Independence Test), a novel method for conditional independence testing based on representation learning. Our main contribution is a hypothesis testing framework in which, to test for independence between X and Y given Z, we first learn to infer latent representations of the target variables X and Y that contain no information about the conditioning variable Z. The latent variables are then examined for any significant remaining dependence, which can be done using a conventional correlation test. Moreover, LCIT can also handle discrete and mixed-type data by converting discrete variables into the continuous domain via variational dequantization. Empirical evaluations show that LCIT consistently outperforms several state-of-the-art baselines under different evaluation metrics and adapts well to nonlinear, high-dimensional, and mixed data settings on a diverse collection of synthetic and real data sets.
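The two-stage idea described above (first strip the influence of Z from X and Y, then run a correlation test on what remains) can be illustrated with a deliberately simplified sketch. This is not the LCIT implementation: the learned normalizing-flow representations are swapped for plain linear residuals, Z is assumed to be a single scalar variable, and the function names (`residualize`, `pearson_r`, `ci_test_pvalue`) are hypothetical. What it shows is only the shape of the pipeline, which in this simplified form reduces to a classical partial-correlation test.

```python
import math
import random


def residualize(values, z):
    """Remove the linear effect of z from values.

    Assumption: a linear fit stands in for the learned latent
    representation; LCIT itself uses generative models instead.
    """
    n = len(z)
    mean_z = sum(z) / n
    mean_v = sum(values) / n
    cov = sum((a - mean_z) * (b - mean_v) for a, b in zip(z, values))
    var = sum((a - mean_z) ** 2 for a in z)
    slope = cov / var
    return [b - mean_v - slope * (a - mean_z) for a, b in zip(z, values)]


def pearson_r(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den


def ci_test_pvalue(x, y, z):
    """Two-sided p-value for H0: X independent of Y given scalar Z."""
    rx = residualize(x, z)  # stage 1: representations "free" of Z
    ry = residualize(y, z)
    r = pearson_r(rx, ry)   # stage 2: conventional correlation test
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    n = len(x)
    # Fisher z-transform; with one conditioning variable the
    # standard error is 1 / sqrt(n - 1 - 3).
    stat = math.atanh(r) * math.sqrt(n - 4)
    return math.erfc(abs(stat) / math.sqrt(2))


# Synthetic demo data (illustrative only, not from the paper).
rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
# Y depends on Z only, so X and Y are independent given Z.
y_indep = [-zi + rng.gauss(0, 1) for zi in z]
# Y depends on both Z and X, so the null hypothesis is false.
y_dep = [zi + 0.5 * xi + rng.gauss(0, 1) for zi, xi in zip(z, x)]

p_indep = ci_test_pvalue(x, y_indep, z)
p_dep = ci_test_pvalue(x, y_dep, z)
print("p-value, conditionally independent pair:", p_indep)
print("p-value, conditionally dependent pair:", p_dep)
```

A linear residual only removes linear dependence on Z, which is exactly the limitation a learned nonlinear representation is meant to address; the sketch should be read as the baseline idea rather than as a substitute for the method.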