Journal
COMPUTATIONAL STATISTICS & DATA ANALYSIS
Volume 152, Issue -, Pages -
Publisher
ELSEVIER
DOI: 10.1016/j.csda.2020.107043
Keywords
Data analysis; Correlation; Contingency test; Significance; Simulation
Funding
- KPMG, The Netherlands Advisory N.V.
A prescription is presented for a new and practical correlation coefficient, phi(K), based on several refinements to Pearson's hypothesis test of independence of two variables. The combined features of phi(K) give it an advantage over existing coefficients. First, it works consistently between categorical, ordinal and interval variables, in essence by treating each variable as categorical, and can therefore be used to calculate correlations between variables of mixed type. Second, it captures nonlinear dependency. The strength of phi(K) is similar to that of Pearson's correlation coefficient, and the two are equivalent in the case of a bivariate normal input distribution. These are useful properties when studying the correlations between variables of mixed types, some of which are categorical. Two further innovations are presented: one for the proper evaluation of the statistical significance of correlations, and one for the interpretation of variable relationships in a contingency table, in particular for sparse or low-statistics samples and significant dependencies. Two practical applications are discussed. The presented algorithms are easy to use and available through a public Python library.(1) © 2020 Published by Elsevier B.V.
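Pearson's hypothesis test of independence, on which phi(K) builds, can be illustrated with a minimal pure-Python sketch. This is not the phi(K) algorithm itself: it only shows two ingredients named in the abstract, namely treating an interval variable as categorical by binning it, and computing Pearson's χ² statistic on the resulting contingency table. The equal-width binning, the choice of 5 bins, and the helper names `bin_interval`, `pearson_chi2` and `pearson_r` are assumptions made for this illustration. The data use y = x², a nonlinear dependency for which Pearson's r is essentially zero while χ² is large.

```python
import math
from collections import Counter


def bin_interval(values, n_bins):
    """Hypothetical helper: map interval values onto equal-width bin
    indices 0..n_bins-1 so the variable can be treated as categorical."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:  # constant column: everything falls into one bin
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def pearson_chi2(xs, ys):
    """Pearson's chi-square statistic and its degrees of freedom for the
    test of independence of two categorical variables."""
    n = len(xs)
    observed = Counter(zip(xs, ys))        # contingency table (missing cells count 0)
    rows, cols = Counter(xs), Counter(ys)  # marginal counts
    chi2 = 0.0
    for a, n_a in rows.items():
        for b, n_b in cols.items():
            expected = n_a * n_b / n  # cell count expected under independence
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return chi2, dof


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Nonlinear dependency: y is fully determined by x, but not linearly.
x = [i / 10 for i in range(-50, 51)]  # -5.0 .. 5.0, symmetric around 0
y = [v * v for v in x]

r = pearson_r(x, y)
chi2, dof = pearson_chi2(bin_interval(x, 5), bin_interval(y, 5))
print(f"Pearson r = {r:.3f}")  # close to zero: no linear relationship
print(f"chi2 = {chi2:.1f} with {dof} degrees of freedom")
```

Mapping such a χ² value onto a coefficient on Pearson's scale, so that it agrees with Pearson's correlation for a bivariate normal input, and correcting it for statistical fluctuations, is what phi(K) adds on top; those steps are not reproduced in this sketch.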