Article

A new correlation coefficient between categorical, ordinal and interval variables with Pearson characteristics

Journal

Publisher

ELSEVIER
DOI: 10.1016/j.csda.2020.107043

Keywords

Data analysis; Correlation; Contingency test; Significance; Simulation

Funding

  1. KPMG, The Netherlands Advisory N.V.

A prescription is presented for a new and practical correlation coefficient, phi(K), based on several refinements to Pearson's hypothesis test of independence of two variables. The combined features of phi(K) give it an advantage over existing coefficients. First, it works consistently between categorical, ordinal and interval variables, in essence by treating each variable as categorical, and can therefore be used to calculate correlations between variables of mixed type. Second, it captures nonlinear dependency. Third, the strength of phi(K) is similar to Pearson's correlation coefficient, and is equivalent to it in the case of a bivariate normal input distribution. These are useful properties when studying correlations between variables of mixed type, some of which are categorical. Two further innovations are presented: a proper evaluation of the statistical significance of correlations, and an interpretation of variable relationships in a contingency table, in particular for sparse or low-statistics samples and for significant dependencies. Two practical applications are discussed. The presented algorithms are easy to use and available through a public Python library. © 2020 Published by Elsevier B.V.
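The idea of treating every variable as categorical and testing independence on the resulting contingency table can be illustrated with a minimal sketch. This is not the phi(K) algorithm and does not use the paper's library: it bins interval variables into equal-width bins, computes Pearson's chi-square statistic, and uses Cramér's V as a plainly named stand-in for a chi-square-based association strength. All function names, bin counts, and the synthetic data are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def bin_interval(values, n_bins=5):
    """Turn interval values into equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def chi2_statistic(xs, ys):
    """Pearson chi-square statistic of the xs-by-ys contingency table."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    rows, cols = Counter(xs), Counter(ys)
    chi2 = 0.0
    for r, r_total in rows.items():
        for c, c_total in cols.items():
            expected = r_total * c_total / n  # count expected under independence
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return chi2


def cramers_v(xs, ys):
    """Cramér's V (stand-in measure, NOT phi(K)) for two categorical lists."""
    k = min(len(set(xs)), len(set(ys))) - 1
    if k < 1:
        return 0.0
    return math.sqrt(chi2_statistic(xs, ys) / (len(xs) * k))


def pearson_r(xs, ys):
    """Ordinary Pearson correlation, for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


# Synthetic data (an assumption for this sketch): y depends on x only nonlinearly.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
y = [v * v + rng.gauss(0.0, 0.05) for v in x]
noise = [rng.gauss(0.0, 1.0) for _ in x]                    # independent of x
group = ["outer" if abs(v) > 0.5 else "inner" for v in x]   # categorical variable

r_linear = pearson_r(x, y)                                   # close to zero
v_nonlinear = cramers_v(bin_interval(x), bin_interval(y))    # interval vs interval
v_mixed = cramers_v(group, bin_interval(y))                  # categorical vs interval
v_independent = cramers_v(bin_interval(x), bin_interval(noise))
```

In this sketch the linear coefficient stays near zero for the quadratic relationship, while the contingency-table measure is clearly larger for the dependent pairs than for the independent one, and the same function handles the categorical `group` column without any special casing. The refinements described in the abstract (Pearson-like scaling, significance evaluation, and handling of sparse tables) are not reproduced here.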
