Article

Mimicking Complexity of Structured Data Matrix's Information Content: Categorical Exploratory Data Analysis

ENTROPY (2021)

Journal

ENTROPY

Volume 23, Issue 5, Pages -

Publisher

MDPI

DOI: 10.3390/e23050594

Keywords

contingency-kD-lattice; high order structural dependency; heterogeneity; mutual conditional entropy matrix; principal component analysis (PCA)

Category

Physics, Multidisciplinary

Funding

CEDAR in UC Davis

AI Summary

CEDA with mimicking explores and exhibits the complexity and structural dependency of data matrices, revealing information content and feature associations from fine-scale to global structures. It enhances data visualization reliability and robustness, clarifying which covariate feature-groups have major-vs.-minor predictive powers on response features at specific scales.

Abstract

We develop Categorical Exploratory Data Analysis (CEDA) with mimicking to explore and exhibit the complexity of the information content contained within any data matrix: categorical, discrete, or continuous. Such complexity is shown through visible and explainable serial multiscale structural dependency with heterogeneity. CEDA is built upon the categorical nature of all features, obtained via histograms, and it is guided by all features' associative patterns (order-2 dependence) in a mutual conditional entropy matrix. Higher-order structural dependency among k (≥ 3) features is exhibited through block patterns within heatmaps that are constructed by permuting contingency-kD-lattices of counts. As k grows, the resultant heatmap series contains the global and large scales of structural dependency that constitute the data matrix's information content. When continuous features are involved, principal component analysis (PCA) extracts fine-scale information content from each block in the final heatmap. Our mimicking protocol coherently simulates this heatmap series by preserving global-to-fine-scale structural dependency. At every step of the mimicking process, each accepted simulated heatmap is subject to constraints with respect to all of the reliable observed categorical patterns. For reliability and robustness in the sciences, CEDA with mimicking enhances data visualization by revealing deterministic and stochastic structures within each scale-specific structural dependency. For inferences in Machine Learning (ML) and Statistics, it clarifies upon which scales which covariate feature-groups have major-vs.-minor predictive powers on response features. For the social justice of Artificial Intelligence (AI) products, it checks whether a data matrix incompletely prescribes the targeted system.
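The order-2 step described in the abstract (categorize every feature via a histogram, then summarize pairwise association in a mutual conditional entropy matrix) can be illustrated with a minimal Python sketch. The abstract does not give a formula, so the normalized form used here, the average of H(A|B)/H(A) and H(B|A)/H(B), is an assumption, as are the function names `categorize`, `mutual_conditional_entropy`, and `mce_matrix`, the equal-width binning, and the default of 4 bins. It is not the authors' implementation.

```python
import numpy as np

def categorize(x, bins=4):
    """Turn a continuous feature into categorical codes via an equal-width histogram.
    The bin count and equal-width rule are illustrative assumptions."""
    x = np.asarray(x, dtype=float).ravel()
    edges = np.histogram_bin_edges(x, bins=bins)
    # Use interior edges only, so codes fall in 0..bins-1.
    return np.digitize(x, edges[1:-1])

def entropy(counts):
    """Shannon entropy (natural log) of a vector of counts."""
    counts = np.asarray(counts, dtype=float).ravel()
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def mutual_conditional_entropy(a, b):
    """Assumed form: 0.5 * (H(A|B)/H(A) + H(B|A)/H(B)).
    Values near 0 indicate strong association, values near 1 indicate independence."""
    _, ai = np.unique(np.asarray(a).ravel(), return_inverse=True)
    _, bi = np.unique(np.asarray(b).ravel(), return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)          # 2D contingency table of counts
    h_a = entropy(table.sum(axis=1))       # marginal entropy H(A)
    h_b = entropy(table.sum(axis=0))       # marginal entropy H(B)
    h_ab = entropy(table)                  # joint entropy H(A, B)
    if h_a == 0 or h_b == 0:
        return 1.0  # convention chosen here: a constant feature shows no association
    # H(A|B) = H(A,B) - H(B) and H(B|A) = H(A,B) - H(A)
    return 0.5 * ((h_ab - h_b) / h_a + (h_ab - h_a) / h_b)

def mce_matrix(codes):
    """Symmetric matrix of pairwise values for the columns of a 2D array of codes."""
    codes = np.asarray(codes)
    m = codes.shape[1]
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            out[i, j] = out[j, i] = mutual_conditional_entropy(codes[:, i], codes[:, j])
    return out
```

Under this assumed definition, two identical features give a value of 0 and two independent, balanced binary features give a value of 1, so small entries in the matrix flag the feature pairs that would be candidates for higher-order contingency lattices.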
