Article

Testing homogeneity in high dimensional data through random projections

Journal

JOURNAL OF MULTIVARIATE ANALYSIS
Volume 200, Issue -, Pages -

Publisher

ELSEVIER INC
DOI: 10.1016/j.jmva.2023.105252

Keywords

Cramer-von Mises test; High dimension; Random projections; Two-sample test

This article introduces a method for testing the homogeneity of two random vectors. The method randomly selects two subspaces made up of components of the vectors, projects each onto a one-dimensional space, and builds the test statistic from the Cramer-von Mises distance between the projections. Repeating this procedure improves performance, and numerical simulations demonstrate the test's effectiveness.
Testing for homogeneity of two random vectors is a fundamental problem in statistics. In the past two decades, numerous efforts have been made to detect heterogeneity when the random vectors are multivariate or even high dimensional. Due to the curse of dimensionality, existing tests based on Euclidean distance may fail to capture overall homogeneity in high dimensional settings and can detect only moment discrepancies. To address this issue, we propose a fully nonparametric test for homogeneity of two random vectors. Our method randomly selects two subspaces consisting of components of the vectors, projects each subspace onto a one-dimensional space, and constructs the test statistic from the Cramer-von Mises distance between the projections. To enhance performance, we repeat this procedure and combine the results into the final test statistic. Theoretically, as the number of replications tends to infinity, the test avoids the potential power loss caused by uninformative projection directions. By U-statistic theory, the asymptotic null distribution of the proposed test is standard normal, regardless of the parent distributions of the random samples and of the relationship between data dimension and sample sizes. As a result, no resampling procedure is needed to determine critical values. The empirical size and power of the proposed test are demonstrated through numerical simulations.
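The project-and-compare step described in the abstract can be illustrated with a minimal sketch. This is not the authors' statistic: the function names, the use of a single shared coordinate subset, the Gaussian random direction, and the plain averaging over replications are all assumptions made for illustration. The sketch omits the U-statistic standardization that gives the paper's test its standard normal null distribution, so it yields only a raw averaged distance, not a p-value.

```python
import bisect
import math
import random


def cvm_distance(u, v):
    """Empirical Cramer-von Mises distance between two 1-D samples.

    Averages the squared gap between the two empirical CDFs over the
    pooled sample points. Returns a value in [0, 1].
    """
    su, sv = sorted(u), sorted(v)
    pooled = su + sv
    total = 0.0
    for t in pooled:
        f = bisect.bisect_right(su, t) / len(su)  # empirical CDF of u at t
        g = bisect.bisect_right(sv, t) / len(sv)  # empirical CDF of v at t
        total += (f - g) ** 2
    return total / len(pooled)


def random_projection_cvm(x, y, subspace_dim, n_reps, seed=0):
    """Average CvM distance over random low-dimensional projections.

    x, y: lists of equal-length rows (one row per observation).
    Assumption: each replication draws one coordinate subset and one
    random unit direction, and applies both to x and y alike.
    """
    rng = random.Random(seed)
    p = len(x[0])
    total = 0.0
    for _ in range(n_reps):
        # Randomly pick which components form the subspace.
        idx = rng.sample(range(p), subspace_dim)
        # Draw a random direction and normalize it to unit length.
        d = [rng.gauss(0.0, 1.0) for _ in idx]
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]
        # Project both samples onto the one-dimensional space.
        u = [sum(row[j] * c for j, c in zip(idx, d)) for row in x]
        v = [sum(row[j] * c for j, c in zip(idx, d)) for row in y]
        total += cvm_distance(u, v)
    return total / n_reps
```

Averaging over many replications reduces the influence of any single projection direction that happens to hide the difference between the two samples, which mirrors the motivation the abstract gives for repeating the procedure.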
