Article

Consistency of Archetypal Analysis

SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE (2021)

Journal

SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE

Volume 3, Issue 1, Pages 1-30

Publisher

SIAM PUBLICATIONS

DOI: 10.1137/20M1331792

Keywords

archetypal analysis; principal convex hull; consistency; multivariate data summarization; unsupervised learning

Funding

NSF DMS [16-19755, 17-52202]
Simons collaboration grant for mathematicians [586942]

Automated Summary

Archetypal analysis is an unsupervised learning method that utilizes convex polytopes to summarize multivariate data, with archetype points being the key components. Consistency results are proven, showing convergence of archetype points under certain assumptions, along with convergence rates for optimal objective values. Experiments with various distributions support the analysis and demonstrate the effectiveness of the method for summarizing data.

Abstract

Archetypal analysis is an unsupervised learning method that uses a convex polytope to summarize multivariate data. For fixed k, the method finds a convex polytope with k vertices, called archetype points, such that the polytope is contained in the convex hull of the data and the mean squared distance between the data and the polytope is minimal. In this paper, we prove a consistency result that shows if the data is independently sampled from a probability measure with bounded support, then the archetype points converge to a solution of the continuum version of the problem, of which we identify and establish several properties. We also obtain the convergence rate of the optimal objective values under appropriate assumptions on the distribution. If the data is independently sampled from a distribution with unbounded support, we also prove a consistency result for a modified method that penalizes the dispersion of the archetype points. Our analysis is supported by detailed computational experiments of the archetype points for data sampled from the uniform distribution in a disk, the normal distribution, an annular distribution, and a Gaussian mixture model.
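
To make the finite-sample problem in the abstract concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation. The archetypes are written as Z = B X, where each row of B lies in the probability simplex, so they stay inside the convex hull of the data. Each data point is approximated by a convex combination A Z of the archetypes. The solver (alternating projected gradient steps with step size 1/L) and all names (`project_rows_to_simplex`, `objective`, `archetypal_analysis`) are hypothetical choices made for this example. The reported objective is the mean squared distance between each point and its current convex-combination approximation, so it upper-bounds the mean squared distance to the polytope.

```python
import numpy as np


def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex."""
    n, m = V.shape
    U = np.sort(V, axis=1)[:, ::-1]            # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, m + 1)
    rho = (U - css / idx > 0).sum(axis=1)      # size of the support, always >= 1
    theta = css[np.arange(n), rho - 1] / rho   # per-row shift
    return np.maximum(V - theta[:, None], 0.0)


def objective(X, A, B):
    """Mean squared distance between the data X and the approximation A (B X)."""
    return float(np.mean(np.sum((X - A @ (B @ X)) ** 2, axis=1)))


def archetypal_analysis(X, k, n_iter=200, seed=0):
    """Alternating projected gradient sketch (an assumed solver, not the paper's)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Start each archetype at a randomly chosen data point.
    B = np.zeros((k, n))
    B[np.arange(k), rng.choice(n, size=k, replace=False)] = 1.0
    A = np.full((n, k), 1.0 / k)
    history = []
    for _ in range(n_iter):
        Z = B @ X
        # Step in A with step size 1/L_A, where L_A bounds the gradient's Lipschitz constant.
        L_A = (2.0 / n) * np.linalg.norm(Z, 2) ** 2 + 1e-12
        A = project_rows_to_simplex(A - (2.0 / n) * (A @ Z - X) @ Z.T / L_A)
        # Step in B with step size 1/L_B.
        L_B = (2.0 / n) * np.linalg.norm(A, 2) ** 2 * np.linalg.norm(X, 2) ** 2 + 1e-12
        B = project_rows_to_simplex(B - (2.0 / n) * A.T @ (A @ Z - X) @ X.T / L_B)
        history.append(objective(X, A, B))
    return B @ X, A, B, history


# Usage: data sampled uniformly from the unit disk, summarized with k = 3 archetypes.
rng = np.random.default_rng(1)
radius = np.sqrt(rng.uniform(size=200))
angle = rng.uniform(0.0, 2.0 * np.pi, size=200)
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
Z, A, B, history = archetypal_analysis(X, k=3)
print(Z)
print(history[0], history[-1])
```

Because every row of B is a convex weight vector, the constraint that the polytope lies in the convex hull of the data holds by construction rather than through a penalty. The objective is non-convex jointly in A and B, so a sketch like this should only be expected to reach a local solution, and the result can depend on the seed.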
