Article

Outlier-Robust Subsampling Techniques for Persistent Homology

Journal

JOURNAL OF MACHINE LEARNING RESEARCH
Volume 24

Publisher

MICROTOME PUBL

Keywords

landmarks; persistent homology; subsampling; outliers; noise


This article proposes a novel approach to select landmarks specifically for persistent homology (PH) that preserves coarse topological information of the original dataset. The method is tested on artificial datasets with different levels of noise and outperforms standard methods and a subsampling technique based on an outlier-robust version of the k-means algorithm in terms of robustness to outliers under low sampling densities.
In recent years, persistent homology (PH) has been successfully applied to real-world data in many different settings. Despite significant computational advances, PH algorithms do not yet scale to large datasets, which prevents interesting applications. One approach to addressing the computational issues posed by PH is to select a set of landmarks by subsampling the data. Currently, these landmark points are chosen either at random or using the maxmin algorithm. Neither is ideal: random selection tends to favour dense areas of the data, while the maxmin algorithm is very sensitive to noise. Here, we propose a novel approach to select landmarks specifically for PH that preserves coarse topological information of the original dataset. Our method is motivated by the Mayer-Vietoris sequence and requires only local PH calculations, thus enabling efficient computation. We test our landmarks on artificial datasets that contain different levels of noise and compare them to standard landmark selection techniques. We demonstrate that, for noisy data at low sampling densities, our landmark selection outperforms standard methods, as well as a subsampling technique based on an outlier-robust version of the k-means algorithm, in terms of robustness to outliers.
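To illustrate why the abstract calls maxmin noise-sensitive, here is a minimal NumPy sketch of the two standard baselines it names: uniform random subsampling and greedy maxmin (farthest-point) selection, both assuming Euclidean distance. The toy dataset (200 points on a unit circle plus five hypothetical far-away outliers) and the helper names `maxmin_landmarks` and `random_landmarks` are illustrative assumptions, not data or code from the paper. This is not the authors' proposed Mayer-Vietoris-based method.

```python
import numpy as np


def maxmin_landmarks(points, n_landmarks, seed_index=0):
    """Greedy maxmin (farthest-point) selection with Euclidean distance."""
    points = np.asarray(points, dtype=float)
    chosen = [seed_index]
    # Distance from every point to its nearest landmark chosen so far.
    min_dist = np.linalg.norm(points - points[seed_index], axis=1)
    for _ in range(1, n_landmarks):
        # The next landmark is the point farthest from all current landmarks.
        next_index = int(np.argmax(min_dist))
        chosen.append(next_index)
        new_dist = np.linalg.norm(points - points[next_index], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return chosen


def random_landmarks(points, n_landmarks, rng):
    """Uniform random subsample of point indices, without replacement."""
    picks = rng.choice(len(points), size=n_landmarks, replace=False)
    return [int(i) for i in picks]


# Hypothetical toy data: 200 points on the unit circle plus 5 distant outliers.
angles = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
outliers = np.array([[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0], [0.0, -10.0], [7.0, 7.0]])
data = np.vstack([circle, outliers])
outlier_ids = set(range(200, 205))

mm = maxmin_landmarks(data, 10)
rnd = random_landmarks(data, 10, np.random.default_rng(0))
print("maxmin outliers:", len(outlier_ids & set(mm)), "of 5")  # → maxmin outliers: 5 of 5
print("random outliers:", len(outlier_ids & set(rnd)), "of 5")
```

In this toy setup every unchosen outlier stays at least about 7.6 units from all current landmarks, while every circle point lies within 2 units of the first landmark, so maxmin spends five of its ten landmarks on the outliers and leaves only five to describe the circle. Random selection rarely picks the outliers here, but, as the abstract notes, it tends to oversample dense regions instead.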
