Article

Evaluation of Sampling Methods for Scatterplots

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TVCG.2020.3030432

Keywords

Scatterplot; data sampling; empirical evaluation

Funding

  1. National Key R&D Program of China [2018YFB1004300, 2019YFB1405703]
  2. National Natural Science Foundation of China [61761136020, 61672307, 61672308, 61872389, 61936002]
  3. XJTLU Research Development Funding [RDF19-02-11, TC190A4DA/3]


The study investigated how different sampling strategies affect multi-class scatterplots. It found that: 1) random sampling is preferred for preserving region density; 2) blue noise sampling and random sampling perform comparably to the three multi-class sampling strategies in preserving class density; 3) outlier biased density based sampling, recursive subdivision based sampling, and blue noise sampling perform best in keeping outliers; and 4) blue noise sampling outperforms the others in maintaining the overall shape of a scatterplot.
Given a scatterplot with tens of thousands of points or even more, a natural question is which sampling method should be used to create a small but good scatterplot for a better abstraction. We present the results of a user study that investigates the influence of different sampling strategies on multi-class scatterplots. The main goal of this study is to understand the capability of sampling methods in preserving the density, outliers, and overall shape of a scatterplot. To this end, we comprehensively review the literature and select seven typical sampling strategies as well as eight representative datasets. We then design four experiments to understand the performance of different strategies in maintaining: 1) region density; 2) class density; 3) outliers; and 4) overall shape in the sampling results. The results show that: 1) random sampling is preferred for preserving region density; 2) blue noise sampling and random sampling have comparable performance with the three multi-class sampling strategies in preserving class density; 3) outlier biased density based sampling, recursive subdivision based sampling, and blue noise sampling perform the best in keeping outliers; and 4) blue noise sampling outperforms the others in maintaining the overall shape of a scatterplot.
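To make two of the strategies named above concrete, the following is a minimal sketch of random sampling and a naive dart-throwing form of blue noise sampling. It is an illustration only, not the implementation evaluated in the study: the function names, the `radius` parameter, and the synthetic data are all assumptions introduced here.

```python
import math
import random


def random_sample(points, k, seed=0):
    """Uniform random sampling without replacement (illustrative helper)."""
    rng = random.Random(seed)
    return rng.sample(points, k)


def blue_noise_sample(points, radius, seed=0):
    """Naive dart throwing (illustrative helper, not the paper's code).

    Visit the points in random order and keep a point only if it lies at
    least `radius` away from every point kept so far.
    """
    rng = random.Random(seed)
    order = points[:]
    rng.shuffle(order)
    kept = []
    for p in order:
        if all(math.dist(p, q) >= radius for q in kept):
            kept.append(p)
    return kept


# Hypothetical data: one dense Gaussian cluster plus a sparse uniform background.
rng = random.Random(42)
data = [(rng.gauss(0.5, 0.05), rng.gauss(0.5, 0.05)) for _ in range(2000)]
data += [(rng.random(), rng.random()) for _ in range(200)]

rand_pts = random_sample(data, 200)
blue_pts = blue_noise_sample(data, radius=0.03)
```

In this sketch, random sampling gives every point the same chance of being kept, so dense regions stay proportionally dense in the sample. The dart-throwing variant enforces a minimum spacing between kept points, so samples spread more evenly over the occupied area. The minimum distance `radius` is a free parameter chosen here for illustration.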

