Journal
PROCEEDINGS OF THE VLDB ENDOWMENT
Volume 8, Issue 1, Pages 61-72
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.14778/2735461.2735467
In this paper, we solve the following data summarization problem: given a multi-dimensional data set augmented with a binary attribute, how can we construct an interpretable and informative summary of the factors affecting the binary attribute in terms of the combinations of values of the dimension attributes? We refer to such summaries as explanation tables. We show the hardness of constructing optimally-informative explanation tables from data, and we propose effective and efficient heuristics. The proposed heuristics are based on sampling and include optimizations related to computing the information content of a summary from a sample of the data. Using real data sets, we demonstrate the advantages of explanation tables compared to related approaches that can be adapted to solve our problem, and we show significant performance benefits of our optimizations.
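To make the idea of summarizing a binary attribute by combinations of dimension values concrete, the following is a minimal sketch and not the paper's method. It assumes that a summary row is a pattern over the dimension attributes in which any attribute may be replaced by a wildcard, and that each pattern is reported with the number of matching rows and the fraction of them whose binary attribute is 1. The toy data, the `WILDCARD` marker, and the function name `pattern_summary` are hypothetical. The sketch enumerates every pattern, which grows exponentially with the number of dimensions, and it does not attempt the selection of an informative subset or the sampling-based heuristics described in the abstract.

```python
from collections import defaultdict
from itertools import product

WILDCARD = "*"  # hypothetical marker meaning "any value" for a dimension


def pattern_summary(rows):
    """Map each wildcard pattern to (matching row count, fraction with outcome 1).

    rows: iterable of (dims, outcome), where dims is a tuple of dimension
    values and outcome is the binary attribute (0 or 1).
    """
    stats = defaultdict(lambda: [0, 0])  # pattern -> [count, positives]
    for dims, outcome in rows:
        # Every row matches 2^d patterns: each dimension is kept or wildcarded.
        for mask in product((False, True), repeat=len(dims)):
            pattern = tuple(WILDCARD if hide else value
                            for value, hide in zip(dims, mask))
            stats[pattern][0] += 1
            stats[pattern][1] += outcome
    return {p: (count, positives / count)
            for p, (count, positives) in stats.items()}


# Hypothetical toy data: (day, time of day) -> binary outcome.
rows = [
    (("Fri", "Dawn"), 1),
    (("Fri", "Night"), 0),
    (("Sat", "Dawn"), 1),
    (("Sat", "Night"), 1),
]

summary = pattern_summary(rows)
for pattern in sorted(summary):
    count, fraction = summary[pattern]
    print(pattern, count, fraction)
```

In this toy data, the all-wildcard pattern covers all four rows, while a pattern such as `("*", "Dawn")` isolates a subset whose outcome is uniformly 1; a summary of the kind the abstract describes would keep only a small number of such informative patterns.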