Journal
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 35, Issue 2, Pages 1270-1282
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2021.3103571
Keywords
Outlier detection; clustering algorithms; rate-distortion theory
Rate-distortion theory-based outlier detection builds on the rationale that good data compression encodes outliers with unique symbols. Based on this rationale, we propose Cluster Purging, an extension of clustering-based outlier detection. This extension allows one to assess the representivity of clusterings and to find data that are best represented by individual unique clusters. We propose two efficient algorithms for performing Cluster Purging: one is parameter-free, while the other has a parameter that controls representivity estimations, allowing it to be tuned in supervised setups. In an experimental evaluation, we show that Cluster Purging improves upon outliers detected from raw clusterings and competes strongly against state-of-the-art alternatives.
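As a rough illustration of what reading outliers off a raw clustering can look like, the sketch below flags every point whose cluster has at most `max_size` members, so that points "encoded by their own symbol" (singleton clusters) are reported as outliers. This is a generic small-cluster heuristic assumed for illustration only; it is not the Cluster Purging algorithm, whose details are not given here, and the function name `small_cluster_outliers` and the `max_size` threshold are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def small_cluster_outliers(labels: Sequence[Hashable], max_size: int = 1) -> list[int]:
    """Return indices of points whose cluster has at most max_size members.

    Hypothetical helper: a generic small-cluster heuristic, not Cluster Purging.
    """
    sizes = Counter(labels)  # number of points assigned to each cluster label
    # A point is flagged when its whole cluster is tiny (e.g. a singleton).
    return [i for i, lab in enumerate(labels) if sizes[lab] <= max_size]


# Usage: labels produced by any clustering algorithm, one label per point.
labels = [0, 0, 0, 1, 1, 1, 2, 0, 3]
print(small_cluster_outliers(labels))  # → [6, 8]
```

Such a fixed size threshold takes the clustering at face value; the abstract's point is that Cluster Purging instead assesses how representative the clusters are before deciding which data are best represented by individual unique clusters.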