Journal
IEEE ACCESS
Volume 10, Issue -, Pages 32093-32103
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3161481
Keywords
Unsupervised outlier detection; k-nearest neighbor; mixed-valued dataset; network model; random walk process
Funding
- National Natural Science Foundation, China [51505480, 72001203]
- Graduate Research and Innovation Projects of Jiangsu Province, China [KYCX21_2477]
This study proposes an unsupervised outlier detection method for datasets with mixed-valued attributes, based on an adaptive k-NN global network. The method introduces an adaptive search algorithm for k, measures distance with the Heterogeneous Euclidean-Overlap Metric, and uses transition probabilities to constrain the behavior of random walkers on the network in order to detect outliers.
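As an illustration of the distance measure named above, here is a minimal sketch of the Heterogeneous Euclidean-Overlap Metric in its standard textbook form: overlap distance (0 or 1) for categorical attributes, range-normalized absolute difference for numerical ones, and a distance of 1 when a value is missing. The function names `heom` and `numeric_ranges`, and the way categorical attributes are flagged by index, are assumptions made for this example and are not taken from the paper.

```python
import math


def numeric_ranges(data, categorical):
    """Return {attribute index: max - min} for every non-categorical attribute."""
    ranges = {}
    for a in range(len(data[0])):
        if a in categorical:
            continue
        values = [row[a] for row in data if row[a] is not None]
        ranges[a] = (max(values) - min(values)) if values else 0.0
    return ranges


def heom(x, y, categorical, ranges):
    """HEOM distance between two records x and y of equal length.

    categorical: set of attribute indices treated as categorical (assumed input).
    ranges: output of numeric_ranges for the dataset the records come from.
    """
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0                          # missing value: maximal distance
        elif a in categorical:
            d = 0.0 if xa == ya else 1.0     # overlap metric
        else:
            r = ranges[a]
            d = abs(xa - ya) / r if r > 0 else 0.0  # range-normalized difference
        total += d * d
    return math.sqrt(total)
```

For example, with records such as `(1.0, "red")` and `(3.0, "blue")`, the second attribute would be declared categorical by passing `categorical={1}`, and the ranges would be computed once per dataset.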
Outlier detection aims to reveal data patterns that differ from existing data. Owing to their robustness and interpretability, outlier detection methods for numerical datasets based on the k-Nearest Neighbor (k-NN) network have attracted much attention in recent years. However, the datasets produced in many practical contexts tend to contain both numerical and categorical attributes; that is, they are datasets with mixed-valued attributes (DMAs). The selection of k is also an issue that deserves attention for unlabeled datasets. Therefore, an unsupervised outlier detection method for DMAs based on an adaptive k-NN global network is proposed. First, an adaptive search algorithm is introduced that finds an appropriate value of k by considering the distribution characteristics of the dataset. Next, the distance between mixed-valued data objects is measured with the Heterogeneous Euclidean-Overlap Metric, and the k-NN of each data object is obtained. Then, an adaptive k-NN global network is constructed from the neighborhood relationships between data objects, and a customized random walk process is executed on it to detect outliers, with the transition probability used to limit the behavior of the random walker. Finally, the effectiveness, accuracy, and applicability of the proposed method are demonstrated by a detailed experiment.
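The network and random-walk steps can be illustrated with a generic sketch. The abstract does not specify the adaptive search for k or the customized transition probabilities, so the code below substitutes a fixed k, uniform transition probabilities over each object's k nearest neighbors, and a PageRank-style restart term (`damping`). All of these are assumptions of this example rather than the paper's method. Objects that the walker rarely visits are treated as outlier candidates.

```python
def knn_graph(data, dist, k):
    """Directed k-NN graph: neighbors[i] lists the k objects closest to object i."""
    n = len(data)
    neighbors = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(data[i], data[j]),
        )
        neighbors.append(others[:k])
    return neighbors


def random_walk_scores(neighbors, damping=0.85, iters=100):
    """Approximate stationary visit probabilities of a random walk with restart.

    From object i the walker moves to one of its k nearest neighbors with equal
    probability (weight `damping`), or restarts at a uniformly chosen object
    (weight 1 - damping). Low scores mark rarely visited objects.
    """
    n = len(neighbors)
    visit = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - damping) / n] * n
        for i, nbrs in enumerate(neighbors):
            share = damping * visit[i] / len(nbrs)
            for j in nbrs:
                nxt[j] += share
        visit = nxt
    return visit
```

Any distance function on pairs of records can be passed as `dist`, including a mixed-attribute metric such as HEOM. The scores sum to 1, and ranking objects by ascending score gives a simple outlier ordering under these assumptions.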