Article

Unsupervised Outlier Detection for Mixed-Valued Dataset Based on the Adaptive k-Nearest Neighbor Global Network

Journal

IEEE ACCESS
Volume 10, Pages 32093-32103

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3161481

Keywords

Unsupervised outlier detection; k-nearest neighbor; mixed-valued dataset; network model; random walk process

Funding

  1. National Natural Science Foundation, China [51505480, 72001203]
  2. Graduate Research and Innovation Projects of Jiangsu Province, China [KYCX21_2477]

This study proposes an unsupervised outlier detection method for datasets with mixed-valued attributes, based on an adaptive k-NN global network. By introducing an adaptive search algorithm for k, measuring distances with the Heterogeneous Euclidean-Overlap Metric, and using transition probabilities to constrain the behavior of random walkers, the method effectively detects outliers in the dataset.
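As a rough illustration of the distance measure named above, the following Python sketch implements the standard Heterogeneous Euclidean-Overlap Metric (HEOM): a range-normalized absolute difference for numerical attributes, the overlap metric (0 if equal, 1 otherwise) for categorical attributes, a distance of 1 when either value is missing, and a Euclidean combination of the per-attribute distances. The function names `heom` and `attribute_ranges`, the use of `None` for missing values, and treating a zero-range attribute as contributing 0 are assumptions for illustration, not details taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Union

# Hypothetical value type: a float (numerical), a str (categorical), or None (missing).
Value = Optional[Union[float, str]]


def attribute_ranges(data: Sequence[Sequence[Value]],
                     is_numeric: Sequence[bool]) -> List[float]:
    """max - min of each numerical attribute (ignoring missing values); 0.0 for categorical."""
    ranges = []
    for a, numeric in enumerate(is_numeric):
        vals = [row[a] for row in data if row[a] is not None]
        ranges.append(max(vals) - min(vals) if numeric and vals else 0.0)
    return ranges


def heom(x: Sequence[Value], y: Sequence[Value],
         is_numeric: Sequence[bool], ranges: Sequence[float]) -> float:
    """Heterogeneous Euclidean-Overlap Metric between two mixed-valued records."""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0  # a missing value is treated as maximally distant
        elif is_numeric[a]:
            # Range-normalized difference; assumption: a constant attribute contributes 0.
            d = abs(xa - ya) / ranges[a] if ranges[a] > 0 else 0.0
        else:
            d = 0.0 if xa == ya else 1.0  # overlap metric for categorical values
        total += d * d
    return math.sqrt(total)


if __name__ == "__main__":
    data = [[1.0, "red"], [5.0, "blue"], [3.0, "red"]]
    numeric = [True, False]
    rng = attribute_ranges(data, numeric)
    print(heom(data[0], data[1], numeric, rng))
```

Because every per-attribute distance lies in [0, 1], numerical and categorical attributes contribute on a comparable scale, which is the usual motivation for HEOM on mixed-valued data.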
Outlier detection aims to reveal data patterns that differ from those of the existing data. Owing to their good robustness and interpretability, outlier detection methods for numerical datasets based on k-Nearest Neighbor (k-NN) networks have attracted much attention in recent years. However, datasets produced in many practical contexts tend to contain both numerical and categorical attributes, that is, they are datasets with mixed-valued attributes (DMAs). Moreover, selecting k is a nontrivial issue for unlabeled datasets. Therefore, an unsupervised outlier detection method for DMAs based on an adaptive k-NN global network is proposed. First, an adaptive search algorithm that determines an appropriate value of k from the distribution characteristics of the dataset is introduced. Next, the distance between mixed-valued data objects is measured with the Heterogeneous Euclidean-Overlap Metric, and the k-NN of each data object is obtained. Then, an adaptive k-NN global network is constructed from the neighborhood relationships between data objects, and a customized random walk process, in which transition probabilities constrain the behavior of the random walker, is executed on it to detect outliers. Finally, the effectiveness, accuracy, and applicability of the proposed method are demonstrated through detailed experiments.
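The network-plus-random-walk idea can be sketched in Python under loudly stated assumptions: distances arrive as a precomputed matrix (for example, pairwise HEOM values), k is fixed by the caller rather than chosen by the paper's adaptive search, each node's transition probability is split uniformly over its k nearest neighbors, and a PageRank-style teleport term (`damping`) is added so the walk has a unique stationary distribution. Points that few others count among their nearest neighbors receive a low stationary probability and are treated as more outlying. This is a generic random-walk outlier score on a directed k-NN graph, not the customized random walk described in the paper; `knn_graph`, `random_walk_scores`, and `damping` are hypothetical names.

```python
from typing import List


def knn_graph(dist: List[List[float]], k: int) -> List[List[int]]:
    """Directed k-NN graph: the out-neighbors of i are its k nearest other points.

    Ties are broken by index so the result is deterministic.
    """
    n = len(dist)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    graph = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (dist[i][j], j))
        graph.append(others[:k])
    return graph


def random_walk_scores(graph: List[List[int]], damping: float = 0.85,
                       max_iter: int = 500, tol: float = 1e-12) -> List[float]:
    """Stationary visiting probabilities of a random walker on a k-NN graph.

    Lower probability means fewer points lead the walker here, i.e. more outlying.
    Assumption: uniform transitions over the k neighbors plus uniform teleporting.
    """
    n = len(graph)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        # Teleport mass (1 - damping) is spread evenly, keeping the chain ergodic.
        nxt = [(1.0 - damping) / n] * n
        for i, nbrs in enumerate(graph):
            share = damping * p[i] / len(nbrs)  # equal transition prob. to each neighbor
            for j in nbrs:
                nxt[j] += share
        converged = sum(abs(a - b) for a, b in zip(nxt, p)) < tol
        p = nxt
        if converged:
            break
    return p


if __name__ == "__main__":
    pts = [0.0, 1.0, 2.0, 3.0, 20.0]
    dist = [[abs(a - b) for b in pts] for a in pts]
    scores = random_walk_scores(knn_graph(dist, k=2))
    ranking = sorted(range(len(pts)), key=lambda i: scores[i])
    print("most outlying first:", ranking)
```

For the 1-D points 0, 1, 2, 3, 20 with absolute-difference distances and k = 2, no other point lists 20 among its two nearest neighbors, so its score stays at the teleport floor (1 - damping)/n and it ranks as the strongest outlier.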
