4.7 Article

A relative granular ratio-based outlier detection method in heterogeneous data

Journal

INFORMATION SCIENCES
Volume 622, Issue -, Pages 710-731

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2022.11.154

Keywords

Heterogeneous data; Relative granular ratio; Neighborhood rough set; Outlier detection

This paper proposes a neighborhood rough set based approach for outlier detection, which can handle heterogeneous data and reduce hyper-parameters. It introduces a relative granular ratio factor to measure the size of a neighborhood and defines a granule-based majority set to capture the majority of objects. An outlier factor based on the feature of a negative region is determined to measure the difference between outliers and the majority set. The proposed approach, called RNRD-based outlier detection (RNROD), is evaluated on sixteen heterogeneous datasets and outperforms seven existing detection algorithms.
Outlier detection is the discovery of objects that differ significantly from most objects in the data, and it is widely used in important fields. Most existing methods rely on prior knowledge, and few are suitable for heterogeneous data. In this paper, we detect outliers based on the neighborhood rough set, which can process heterogeneous data and reduce some hyper-parameters. Because outliers are characteristically few, a relative granular ratio factor is created to measure the size of the neighborhood to which an object belongs. Since outliers always differ from the majority of objects, a granule-based majority set is defined. Then, a valid outlier factor is determined by the feature of a negative region to measure the difference between outliers and the majority set. Finally, a ratio and negative region detection factor (RNRD) is constructed by combining the above factors under a wide range of relations. In addition, the RNRD-based outlier detection (RNROD) algorithm is designed. Experiments on sixteen heterogeneous datasets show the superiority of RNROD over seven existing detection algorithms. (c) 2022 Elsevier Inc. All rights reserved.
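
The abstract does not give the formulas for the relative granular ratio factor or for RNRD, so the following is only a rough sketch of the underlying intuition: on heterogeneous data, an object whose neighborhood granule is small relative to the dataset is a candidate outlier. The neighborhood rule (numeric attributes within a radius `delta`, categorical attributes equal), the score `1 - |neighborhood| / n`, and all function names are assumptions made for illustration, not the paper's definitions.

```python
from typing import List, Sequence, Union

Value = Union[float, str]


def in_neighborhood(x: Sequence[Value], y: Sequence[Value],
                    numeric_mask: Sequence[bool], delta: float) -> bool:
    """Assumed rule: y is a neighbor of x if every numeric attribute
    differs by at most delta and every categorical attribute is equal."""
    for a, b, is_numeric in zip(x, y, numeric_mask):
        if is_numeric:
            if abs(a - b) > delta:
                return False
        elif a != b:
            return False
    return True


def neighborhood_sizes(data: Sequence[Sequence[Value]],
                       numeric_mask: Sequence[bool], delta: float) -> List[int]:
    """Size of each object's neighborhood granule (each object counts itself)."""
    return [sum(in_neighborhood(x, y, numeric_mask, delta) for y in data)
            for x in data]


def ratio_scores(data: Sequence[Sequence[Value]],
                 numeric_mask: Sequence[bool], delta: float) -> List[float]:
    """Illustrative score only (NOT the paper's RNRD):
    objects with smaller neighborhoods get higher scores."""
    n = len(data)
    return [1.0 - size / n
            for size in neighborhood_sizes(data, numeric_mask, delta)]


# Toy heterogeneous data: one numeric attribute, one categorical attribute.
data = [(0.10, "a"), (0.12, "a"), (0.11, "a"), (0.90, "b")]
numeric_mask = [True, False]
scores = ratio_scores(data, numeric_mask, delta=0.05)
most_suspicious = max(range(len(data)), key=scores.__getitem__)
```

In this toy example the last object has no neighbor other than itself, so it receives the highest score. The paper's method additionally uses a granule-based majority set and a negative-region factor, which this sketch does not attempt to reproduce.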
