4.7 Article

Effective enhancement of Isolation Forest method based on Minimal Spanning Tree clustering

Journal

INFORMATION SCIENCES
Volume 628, Issue -, Pages 320-338

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2023.01.104

Keywords

Anomaly detection; Isolation Forest; Minimal Spanning Tree clustering; Outliers detection

Modern technologies let researchers and practitioners explore large datasets, which makes anomaly detection methods for fixing or deleting unwanted records highly important. Isolation Forest is one of the fastest and most effective anomaly detection algorithms. It builds binary isolation trees by randomly splitting the dataset elements. In this manuscript, we propose an innovative approach that modifies this technique. In particular, we replace the random divisions of the base mechanism with divisions based on Minimal Spanning Tree clustering. Additionally, we improve the evaluation process by introducing a two-component score function. The first component is related to the level of the test element in the isolation tree. The second is the distance between two specific points in the last split node, namely the value of the evaluated attribute and the partition center stored in that node. In a series of comprehensive experiments, the proposed approach was compared with other Isolation Forest-based algorithms as well as state-of-the-art competing solutions. Our enhancement showed an advantage in classification quality. In addition, the running times of the implementations of selected solutions were measured. The results clearly demonstrate the high effectiveness of the proposed approach.
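
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not confirm: each node picks one attribute at random and clusters its values in one dimension, where the minimum spanning tree simply links consecutive sorted values, so cutting the longest edge means splitting at the largest gap; the "partition center" is taken to be the mean of each side; and the two score components are reported separately because the abstract does not say how they are combined. All function and variable names are hypothetical.

```python
import random

def mst_threshold(values):
    """Cut the longest edge of the 1-D minimum spanning tree.

    On a line, the MST links consecutive sorted values, so the longest
    edge is the largest gap; the threshold is that gap's midpoint.
    """
    s = sorted(values)
    i = max(range(len(s) - 1), key=lambda k: s[k + 1] - s[k])
    return (s[i] + s[i + 1]) / 2

def build_tree(rows, rng, depth=0, max_depth=10):
    """Build one isolation tree; None marks a leaf."""
    attr = rng.randrange(len(rows[0]))        # attribute is still chosen at random
    column = [r[attr] for r in rows]
    if depth >= max_depth or min(column) == max(column):
        return None                           # nothing left to separate
    t = mst_threshold(column)
    left = [r for r in rows if r[attr] < t]
    right = [r for r in rows if r[attr] >= t]
    if not left or not right:                 # guard against float rounding
        return None
    centers = (sum(r[attr] for r in left) / len(left),
               sum(r[attr] for r in right) / len(right))
    return {"attr": attr, "threshold": t, "centers": centers,
            "children": (build_tree(left, rng, depth + 1, max_depth),
                         build_tree(right, rng, depth + 1, max_depth))}

def path_components(tree, x):
    """Return (level reached, distance to the center in the last split node)."""
    depth, distance = 0, 0.0
    while tree is not None:
        side = 0 if x[tree["attr"]] < tree["threshold"] else 1
        distance = abs(x[tree["attr"]] - tree["centers"][side])
        tree = tree["children"][side]
        depth += 1
    return depth, distance

def score_components(rows, x, n_trees=50, sample_size=64, seed=0):
    """Average both components over a small forest built on subsamples."""
    rng = random.Random(seed)
    depths, dists = [], []
    for _ in range(n_trees):
        sample = rng.sample(rows, min(sample_size, len(rows)))
        d, dist = path_components(build_tree(sample, rng), x)
        depths.append(d)
        dists.append(dist)
    return sum(depths) / n_trees, sum(dists) / n_trees

# Synthetic data: one Gaussian cluster, plus a far-away test point.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
out_depth, out_dist = score_components(data, (8.0, 8.0))
in_depth, in_dist = score_components(data, (0.0, 0.0))
print("outlier:", out_depth, out_dist)
print("inlier: ", in_depth, in_dist)
```

In this sketch a point far from the cluster tends to stop at a shallow level and lies far from the stored center, while a central point travels deeper and ends close to its center. How the two components should be weighted into a single score is left open here, since the abstract does not specify it.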
