
A COMPARATIVE STUDY FOR OUTLIER DETECTION METHODS IN HIGH DIMENSIONAL TEXT DATA

JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH (2023)

Journal

JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH

Volume 13, Issue 1, Pages 5-17

Publisher

SCIENDO

DOI: 10.2478/jaiscr-2023-0001

Keywords

Curse of dimensionality; Dimension reduction; High dimensional text data; Outlier detection

Category

Computer Science, Artificial Intelligence

Summary

This paper compares and analyzes the performance of outlier detection methods in high dimensional data, with a focus on text data whose dimensionality is typically in the tens of thousands. Simulated experimental setups compare the methods in unsupervised versus semi-supervised mode and on uni-modal versus multi-modal data distributions. The paper also discusses the use of k-NN distance in high dimensional data.

Abstract

Outlier detection aims to find a data sample that is significantly different from other data samples. Various outlier detection methods have been proposed and have been shown to be able to detect anomalies in many practical problems. However, in high dimensional data, conventional outlier detection methods often behave unexpectedly due to a phenomenon called the curse of dimensionality. In this paper, we compare and analyze outlier detection performance in various experimental settings, focusing on text data with dimensions typically in the tens of thousands. Experimental setups were simulated to compare the performance of outlier detection methods in unsupervised versus semi-supervised mode and uni-modal versus multi-modal data distributions. The performance of outlier detection methods based on dimension reduction is compared, and a discussion on using k-NN distance in high dimensional data is also provided. Analysis through experimental comparison in various environments can provide insights into the application of outlier detection methods in high dimensional data.
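
To make the k-NN distance idea concrete, here is a minimal sketch that scores each sample by the distance to its k-th nearest neighbor, in both an unsupervised and a semi-supervised setup. It is an illustration only and not the paper's implementation: the function names, the toy two-dimensional data, the choice of Euclidean distance, and k=2 are all assumptions made for this example. Real text data would have tens of thousands of dimensions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_distance_scores(reference, queries, k, exclude_self=False):
    """Score each query by the distance to its k-th nearest reference point.

    A larger score means the query is more outlying.
    Set exclude_self=True when `queries` is the same list as `reference`
    (unsupervised mode), so a point is not counted as its own neighbor.
    Hypothetical helper written for this illustration.
    """
    scores = []
    for i, q in enumerate(queries):
        dists = sorted(
            euclidean(q, r)
            for j, r in enumerate(reference)
            if not (exclude_self and i == j)
        )
        scores.append(dists[k - 1])
    return scores


# Toy data (assumed): a tight cluster of normal points and one distant point.
normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
outlier = (3.0, 3.0)

# Unsupervised mode: the data set contains the outlier, and every point is
# scored against all the other points in that same set.
mixed = normal + [outlier]
unsup = knn_distance_scores(mixed, mixed, k=2, exclude_self=True)

# Semi-supervised mode: only normal data is available as reference, and
# unseen test points are scored against it.
semi = knn_distance_scores(normal, [(0.05, 0.0), outlier], k=2)
```

In this toy case the distant point receives the largest score in both modes. In very high dimensional data, distances between samples tend to become similar to one another, which is why the behavior of k-NN distance there deserves the separate discussion the abstract mentions.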
