Article

A Scalable Feature Based Clustering Algorithm for Sequences with Many Distinct Items

Publisher

KOREAN INST INTELLIGENT SYSTEMS
DOI: 10.5391/IJFIS.2018.18.4.316

Keywords

Sequence data; Feature-based clustering; Frequent sequential patterns

Funding

  1. Seoul National University of Science and Technology (SeoulTech)

Sequence data of various kinds have grown explosively in recent years. As more such data become available, clustering is needed to understand their structure. However, existing clustering algorithms for sequence data are computationally demanding. To avoid this problem, a feature-based clustering algorithm has been proposed. That algorithm, however, uses only a subset of all possible frequent sequential patterns as features. In practice this may distort the similarities between sequences, especially for sequence data with a large number of distinct items, such as customer transaction data. This article develops a feature-based clustering algorithm that uses the complete set of frequent sequential patterns as features. It handles both sequences of itemsets and sequences of single items over a large number of distinct items. The proposed algorithm projects sequence data into a feature space whose dimensions correspond to the complete set of frequent sequential patterns. It then applies the K-means clustering algorithm. Experimental results show that the proposed algorithm generates more meaningful clusters than the compared algorithms. This holds regardless of the dataset and of parameters such as the minimum support of the frequent sequential patterns and the number of clusters. Moreover, the proposed algorithm can be applied to large sequence databases because it scales linearly with the number of sequences.
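The abstract describes a two-step pipeline. First, each sequence is mapped to a vector indexed by the complete set of frequent sequential patterns. Second, K-means is run on those vectors. The following is a minimal pure-Python sketch of that idea, and it rests on several assumptions not taken from the paper:

* Feature values are binary (1.0 if a sequence contains the pattern). The abstract does not state the encoding.
* Patterns are found by a naive depth-first enumeration, not by the authors' mining method.
* K-means uses a deterministic farthest-point initialization.

The function names (`contains`, `mine_patterns`, `to_features`, `kmeans`) and the toy basket data are hypothetical, made up for illustration.

```python
# Hypothetical sketch: complete frequent sequential patterns as features, then K-means.
# A sequence is a list of itemsets (sets); a pattern is a list of frozensets.


def contains(seq, pattern):
    """Return True if `pattern` occurs in `seq` as an itemset subsequence.

    Greedy earliest matching: pattern[i] must be a subset of some element of
    `seq` that comes after the element matched for pattern[i - 1].
    """
    i = 0
    for element in seq:
        if i < len(pattern) and pattern[i] <= element:
            i += 1
    return i == len(pattern)


def mine_patterns(db, min_support):
    """Naively enumerate ALL patterns whose support (fraction of sequences) >= min_support."""
    n = len(db)
    items = sorted({x for seq in db for element in seq for x in element})
    result = []

    def support(pattern):
        return sum(contains(seq, pattern) for seq in db) / n

    def grow(pattern):
        # s-extensions: append a new single-item element.
        candidates = [pattern + [frozenset([x])] for x in items]
        # i-extensions: add a larger item to the last element, so each pattern
        # is generated exactly once (items within an element stay sorted).
        if pattern:
            last = pattern[-1]
            candidates += [pattern[:-1] + [last | {x}] for x in items if x > max(last)]
        for cand in candidates:
            # Anti-monotonicity: extensions of an infrequent pattern are never
            # frequent, so recursion continues only from frequent candidates.
            if support(cand) >= min_support:
                result.append(cand)
                grow(cand)

    grow([])
    return result


def to_features(db, patterns):
    """Binary vector per sequence: 1.0 if it contains the pattern (assumed encoding)."""
    return [[1.0 if contains(seq, p) else 0.0 for p in patterns] for seq in db]


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means with deterministic farthest-point initialization."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    centroids = [list(points[0])]
    while len(centroids) < k:
        centroids.append(list(max(points, key=lambda p: min(sqdist(p, c) for c in centroids))))

    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy customer-transaction data (invented): two groups with disjoint items.
db = [
    [{"bread", "milk"}, {"butter"}],
    [{"bread"}, {"milk"}, {"butter"}],
    [{"bread", "milk"}, {"butter", "jam"}],
    [{"beer"}, {"chips", "salsa"}],
    [{"beer", "chips"}, {"salsa"}],
    [{"beer"}, {"chips"}, {"salsa"}],
]
patterns = mine_patterns(db, min_support=0.3)
labels = kmeans(to_features(db, patterns), k=2)
print(len(patterns), labels)  # 13 [0, 0, 0, 1, 1, 1]
```

Once the pattern set is fixed, building the feature matrix visits each sequence once per pattern. That cost is linear in the number of sequences, which is in line with the abstract's scalability claim. The naive miner above makes no efficiency guarantee, and a real implementation would swap in a dedicated sequential-pattern miner.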
