Article

A Scalable Feature Based Clustering Algorithm for Sequences with Many Distinct Items

INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS (2018)

Journal

INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS

Volume 18, Issue 4, Pages 316-325

Publisher

KOREAN INST INTELLIGENT SYSTEMS

DOI: 10.5391/IJFIS.2018.18.4.316

Keywords

Sequence data; Feature-based clustering; Frequent sequential patterns

Category

Computer Science, Theory & Methods

Funding

Seoul National University of Science and Technology (SeoulTech)

Abstract

The volume of sequence data of various kinds has grown explosively in recent years. As more of such data becomes available, clustering is needed to understand its structure. However, existing clustering algorithms for sequence data are computationally demanding. To avoid this problem, a feature-based clustering algorithm has been proposed. That algorithm, however, uses only a subset of all possible frequent sequential patterns as features, which in practice may distort the similarities between sequences, especially for sequence data with a large number of distinct items, such as customer transaction data. This article develops a feature-based clustering algorithm that uses the complete set of frequent sequential patterns as features, for sequences of itemsets as well as sequences of single items drawn from many distinct items. The proposed algorithm projects the sequence data into a feature space whose dimensions correspond to the complete set of frequent sequential patterns and then applies the K-means clustering algorithm. Experimental results show that the proposed algorithm generates more meaningful clusters than the compared algorithms, regardless of the dataset and of parameters such as the minimum support of the frequent sequential patterns and the number of clusters. Moreover, because it scales linearly with the number of sequences, the proposed algorithm can be applied to large sequence databases.
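The abstract describes a two-stage pipeline: represent each sequence by which frequent sequential patterns it contains, then cluster those vectors with K-means. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The assumptions are as follows. Features are binary (1 if the sequence contains the pattern). Patterns are mined with a simple level-wise prefix-extension search rather than whatever miner the paper uses. K-means uses a deterministic farthest-point initialization. All function names (`contains`, `frequent_patterns`, `project`, `kmeans`) and the toy database are hypothetical.

```python
from itertools import chain

# A sequence is a list of itemsets (frozensets); a pattern is a tuple of itemsets.
# Sequences of single items are simply sequences of one-element itemsets.


def contains(seq, pattern):
    """True if each itemset of `pattern` is a subset of a distinct itemset of
    `seq`, in order. Greedy leftmost matching is sufficient for this test."""
    i = 0
    for itemset in seq:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def frequent_patterns(db, min_support):
    """Return every pattern contained in at least a `min_support` fraction of
    sequences, found level-wise by extending only frequent patterns (anti-monotone pruning)."""
    items = sorted(set(chain.from_iterable(chain.from_iterable(db))))

    def is_frequent(p):
        return sum(contains(s, p) for s in db) / len(db) >= min_support

    frontier = [p for p in ((frozenset([x]),) for x in items) if is_frequent(p)]
    result = []
    while frontier:
        result.extend(frontier)
        next_level = []
        for p in frontier:
            last = p[-1]
            for x in items:
                # Sequence extension: append {x} as a new, later itemset.
                cands = [p + (frozenset([x]),)]
                # Itemset extension: add x to the last itemset; requiring x to exceed
                # every item already there generates each pattern exactly once.
                if x > max(last):
                    cands.append(p[:-1] + (last | {x},))
                next_level.extend(c for c in cands if is_frequent(c))
        frontier = next_level
    return result


def project(db, patterns):
    """Map each sequence to a binary vector with one dimension per pattern."""
    return [[1.0 if contains(s, p) else 0.0 for p in patterns] for s in db]


def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    # Assumption: deterministic farthest-point initialization for reproducibility.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Lloyd step 1: assign each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Lloyd step 2: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy database: two groups of "customer" sequences.
db = [
    [frozenset("a"), frozenset("bc"), frozenset("d")],
    [frozenset("a"), frozenset("b"), frozenset("d")],
    [frozenset("x"), frozenset("yz")],
    [frozenset("x"), frozenset("y"), frozenset("z")],
]
patterns = frequent_patterns(db, min_support=0.5)
labels = kmeans(project(db, patterns), k=2)
print(len(patterns), labels)  # → 12 [0, 0, 1, 1]
```

In this sketch, once the pattern set is fixed, the projection and each K-means iteration visit every sequence once. For a fixed pattern set and iteration count, the cost therefore grows linearly with the number of sequences, which is consistent with the scalability claim above. The mining step, by contrast, can grow quickly with the number of distinct items and a low minimum support.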
