Postimpact similarity: a similarity measure for effective grouping of unlabelled text using spectral clustering

KNOWLEDGE AND INFORMATION SYSTEMS (2022)

Journal

KNOWLEDGE AND INFORMATION SYSTEMS

Volume 64, Issue 3, Pages 723-742

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s10115-022-01658-9

Keywords

Text clustering; Data clustering; Applied machine learning; Data mining

Automated Summary

Text clustering divides a set of text documents into meaningful groups based on their similarity. Clusters are commonly formed from the content similarity between documents, but this may not be effective for large, high-dimensional corpora. This paper proposes a similarity measure for spectral clustering that scores a pair of documents by their content similarity and by each document's similarity to their shared neighbours. Experimental results show that the method outperforms existing text clustering techniques in terms of normalized mutual information, f-measure, and v-measure.

Abstract

The task of text clustering is to partition a set of text documents into meaningful groups such that the documents in a cluster are more similar to each other than to the documents of other clusters, according to a similarity or dissimilarity measure. The similarity measure therefore plays a crucial role in producing good-quality clusters. The content similarity between two documents, measured from the terms they share, is generally used to form individual clusters. However, content similarity alone may not be effective for a reasonably large and high-dimensional corpus. A similarity measure is therefore proposed here to improve the performance of text clustering using the spectral method. The proposed measure assigns a pair of documents a score based on their content similarity and their individual similarity with the shared neighbours over the corpus. Its effectiveness has been tested by clustering different standard corpora with the spectral clustering method. The empirical results on some well-known text collections show that the proposed method performs better than the state-of-the-art text clustering techniques in terms of normalized mutual information, f-measure and v-measure.
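The idea of scoring a document pair by both its content similarity and its similarity to shared neighbours can be illustrated with a small sketch. The abstract does not give the paper's formula, so everything below is an assumption made for illustration: cosine similarity over raw term counts stands in for content similarity, a neighbour is any document above a hypothetical `threshold`, and the two parts are mixed with a hypothetical `weight`. It should not be read as the authors' postimpact similarity.

```python
import math
from collections import Counter


def term_vectors(docs):
    """Bag-of-words term counts per document (lowercased, whitespace tokens)."""
    return [Counter(doc.lower().split()) for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def content_similarity(docs):
    """Pairwise content similarity matrix based on shared terms."""
    vecs = term_vectors(docs)
    n = len(vecs)
    return [
        [1.0 if i == j else cosine(vecs[i], vecs[j]) for j in range(n)]
        for i in range(n)
    ]


def neighbour_aware_similarity(content, threshold=0.1, weight=0.5):
    """Illustrative combination (an assumption, not the paper's formula).

    The score for a pair mixes its content similarity with the average
    support from shared neighbours, where a shared neighbour k supports the
    pair (i, j) by min(content[i][k], content[j][k]).
    """
    n = len(content)
    # Neighbours of i: every other document at least `threshold` similar to i.
    neighbours = [
        {k for k in range(n) if k != i and content[i][k] >= threshold}
        for i in range(n)
    ]
    combined = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = neighbours[i] & neighbours[j]
            support = 0.0
            if shared:
                support = sum(
                    min(content[i][k], content[j][k]) for k in shared
                ) / len(shared)
            score = weight * content[i][j] + (1 - weight) * support
            combined[i][j] = combined[j][i] = score
    return combined


if __name__ == "__main__":
    docs = [
        "apple banana fruit",
        "banana fruit salad",
        "apple fruit juice",
        "engine car wheel",
        "car wheel road",
    ]
    content = content_similarity(docs)
    combined = neighbour_aware_similarity(content)
    print("content  (doc 1, doc 2):", round(content[1][2], 3))
    print("combined (doc 1, doc 2):", round(combined[1][2], 3))
```

In this toy corpus, documents 1 and 2 share only one term, yet both are close to document 0, so the neighbour term raises their score. A symmetric matrix like `combined` could then be handed to a spectral clustering routine that accepts a precomputed affinity, for example scikit-learn's `SpectralClustering(affinity="precomputed")`, and the resulting labels compared against a reference labelling with normalized mutual information or v-measure.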
