Multi-view document clustering based on geometrical similarity measurement

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS (2022)

Journal

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS

Volume 13, Issue 3, Pages 663-675

Publisher

SPRINGER HEIDELBERG

DOI: 10.1007/s13042-021-01295-8

Keywords

Multi-view clustering; Ensemble clustering; Similarity measurement; Document clustering

Funding

National Science Foundation of China [61772435, 61976182, 61876157]
Fundamental Research Funds for the Central Universities [220710004005040177]
Sichuan Key R&D project [2020YFG0035]

Automated Summary

This paper introduces five similarity metric models that address the limitations of the traditional Cosine similarity and Euclidean distance metrics. Experimental results show that the proposed similarity functions are more accurate and that the resulting clustering approach outperforms existing algorithms.

Abstract

Numerous works have applied multi-view clustering algorithms to document clustering. A challenging problem in document clustering is the choice of similarity metric. Existing multi-view document clustering methods mainly use two measures: the Cosine similarity (CS) and the Euclidean distance (ED). The former does not consider the magnitude difference (MD) between two vectors. The latter cannot capture the divergence between pairs of vectors that have the same ED. In this paper, we propose five original similarity metric models. This approach overcomes the drawbacks of the CS and ED metrics by measuring the divergence between documents with the same ED while also taking their magnitudes into account. Furthermore, we propose a multi-view document clustering scheme based on the proposed similarity metrics. First, CS, ED, the triangle's area similarity and sector's area similarity metric, and our five similarity metrics are applied to every view of a dataset to generate a corresponding similarity matrix. Next, we run clustering algorithms on these similarity matrices to evaluate single-view performance. Finally, we aggregate these similarity matrices into a unified similarity matrix and apply a spectral clustering algorithm to it to generate the final clusters. The experimental results show that the proposed similarity functions measure the similarity between documents more accurately than the existing metrics, and that the proposed clustering scheme considerably outperforms state-of-the-art algorithms.
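The two limitations named in the abstract, and the per-view aggregation step, can be illustrated with a minimal Python sketch. It is not the paper's method: the five proposed metrics are not defined in the abstract and are not reproduced here. The toy vectors, the function names, and the element-wise mean used as the aggregation rule are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; insensitive to vector length.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def euclidean_distance(a, b):
    """Straight-line distance between a and b; insensitive to the angle."""
    return math.dist(a, b)


# Limitation of CS: b is ten times longer than a, yet CS rates them identical.
a, b = [1.0, 2.0], [10.0, 20.0]
cs_ab = cosine_similarity(a, b)
ed_ab = euclidean_distance(a, b)

# Limitation of ED: d1 and d2 are equally far from q,
# but only d1 points in the same direction as q.
q, d1, d2 = [1.0, 0.0], [2.0, 0.0], [1.0, 1.0]
ed_q_d1, ed_q_d2 = euclidean_distance(q, d1), euclidean_distance(q, d2)
cs_q_d1, cs_q_d2 = cosine_similarity(q, d1), cosine_similarity(q, d2)


def similarity_matrix(docs, sim):
    """Pairwise similarity matrix for one view of the documents."""
    return [[sim(x, y) for y in docs] for x in docs]


def mean_aggregate(matrices):
    """Element-wise mean of per-view matrices (assumed rule, not the paper's)."""
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]


# Two hypothetical views (feature sets) describing the same three documents.
view_1 = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
view_2 = [[1.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
unified = mean_aggregate(
    [similarity_matrix(v, cosine_similarity) for v in (view_1, view_2)]
)
```

In this sketch the unified matrix plays the role of the aggregated similarity matrix described in the abstract. It could then be handed to any spectral clustering implementation that accepts a precomputed affinity matrix, for example scikit-learn's `SpectralClustering` with `affinity="precomputed"`.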
