4.5 Article

Multi-view document clustering based on geometrical similarity measurement

Journal

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s13042-021-01295-8

Keywords

Multi-view clustering; Ensemble clustering; Similarity measurement; Document clustering

Funding

  1. National Science Foundation of China [61772435, 61976182, 61876157]
  2. Fundamental Research Funds for the Central Universities [220710004005040177]
  3. Sichuan Key R&D project [2020YFG0035]


This paper introduces five similarity metric models that address the limitations of the traditional Cosine similarity and Euclidean distance metrics. Experimental results show that the proposed similarity functions are more accurate and that the resulting approach outperforms existing algorithms.
Numerous works have applied multi-view clustering algorithms to document clustering. A challenging problem in document clustering is the choice of similarity metric. Existing multi-view document clustering methods mostly rely on two measures: the Cosine similarity (CS) and the Euclidean distance (ED). The first does not consider the magnitude difference (MD) between two vectors. The second cannot capture the divergence between pairs of vectors that have the same ED. In this paper, we propose five new similarity metric models. This approach overcomes the drawbacks of the CS and ED metrics by measuring the divergence between documents with the same ED while also taking their magnitudes into account. Furthermore, we propose a multi-view document clustering scheme based on the proposed similarity metrics. First, CS, ED, the triangle's area similarity and sector's area similarity metric, and our five similarity metrics are applied to every view of a dataset to generate a corresponding similarity matrix. Next, we run clustering algorithms on these similarity matrices to evaluate single-view performance. We then aggregate the similarity matrices into a unified similarity matrix and apply a spectral clustering algorithm to it to generate the final clusters. The experimental results show that the proposed similarity functions measure the similarity between documents more accurately than the existing metrics, and that the proposed clustering scheme considerably outperforms state-of-the-art algorithms.
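The two limitations named in the abstract, and the step of merging per-view similarity matrices, can be illustrated with a minimal sketch. It is not the authors' implementation: the five proposed metrics are not defined in the abstract and are not reproduced here, the toy vectors and the names view_a, view_b, similarity_matrix, and average_matrices are hypothetical, and element-wise averaging is only an assumed stand-in for the aggregation step, which the abstract does not specify.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.dist(a, b)


# Limitation of CS: it ignores the magnitude difference.
# [1, 1] and [3, 3] point the same way, so CS is (numerically) 1
# even though the vectors differ in length.
print(cosine_similarity([1.0, 1.0], [3.0, 3.0]))

# Limitation of ED: two candidates at the same distance from q
# can still diverge from q by different angles.
q, p1, p2 = [1.0, 0.0], [2.0, 0.0], [1.0, 1.0]
print(euclidean_distance(q, p1), euclidean_distance(q, p2))
print(cosine_similarity(q, p1), cosine_similarity(q, p2))


def similarity_matrix(vectors, sim):
    """Pairwise similarity matrix for one view of the documents."""
    return [[sim(u, v) for v in vectors] for u in vectors]


def average_matrices(matrices):
    """Element-wise mean of equally sized matrices (assumed aggregation)."""
    n = len(matrices[0])
    k = len(matrices)
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)]
            for i in range(n)]


# Two hypothetical views of the same three documents.
view_a = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 1.0], [1.0, 1.0], [1.0, 0.0]]

unified = average_matrices([
    similarity_matrix(view_a, cosine_similarity),
    similarity_matrix(view_b, cosine_similarity),
])
print(unified)
```

In a fuller pipeline, the unified matrix would then be passed to a spectral clustering routine that accepts a precomputed affinity matrix, for example scikit-learn's SpectralClustering with affinity="precomputed".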

