Article

BTM and GloVe Similarity Linear Fusion-Based Short Text Clustering Algorithm for Microblog Hot Topic Discovery

Journal

IEEE ACCESS
Volume 8, Issue -, Pages 32215-32225

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2020.2973430

Keywords

BTM; GloVe; microblog hot topic discovery; similarity linear fusion; WMD

Funding

  1. Research Projects of Science and Technology in Hebei Higher Education Institutions [ZD2018087, ZD2016017, QN2018109, YQ2014014]
  2. Natural Science Foundation of Hebei Province [F2019402428]
  3. National Key R&D Program of China [2018YFF0301004]
  4. National Natural Science Foundation of China [61802107]

Microblog hot topic discovery is an active research area in text mining. The distance function of traditional K-means yields low clustering accuracy, which in turn degrades hot topic discovery. This paper proposes three definitions: title words and body words, positional contribution-based weight, and fusion similarity-based distance. On this basis, it proposes a short text clustering algorithm based on BTM and GloVe similarity linear fusion (BG&SLF-Kmeans). BTM and GloVe are used to model the preprocessed microblog short texts. JS divergence is used to compute text similarity from the BTM topic model, and WMD with improved word weights (IWMD) is used to compute text similarity from the GloVe word vectors. Finally, the two similarities are linearly fused and used as the distance function for K-means clustering. The word sets specific to 6 hot topics are obtained, from which the microblog hot topics are discovered. The experimental results show that BG&SLF-Kmeans significantly improves clustering accuracy compared with TF-IDF&K-means, BTM&K-means, and BTF&SLF-Kmeans.
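
The fusion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the weight `lam`, and the convex-combination form of the fusion are assumptions made for illustration, since the abstract does not give the exact formula. The GloVe-based term is taken as a precomputed value `wmd_ab` (hypothetical, assumed to be scaled to [0, 1]); the paper's improved word weighting (IWMD) is not reproduced here.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions.

    With base-2 logarithms the result lies in [0, 1].
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalise so that both inputs sum to 1.
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def fused_distance(theta_a, theta_b, wmd_ab, lam=0.5):
    """Linearly fuse a topic-based and a word-vector-based distance.

    theta_a, theta_b: document-topic distributions (e.g. from BTM).
    wmd_ab: precomputed word-mover-style distance between the two
            documents, assumed scaled to [0, 1] (hypothetical input).
    lam: fusion weight in [0, 1] (an assumed parameter name).
    """
    return lam * js_divergence(theta_a, theta_b) + (1.0 - lam) * wmd_ab


if __name__ == "__main__":
    # Two toy topic distributions over three topics.
    doc_a = [0.7, 0.2, 0.1]
    doc_b = [0.1, 0.3, 0.6]
    print(fused_distance(doc_a, doc_b, wmd_ab=0.4, lam=0.5))
```

A distance of this kind would replace the Euclidean distance in the K-means assignment step; how centroids are updated under a non-Euclidean distance is a separate design choice that the abstract does not specify.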
