Proceedings Paper

Hierarchical Topic Model Inference by Community Discovery on Word Co-occurrence Networks

Journal

DATA MINING, AUSDM 2022
Volume 1741, Issue -, Pages 148-162

Publisher

SPRINGER-VERLAG SINGAPORE PTE LTD
DOI: 10.1007/978-981-19-8746-5_11

Keywords

Topic modelling; Information networks; Graphs; Natural language processing; Data mining

Abstract

The most popular topic modelling algorithm, Latent Dirichlet Allocation, produces a simple set of topics. However, topics naturally exist in a hierarchy with larger, more general super-topics and smaller, more specific sub-topics. We develop a novel topic modelling algorithm, Community Topic, that mines communities from word co-occurrence networks to produce topics. The fractal structure of networks provides a natural topic hierarchy where sub-topics can be found by iteratively mining the sub-graph formed by a single topic. Similarly, super-topics can be found by mining the network of topic hyper-nodes. We compare the topic hierarchies discovered by Community Topic to those produced by two probabilistic graphical topic models and find that Community Topic uncovers a topic hierarchy with a more coherent structure and a tighter relationship between parent and child topics. Community Topic is able to find this hierarchy more quickly and allows for on-demand sub- and super-topic discovery, facilitating corpus exploration by researchers.
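The abstract outlines the full pipeline: build a word co-occurrence network from the corpus, mine its communities as topics, recurse into the subgraph induced by a single topic to obtain sub-topics, and contract topics into hyper-nodes and mine that network for super-topics. The sketch below only illustrates this idea; it is not the authors' implementation. The whitespace tokenisation, the adjacent-word co-occurrence window, the toy corpus, the function names, and the use of networkx's Louvain method as the community-mining step are all assumptions made for the example.

```python
# Minimal sketch of the idea described in the abstract: topics as word
# communities in a co-occurrence network, sub-topics from a topic's induced
# subgraph, super-topics from a contracted network of topic hyper-nodes.
# Assumptions (not from the paper): whitespace tokenisation, adjacent-word
# co-occurrence, and networkx's Louvain method as the community-mining step.
import networkx as nx
from networkx.algorithms.community import louvain_communities

def cooccurrence_network(docs):
    """Weighted word co-occurrence graph over adjacent token pairs."""
    g = nx.Graph()
    for doc in docs:
        tokens = doc.lower().split()
        for u, v in zip(tokens, tokens[1:]):
            if u == v:
                continue
            w = g[u][v]["weight"] if g.has_edge(u, v) else 0
            g.add_edge(u, v, weight=w + 1)
    return g

def topics(graph):
    """One level of topics: each community of words is a topic."""
    return [set(c) for c in louvain_communities(graph, weight="weight")]

def sub_topics(graph, topic):
    """Sub-topics: mine communities in the subgraph induced by one topic."""
    return topics(graph.subgraph(topic))

def super_topics(graph, topic_list):
    """Super-topics: contract each topic to a hyper-node, then mine the
    resulting topic-level network for communities of topics."""
    member = {w: i for i, t in enumerate(topic_list) for w in t}
    h = nx.Graph()
    h.add_nodes_from(range(len(topic_list)))
    for u, v, data in graph.edges(data=True):
        i, j = member[u], member[v]
        if i == j:
            continue
        w = h[i][j]["weight"] if h.has_edge(i, j) else 0
        h.add_edge(i, j, weight=w + data["weight"])
    return [set(c) for c in louvain_communities(h, weight="weight")]

if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "dogs and cats are friendly pets",
        "stock markets fell as interest rates rose",
        "the central bank raised interest rates again",
    ]
    g = cooccurrence_network(docs)
    level_one = topics(g)
    for t in level_one:
        print("topic:", sorted(t))
        for s in sub_topics(g, t):
            print("  sub-topic:", sorted(s))
    print("super-topics (sets of topic indices):", super_topics(g, level_one))
```

The property the abstract emphasises carries through in this sketch: the same community-mining step is reused at every level (the full graph for topics, an induced subgraph for sub-topics, the contracted topic graph for super-topics), which is what makes on-demand drill-down cheap compared with refitting a probabilistic model.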

