Journal
DATA MINING, AUSDM 2022
Volume 1741, Issue -, Pages 148-162
Publisher
SPRINGER-VERLAG SINGAPORE PTE LTD
DOI: 10.1007/978-981-19-8746-5_11
Keywords
Topic modelling; Information networks; Graphs; Natural language processing; Data mining
The popular topic modelling algorithm, Latent Dirichlet Allocation, only produces a simple set of topics. In contrast, the novel algorithm called Community Topic mines communities from word co-occurrence networks to generate topics with a hierarchical structure. Compared to other models, Community Topic uncovers a more coherent topic hierarchy with a tighter relationship between parent and child topics, and it can find this hierarchy more quickly. This algorithm also allows researchers to discover sub- and super-topics on demand, facilitating corpus exploration.
The most popular topic modelling algorithm, Latent Dirichlet Allocation, produces a simple set of topics. However, topics naturally exist in a hierarchy with larger, more general super-topics and smaller, more specific sub-topics. We develop a novel topic modelling algorithm, Community Topic, that mines communities from word co-occurrence networks to produce topics. The fractal structure of networks provides a natural topic hierarchy where sub-topics can be found by iteratively mining the sub-graph formed by a single topic. Similarly, super-topics can be found by mining the network of topic hyper-nodes. We compare the topic hierarchies discovered by Community Topic to those produced by two probabilistic graphical topic models and find that Community Topic uncovers a topic hierarchy with a more coherent structure and a tighter relationship between parent and child topics. Community Topic is able to find this hierarchy more quickly and allows for on-demand sub- and super-topic discovery, facilitating corpus exploration by researchers.
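The idea of mining topics as communities, and sub-topics from the sub-graph of a single topic, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the community detection method, so the sketch swaps in a deliberately crude stand-in (connected components over edges at or above a weight threshold). The function names `cooccurrence_network`, `communities`, and `sub_topics`, the `min_weight` parameter, and the toy corpus are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(docs):
    """Weight each unordered word pair by the number of documents containing both."""
    weights = defaultdict(int)
    for doc in docs:
        for u, v in combinations(sorted(set(doc)), 2):
            weights[(u, v)] += 1
    return dict(weights)


def communities(weights, nodes, min_weight):
    """Stand-in community detection (an assumption, not the paper's method):
    connected components over edges with weight >= min_weight, within `nodes`."""
    adj = {n: set() for n in nodes}
    for (u, v), w in weights.items():
        if w >= min_weight and u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    seen, result = set(), []
    for start in sorted(adj):  # sorted for a deterministic ordering
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        result.append(comp)
    return result


def sub_topics(weights, topic, min_weight):
    """Mine only the sub-graph induced by one topic's words."""
    return communities(weights, topic, min_weight)


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["match", "goal", "striker"],
    ["goal", "striker"],
    ["match", "serve", "racket"],
    ["serve", "racket"],
    ["stock", "market"],
    ["stock", "market"],
]
weights = cooccurrence_network(docs)
vocab = {w for doc in docs for w in doc}

topics = communities(weights, vocab, min_weight=1)   # top-level topics
subs = sub_topics(weights, topics[0], min_weight=2)  # sub-topics of the first topic
```

On this toy corpus the top level separates a sports topic from a finance topic, and re-mining the sports sub-graph with a stricter threshold splits it into smaller word groups. A real system would replace the threshold-and-components step with a proper community detection algorithm.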