Article

A text semantic topic discovery method based on the conditional co-occurrence degree

Journal

NEUROCOMPUTING
Volume 368, Pages 11-24

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2019.08.047

Keywords

Text mining; Topic discovery; Semantic information; Conditional co-occurrence degree

Funding

  1. National Natural Science Foundation of China [71771034, 71421001]
  2. Scientific and Technological Innovation Foundation of Dalian [2018J11CY009]
  3. Science and Technology Program of Jieyang [2017xm041]

The topic discovery method, as an effective tool for semantic mining and a key means of extracting new features from original text, plays an important role in text mining and knowledge discovery. To address problems encountered in traditional topic models, such as the loss of semantic information, the ambiguity of topic concepts, and the crossover and coverage among topics, we propose a semantic topic discovery method based on the conditional co-occurrence degree (CCOD_STDM). First, every document is split into multiple subdocuments according to the semantic structure of the document and the independence decision rules. Second, combinatorial words with strong semantic relevance are extracted based on the conditional co-occurrence degree within the subdocuments; based on these combinatorial words, new subdocuments are formed by feature expansion and content reconstruction. Third, the topic-word distributions and document-topic distributions of the new subdocuments are obtained by topic modeling with Gibbs sampling. Finally, the document-topic distributions of the original documents are obtained by merging the new subdocuments' document-topic distributions with specific strategies. In numerical experiments on seven public corpora, CCOD_STDM is compared with six topic models under two evaluation methods, and the results verify its superiority and its efficiency in topic discovery. More importantly, a case study illustrates that the combinatorial words can effectively avoid the polysemy problem and facilitate the condensation and summarization of topics. (C) 2019 Elsevier B.V. All rights reserved.
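
The four-step pipeline described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the formula for the conditional co-occurrence degree, nor the independence decision rules, nor the merging strategies. The sketch therefore assumes (i) one subdocument per paragraph, (ii) a conditional co-occurrence degree CCOD(a|b) = n(a, b) / n(b), where n counts subdocuments, with a word pair kept as a combinatorial word when both directions reach a hypothetical threshold of 0.6, (iii) standard collapsed Gibbs sampling for LDA as the topic model, and (iv) a length-weighted average as the merging strategy. All function names and parameter values are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def split_into_subdocs(text):
    """Hypothetical stand-in for the independence decision rules: one subdocument per paragraph."""
    return [p.split() for p in text.split("\n\n") if p.strip()]


def conditional_cooccurrence(subdocs):
    """Assumed CCOD(a|b) = n(a, b) / n(b), counted over subdocuments, for every co-occurring pair."""
    df = Counter()       # number of subdocuments containing each word
    pair_df = Counter()  # number of subdocuments containing both words of a pair
    for doc in subdocs:
        vocab = sorted(set(doc))
        df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    ccod = {}
    for (a, b), n_ab in pair_df.items():
        ccod[(a, b)] = n_ab / df[b]
        ccod[(b, a)] = n_ab / df[a]
    return ccod


def combinatorial_words(subdocs, threshold=0.6):
    """Keep pairs whose conditional degree reaches the (hypothetical) threshold in both directions."""
    ccod = conditional_cooccurrence(subdocs)
    return {tuple(sorted(p)) for p, d in ccod.items()
            if d >= threshold and ccod[(p[1], p[0])] >= threshold}


def expand(subdocs, combos):
    """Feature expansion: append a joined token 'a_b' to each subdocument containing both words."""
    return [list(doc) + [f"{a}_{b}" for a, b in sorted(combos) if a in doc and b in doc]
            for doc in subdocs]


def gibbs_lda(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Standard collapsed Gibbs sampling for LDA; returns the document-topic distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(k)]    # topic-word counts
    nk = [0] * k                         # tokens assigned to each topic
    z = []                               # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            ndk[d][t] += 1
            nkw[t][wid[w]] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, v = z[d][i], wid[w]
                ndk[d][t] -= 1           # remove the token's current assignment
                nkw[t][v] -= 1
                nk[t] -= 1
                # p(z = j | rest) is proportional to (n_dj + alpha)(n_jw + beta) / (n_j + V * beta)
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][v] += 1
                nk[t] += 1
    return [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
            for d, doc in enumerate(docs)]


def merge(theta_sub, parent, lengths, n_docs):
    """Assumed merging strategy: length-weighted average of each document's subdocument mixtures.

    Assumes every original document produced at least one non-empty subdocument.
    """
    k = len(theta_sub[0])
    acc = [[0.0] * k for _ in range(n_docs)]
    total = [0] * n_docs
    for theta, p, n in zip(theta_sub, parent, lengths):
        total[p] += n
        for j in range(k):
            acc[p][j] += n * theta[j]
    return [[x / total[p] for x in acc[p]] for p in range(n_docs)]


docs = ["neural network training\n\nneural network layers",
        "stock market price\n\nstock market trading"]
subdocs, parent = [], []
for i, text in enumerate(docs):
    for part in split_into_subdocs(text):
        subdocs.append(part)
        parent.append(i)
combos = combinatorial_words(subdocs)
expanded = expand(subdocs, combos)
theta_sub = gibbs_lda(expanded, k=2, iters=100)
theta_doc = merge(theta_sub, parent, [len(d) for d in expanded], len(docs))
print(theta_doc)
```

On the toy corpus, the assumed criterion keeps only the pairs that co-occur in every subdocument where either word appears (network/neural and market/stock), so each subdocument gains one joined token such as network_neural. The sampler is stochastic, so the printed topic proportions depend on the seed; merging by subdocument length is just one plausible reading of the "specific strategies" mentioned in the abstract.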
