Article

Short text similarity based on probabilistic topics

KNOWLEDGE AND INFORMATION SYSTEMS (2010)

Journal

KNOWLEDGE AND INFORMATION SYSTEMS

Volume 25, Issue 3, Pages 473-491

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s10115-009-0250-y

Keywords

Text similarity measures; Information retrieval; Query expansion; Text mining; Question answering

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Funding

City University of Hong Kong [7002488]
China Semantic Grid Research Plan (National Grand Fundamental Research 973 Program) [2003CB317000]

Abstract

In this paper, we propose a new method for measuring the similarity between two short text snippets by comparing each of them with a set of probabilistic topics. Specifically, the method first finds the distinguishing terms of the two short text snippets and compares them with a series of probabilistic topics extracted by the Gibbs sampling algorithm. The relationship between the distinguishing terms of the snippets can then be discovered by examining their probabilities under each topic. The similarity between the two snippets is calculated from their common terms and from the relationship between their distinguishing terms. Extensive experiments on paraphrasing and question categorization show that the proposed method calculates the similarity of short text snippets more accurately than other methods, including the pure TF-IDF measure.
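The abstract does not give the scoring formula, so the following Python sketch only illustrates the general idea under stated assumptions. It treats "distinguishing terms" as the terms that occur in exactly one of the two snippets, uses a small hand-written table of topic-word probabilities (`TOPICS`) in place of topics fitted by Gibbs sampling, and mixes a Jaccard overlap of the common terms with a cosine similarity of topic vectors through a hypothetical `weight` parameter. None of these choices is confirmed by the paper.

```python
import math

# Hypothetical P(word | topic) values, written by hand for illustration.
# In the paper the topics are extracted with Gibbs sampling instead.
TOPICS = [
    {"car": 0.4, "automobile": 0.3, "engine": 0.3},
    {"bank": 0.5, "money": 0.3, "loan": 0.2},
]


def topic_vector(terms, topics):
    """Sum the probability of every term under each topic."""
    return [sum(topic.get(term, 0.0) for term in terms) for topic in topics]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity(text_a, text_b, topics, weight=0.5):
    """Combine common-term overlap with topic relatedness of the rest.

    `weight` is an assumed mixing parameter, not a value from the paper.
    """
    terms_a = set(text_a.lower().split())
    terms_b = set(text_b.lower().split())
    if terms_a == terms_b:
        # Identical term sets (including two empty snippets) score 1.0.
        return 1.0
    common = terms_a & terms_b
    only_a = terms_a - terms_b  # distinguishing terms of snippet A
    only_b = terms_b - terms_a  # distinguishing terms of snippet B
    overlap = len(common) / len(terms_a | terms_b)
    related = cosine(topic_vector(only_a, topics),
                     topic_vector(only_b, topics))
    return weight * overlap + (1 - weight) * related


if __name__ == "__main__":
    print(similarity("fast car", "fast automobile", TOPICS))
    print(similarity("fast car", "fast loan", TOPICS))
```

With this toy table, "fast car" scores higher against "fast automobile" than against "fast loan", because "car" and "automobile" have probability mass under the same topic while "car" and "loan" do not. A pure term-overlap measure would rate both pairs equally.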
