Journal
KNOWLEDGE AND INFORMATION SYSTEMS
Volume 25, Issue 3, Pages 473-491
Publisher
SPRINGER LONDON LTD
DOI: 10.1007/s10115-009-0250-y
Keywords
Text similarity measures; Information retrieval; Query expansion; Text mining; Question answering
Funding
- City University of Hong Kong [7002488]
- China Semantic Grid Research Plan (National Grand Fundamental Research 973 Program) [2003CB317000]
In this paper, we propose a new method for measuring the similarity between two short text snippets by comparing each of them with a set of probabilistic topics. Specifically, our method first finds the distinguishing terms of the two short text snippets and compares them with a series of probabilistic topics extracted by the Gibbs sampling algorithm. The relationship between the distinguishing terms of the two snippets is discovered by examining their probabilities under each topic. The similarity between the two snippets is then calculated from their common terms and the relationship between their distinguishing terms. Extensive experiments on paraphrasing and question categorization show that the proposed method calculates the similarity of short text snippets more accurately than other methods, including the pure TF-IDF measure.
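The idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's actual formula, which the abstract does not give. Everything specific here is an assumption made for illustration: the toy topic-word probabilities in `TOPICS` (a real system would estimate them with a topic model trained by Gibbs sampling), the Jaccard overlap used for common terms, the cosine comparison of topic profiles used for distinguishing terms, the mixing weight `alpha`, and the function names themselves.

```python
import math

# Hypothetical P(term | topic) values, made up for illustration only.
# In the paper these would come from topics extracted by Gibbs sampling.
TOPICS = [
    {"car": 0.30, "automobile": 0.25, "engine": 0.20, "banana": 0.01},
    {"banana": 0.40, "fruit": 0.35, "car": 0.01, "automobile": 0.01},
]


def topic_vector(terms, topics=TOPICS, eps=1e-6):
    """Average P(term | topic) over the terms, giving one value per topic."""
    if not terms:
        return [0.0] * len(topics)
    # Terms missing from a topic get a tiny probability eps (an assumption).
    return [sum(t.get(w, eps) for w in terms) / len(terms) for t in topics]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def snippet_similarity(a, b, alpha=0.5):
    """Combine common-term overlap with topic agreement of distinguishing terms."""
    terms_a, terms_b = set(a.lower().split()), set(b.lower().split())
    union = terms_a | terms_b
    if not union:
        return 0.0  # two empty snippets: nothing to compare
    only_a, only_b = terms_a - terms_b, terms_b - terms_a
    if not only_a and not only_b:
        return 1.0  # identical term sets
    # Common terms: Jaccard overlap (an assumed choice).
    common_score = len(terms_a & terms_b) / len(union)
    # Distinguishing terms: compare their probability profiles across topics.
    topic_score = cosine(topic_vector(only_a), topic_vector(only_b))
    return alpha * common_score + (1 - alpha) * topic_score


if __name__ == "__main__":
    print(snippet_similarity("fast car", "fast automobile"))
    print(snippet_similarity("fast car", "fast banana"))
```

With these toy topics, "fast car" scores higher against "fast automobile" than against "fast banana", even though both pairs share exactly one term. The distinguishing terms "car" and "automobile" are probable under the same topic, whereas "car" and "banana" are not, which is the effect a pure term-overlap or TF-IDF comparison would miss.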