Article

A Text Similarity Measurement Based on Semantic Fingerprint of Characteristic Phrases

Journal

CHINESE JOURNAL OF ELECTRONICS
Volume 29, Issue 2, Pages 233-241

Publisher

TECHNOLOGY EXCHANGE LIMITED HONG KONG
DOI: 10.1049/cje.2019.12.011

Keywords

Term frequency-inverse document frequency (TF-IDF) model; Semantic fingerprint; Similarity; Characteristic phrases

Funding

  1. National Natural Science Foundation of China [61572523, 61873281, 61572522]

Abstract

Text similarity measurement is the basis for quantifying the degree of matching between two or more texts. Traditional large-scale similarity detection methods based on digital fingerprints offer high detection speed but are suitable only for exact-match detection. We propose a Chinese text similarity measurement method based on the semantics of feature phrases. Natural language processing (NLP) techniques are used to pre-process the text, keywords are extracted with the term frequency-inverse document frequency (TF-IDF) model, and the feature words are then screened out from those keywords. We obtain the exact meaning of each word and the semantic similarities between words using the HowNet semantic dictionary. We then substitute concepts to obtain the feature phrases, generate a semantic fingerprint from them, and calculate similarity. The experimental results indicate that the proposed method is superior to the traditional and digital fingerprinting methods in similarity detection in terms of accuracy rate, recall rate, and F-value.
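The general shape of such a pipeline (weight terms, keep the highest-weighted ones, hash them into a fingerprint, compare fingerprints) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes pre-tokenized input, a smoothed TF-IDF formula, and a SimHash-style 64-bit fingerprint compared by Hamming distance, and it omits the HowNet disambiguation and concept substitution steps entirely. All function names are hypothetical.

```python
import hashlib
import math
from collections import Counter


def tfidf_weights(doc, corpus):
    """Return {term: TF-IDF weight} for one tokenized document."""
    counts = Counter(doc)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        tf = count / len(doc)
        df = sum(1 for other in corpus if term in other)
        # Smoothed IDF (an assumption, not from the paper) keeps weights positive.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[term] = tf * idf
    return weights


def top_terms(weights, k):
    """Keep the k highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def simhash(weights, bits=64):
    """SimHash-style fingerprint: each term votes on every bit by its weight.

    bits must be a multiple of 8 and at most 128 (the MD5 digest size).
    """
    totals = [0.0] * bits
    for term, weight in weights.items():
        digest = hashlib.md5(term.encode("utf-8")).digest()
        value = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            totals[i] += weight if (value >> i) & 1 else -weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def fingerprint_similarity(a, b, bits=64):
    """Similarity in [0, 1]: one minus the normalized Hamming distance."""
    return 1.0 - bin(a ^ b).count("1") / bits


# Toy corpus: the first two documents are identical, the third is unrelated.
corpus = [
    ["text", "similarity", "semantic", "fingerprint"],
    ["text", "similarity", "semantic", "fingerprint"],
    ["image", "compression", "wavelet", "transform"],
]
prints = [simhash(top_terms(tfidf_weights(d, corpus), 3)) for d in corpus]
same = fingerprint_similarity(prints[0], prints[1])   # identical inputs give 1.0
other = fingerprint_similarity(prints[0], prints[2])  # some value in [0, 1]
```

Because the fingerprint here is built from surface terms, two texts that use different words for the same concept would still hash differently; that limitation is the gap the abstract's semantic substitution step is meant to address.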
