4.6 Article

Generic SAO Similarity Measure via Extended Sorensen-Dice Index

Journal

IEEE ACCESS
Volume 8, Issue -, Pages 66538-66552

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2020.2984024

Keywords

Semantics; Indexes; Patents; Atmospheric measurements; Particle measurements; Syntactics; Current measurement; Similarity measurement; Sorensen-Dice index; semantic information; Subject-Action-Object; computational linguistics

Funding

  1. National Social Science Fund of China [16BTQ067]
  2. Innovation Project of Chinese Academy of Agricultural Sciences [CAAS-ASTIP-2016-AII]
  3. Specialized Fundamental Research Operational Fees of Chinese Academy of Agricultural Sciences [Y2017ZK04]

As an essential component of many Natural Language Processing applications, semantic similarity measurement has been studied for decades. Recent research indicates that the Subject-Action-Object (SAO) structure in sentences is better suited to describing technological information, and that SAO-based similarity measures outperform classical text-based ones. The typical approach in the literature to computing the similarity between two SAO structures relies on term matching, which produces the similarity score with the Sorensen-Dice index, i.e., the proportion of matching terms in the total number of terms. However, in this paper, we observe that the entities in SAO structures usually contain only a small number of terms, which causes the currently acknowledged methods to have a high recurrence rate and poor accuracy. To address this issue, we extend the Sorensen-Dice index and present a new unified framework for the SAO similarity measure that gives higher discrimination. The effectiveness of our measure is evaluated on patent data sets in the Nano-Fertilizer field. The results show that our measure significantly improves accuracy over the currently acknowledged ones. The proposed measure offers excellent flexibility and robustness and can easily be used for patent similarity measurement. In addition, the extended Sorensen-Dice index is of independent interest and has potential applications in other similarity measures.
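
To make the baseline concrete, the following is a minimal sketch of the classical term-matching approach the abstract describes, not the extended index proposed in the paper. It assumes whitespace tokenization and an unweighted average over the subject, action, and object parts; the names `SAO`, `terms`, `dice`, and `sao_similarity`, and the example phrases, are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, NamedTuple


class SAO(NamedTuple):
    """A Subject-Action-Object triple (hypothetical container for this sketch)."""
    subject: str
    action: str
    obj: str


def terms(text: str) -> List[str]:
    # Assumption: terms are lowercase, whitespace-separated tokens.
    return text.lower().split()


def dice(a: Iterable[str], b: Iterable[str]) -> float:
    """Classical Sorensen-Dice index: 2 * |A ∩ B| / (|A| + |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty term sets are identical
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def sao_similarity(x: SAO, y: SAO) -> float:
    # Assumption: each part is compared separately and the three
    # Dice scores are averaged with equal weight.
    scores = [dice(terms(p), terms(q)) for p, q in zip(x, y)]
    return sum(scores) / len(scores)


a = SAO("nano fertilizer", "improves", "crop yield")
b = SAO("nano fertilizer", "increases", "crop growth")
c = SAO("slow release fertilizer", "improves", "soil quality")

print(sao_similarity(a, b))
print(sao_similarity(a, c))
```

When both entities contain exactly two terms, `dice` can only return 0, 0.5, or 1, so many distinct SAO pairs end up with the same score. This coarse granularity is one way to read the low discrimination that the abstract attributes to short entities.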
