Journal
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA
Volume 17, Issue 9
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/3596603
Keywords
Technology portrait; technical phrase extraction; patent mining; multi-level; multi-aspect
This article discusses the importance of recognizing technical phrases in patent mining and proposes TechPat, an unsupervised model that identifies technical phrases automatically. Experimental results on patent search and patent classification confirm that the extracted phrases have wide application prospects in practical tasks.
In recent years, the explosive growth of patent applications has drawn extensive attention to patent mining. An important problem in patent mining is recognizing the technologies contained in patents, which serves as a fundamental preparation for deeper analysis. To this end, in this article we study how to construct a technology portrait for each patent, i.e., how to recognize the technical phrases it contains, which summarize and represent the patent from a technical perspective. A critical challenge is how to analyze the unique characteristics of technical phrases and describe them precisely. We therefore first produce detailed descriptions of the technical phrases found in a large body of patents, based on several criteria: previous work, practical experience, and statistical analyses. Then, considering the unique characteristics of technical phrases and the complex structure of patent documents, such as multi-aspect semantics and multi-level relevances, we propose a novel unsupervised model, TechPat, which automatically recognizes technical phrases from massive patent collections without the need for expensive human labeling. We then evaluate the extraction results from various aspects. Specifically, we propose a novel evaluation metric, Information Retrieval Efficiency (IRE), to quantify the quality of the extracted technical phrases from a new perspective. Extensive experiments on real-world patent data demonstrate that TechPat effectively discriminates technical phrases in patents and greatly outperforms existing methods. We further apply the extracted technical phrases to two practical tasks, patent search and patent classification, where the experimental results confirm the wide application prospects of technical phrases. Finally, we discuss the generalization ability of the proposed methods.