Article

Identifying artificial intelligence (AI) invention: a novel AI patent dataset

Journal

JOURNAL OF TECHNOLOGY TRANSFER
Volume 47, Issue 2, Pages 476-505

Publisher

SPRINGER
DOI: 10.1007/s10961-021-09900-2

Keywords

Patent; Patent landscape; Artificial intelligence; AI; Machine learning; Patent dataset

This paper introduces the Artificial Intelligence Patent Dataset (AIPD), a novel dataset that uses machine learning models to identify AI in over 13.2 million patents and pre-grant publications (PGPubs). The dataset consists of two data files: one identifying the patents and PGPubs predicted to contain AI, and another containing the patent documents used to train the machine learning classification models.
Artificial intelligence (AI) is an area of increasing scholarly and policy interest. To help researchers, policymakers, and the public, this paper describes a novel dataset identifying AI in over 13.2 million patents and pre-grant publications (PGPubs). The dataset, called the Artificial Intelligence Patent Dataset (AIPD), was constructed using machine learning models for each of eight AI component technologies covering areas such as natural language processing, AI hardware, and machine learning. The AIPD contains two data files, one identifying the patents and PGPubs predicted to contain AI and a second file containing the patent documents used to train the machine learning classification models. We also present several evaluation metrics based on manual review by patent examiners with focused expertise in AI, and show that our machine learning approach achieves state-of-the-art performance across existing alternatives in the literature. We believe releasing this dataset will strengthen policy formulation, encourage additional empirical work, and provide researchers with a common base for building empirical knowledge on the determinants and impacts of AI invention.
