Article

On the Potential of Taxonomic Graphs to Improve Applicability and Performance for the Classification of Biomedical Patents

APPLIED SCIENCES-BASEL (2021)

Journal

APPLIED SCIENCES-BASEL

Volume 11, Issue 2, Pages -

Publisher

MDPI

DOI: 10.3390/app11020690

Keywords

innovation management; medical technology; taxonomies; tree edit distance; multiclass patent categorization; automation; emerging technologies

Funding

Klaus Tschira Foundation gGmbH, Heidelberg, Germany

Automated Summary

This study combines taxonomic and textual information to develop an ensemble classification system for patent categorization, achieving nearly 10 points higher performance than the basic classifiers. The classifiers are trained on the patents' title/abstract and on their CPC and IPC assignments, with the taxonomies transformed into real-valued vectors through Dissimilarity Space Embedding (DSE). The ensemble of classifiers, particularly when stacked with a feed-forward Artificial Neural Network (ANN), outperforms the individual classifiers and offers new possibilities for technology management.

Abstract

A core task in technology management, in biomedical engineering and beyond, is the classification of patents into domain-specific categories. This task is increasingly automated by machine learning, with the fuzzy language of patents causing particular problems. In pursuit of higher classification performance, increasingly complex models have been developed that draw not only on text but also on a wealth of distinct (meta)data and methods. However, this complexity makes it difficult to access and integrate data and to fuse distinct predictions. Although the already established Cooperative Patent Classification (CPC) offers a plethora of information, it is rarely used in automated patent categorization. Thus, we combine taxonomic and textual information into an ensemble classification system and compare stacking and fixed combination rules as fusion methods. Various classifiers are trained on title/abstract and on both the CPC and IPC (International Patent Classification) assignments of 1230 patents covering six categories of future biomedical innovation. The taxonomies are modeled as tree graphs, parsed, and transformed by Dissimilarity Space Embedding (DSE) into real-valued vectors. When stacked with a feed-forward Artificial Neural Network (ANN), the classifier ensemble exceeds the performance of the basic classifiers by nearly 10 points, reaching F1 = 78.7%. Taxonomic base classifiers perform nearly as well as the text-based learners. Moreover, an ensemble of only CPC and IPC learners reaches F1 = 71.2%, offering a fully language-independent and straightforward approach that builds on established algorithms and readily available, integrated data and opens new possibilities for technology management.
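To make the embedding and fusion steps concrete, the following is a minimal sketch and not the authors' implementation. It assumes each patent's classification codes can be expanded into the set of taxonomy tree nodes they cover, and it substitutes a simple Jaccard distance between node sets for the tree edit distance named in the keywords. The function names (`taxonomy_nodes`, `jaccard_distance`, `dse_embed`, `mean_rule`), the example codes, and the probability values are all hypothetical and chosen only for illustration.

```python
from statistics import fmean


def taxonomy_nodes(codes):
    """Expand CPC/IPC-style codes into the set of tree nodes they cover."""
    nodes = set()
    for code in codes:
        compact = code.replace(" ", "")
        main_group = compact.split("/")[0]
        # Levels: section, class, subclass, main group, full subgroup.
        nodes.update({compact[:1], compact[:3], compact[:4], main_group, compact})
    return nodes


def jaccard_distance(a, b):
    """Stand-in dissimilarity between two node sets (NOT tree edit distance)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def dse_embed(codes, prototypes):
    """Dissimilarity space embedding: one coordinate per prototype."""
    nodes = taxonomy_nodes(codes)
    return [jaccard_distance(nodes, taxonomy_nodes(p)) for p in prototypes]


def mean_rule(prob_vectors):
    """Fixed combination rule: average class probabilities across classifiers."""
    return [fmean(column) for column in zip(*prob_vectors)]


# Hypothetical prototypes and patent; the codes are illustrative only.
prototypes = [["A61B 5/00"], ["G16H 50/20"]]
vector = dse_embed(["A61B 5/00", "A61B 5/02"], prototypes)

# Hypothetical class probabilities from two base classifiers for one patent.
fused = mean_rule([[0.6, 0.4], [0.2, 0.8]])
```

Here `vector` holds one distance per prototype, so any standard classifier that accepts real-valued features could be trained on it. The mean rule is one example of a fixed combination rule; stacking would instead feed the base classifiers' outputs into a trained meta-learner such as the feed-forward ANN the abstract mentions.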
