Learning multiple distributed prototypes of semantic categories for named entity recognition

INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS (2015)

Journal

INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS

Volume 13, Issue 4, Pages 395-411

Publisher

INDERSCIENCE ENTERPRISES LTD

DOI: 10.1504/IJDMB.2015.072766

Keywords

distributional semantics; semantic space ensembles; random indexing; named entity recognition; electronic health records; de-identification

Funding

Project "High-Performance Data Mining for Drug Effect Detection" at Stockholm University, Swedish Foundation for Strategic Research [IIS11-0053]

Abstract

The scarcity of large labelled datasets comprising clinical text that can be exploited within the paradigm of supervised machine learning creates barriers for the secondary use of data from electronic health records. It is therefore important to develop capabilities to leverage the large amounts of unlabelled data that tend to be readily available. One technique utilises distributional semantics to create word representations in a wholly unsupervised manner and uses existing training data to learn prototypical representations of predefined semantic categories. Features describing whether a given word belongs to a certain category are then provided to the learning algorithm. It has been shown that using multiple distributional semantic models, each employing a different word order strategy, can lead to enhanced predictive performance. Here, another hyperparameter is also varied, namely the size of the context window, and an experimental investigation shows that this leads to further performance gains.
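The prototype idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain co-occurrence counts stand in for random indexing, the word order strategies are left out, and only the context window size is varied. The toy sentences, the labels, and every function name (`cooccurrence_vectors`, `prototypes`, `prototype_features`) are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window):
    """Map each word to a sparse count vector of the words within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def prototypes(vectors, labelled):
    """Sum the vectors of the labelled words in each category (an unnormalised centroid)."""
    protos = defaultdict(Counter)
    for word, category in labelled.items():
        protos[category].update(vectors.get(word, Counter()))
    return protos


def prototype_features(word, models):
    """One feature per semantic space: the most similar category prototype, or None."""
    features = {}
    for name, (vectors, protos) in models.items():
        vec = vectors.get(word)
        scores = {c: cosine(vec, p) for c, p in protos.items()} if vec else {}
        best = max(scores, key=scores.get, default=None)
        features[name] = best if best is not None and scores[best] > 0 else None
    return features


# Hypothetical toy data: unlabelled sentences plus a few labelled training words.
sentences = [
    "patient anna was admitted to karolinska hospital".split(),
    "patient erik was admitted to danderyd hospital".split(),
    "patient maria was discharged from karolinska hospital".split(),
]
labelled = {"anna": "NAME", "erik": "NAME", "karolinska": "LOCATION"}

# An ensemble of semantic spaces that differ only in context window size.
models = {}
for window in (1, 2):
    vectors = cooccurrence_vectors(sentences, window)
    models[f"window_{window}"] = (vectors, prototypes(vectors, labelled))

for word in ("maria", "danderyd"):
    print(word, prototype_features(word, models))
```

Each semantic space contributes its own feature for a word, so a downstream sequence labeller would see one prototype feature per window size rather than a single merged value.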
