PMIVec: a word embedding model guided by point-wise mutual information criterion

Journal

MULTIMEDIA SYSTEMS
Volume 28, Issue 6, Pages 2275-2283

Publisher

SPRINGER
DOI: 10.1007/s00530-022-00928-4

Keywords

Natural language processing; Word embedding; Point-wise mutual information

Funding

  1. NSFC [61836011, U20B2070, 61976199]

Word embedding represents words as dense vectors that capture semantic similarity. This paper introduces a novel method, PMIVec, which uses learned context vectors as the word representations and uses point-wise mutual information to measure semantic similarity. Experimental results show that PMIVec consistently outperforms state-of-the-art models.

Word embedding aims to represent each word with a dense vector that reveals the semantic similarity between words. Existing methods such as word2vec derive such representations by factorizing the word-context matrix into two parts, i.e., word vectors and context vectors. However, only one part is used to represent the word, which may damage the semantic similarity between words. To address this problem, this paper proposes a novel word embedding method based on the point-wise mutual information criterion (PMIVec). The method explicitly learns the context vector as the final representation of each word and discards the word vector. To avoid damaging the semantic similarity between words, the word vector is normalized during training. Moreover, the paper uses point-wise mutual information to measure the semantic similarity between words, which is more consistent with human intuition about semantic similarity. Experiments on public data sets show that the PMIVec model consistently outperforms state-of-the-art models.
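
For readers unfamiliar with the criterion named in the abstract, the point-wise mutual information of a word w and a context c is conventionally defined as PMI(w, c) = log( p(w, c) / (p(w) p(c)) ). The sketch below estimates it from co-occurrence counts in a toy corpus. It only illustrates the standard definition and is not the PMIVec training procedure, which the abstract does not specify. The function name `pmi_table`, the `window` parameter, and the corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_table(corpus, window=2):
    """Return {(word, context): PMI} estimated from symmetric-window counts.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Count every (word, context) pair that falls within the window.
    pair_counts = Counter()
    for sentence in corpus:
        for i, word in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, sentence[j])] += 1

    # Marginal counts for words and contexts, derived from the pair counts.
    total = sum(pair_counts.values())
    word_counts = Counter()
    context_counts = Counter()
    for (w, c), n in pair_counts.items():
        word_counts[w] += n
        context_counts[c] += n

    # PMI(w, c) = log( p(w, c) / (p(w) * p(c)) )
    #           = log( n * total / (count(w) * count(c)) )
    return {
        (w, c): math.log(n * total / (word_counts[w] * context_counts[c]))
        for (w, c), n in pair_counts.items()
    }


# Toy corpus (assumed data, for illustration only).
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]
pmi = pmi_table(corpus)
print(pmi[("cat", "sat")])
```

A positive value means the pair co-occurs more often than independence would predict, and a negative value means less often. Pairs that never co-occur are absent from the table, since their PMI would be negative infinity.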
