Journal
MULTIMEDIA SYSTEMS
Volume 28, Issue 6, Pages 2275-2283
Publisher
SPRINGER
DOI: 10.1007/s00530-022-00928-4
Keywords
Natural language processing; Word embedding; Point-wise mutual information
Funding
- NSFC [61836011, U20B2070, 61976199]
Word embedding aims to represent each word with a dense vector that reveals the semantic similarity between words. Existing methods such as word2vec derive such representations by factorizing the word-context matrix into two parts, i.e., word vectors and context vectors. However, only one part is used to represent the word, which may damage the semantic similarity between words. To address this problem, this paper proposes a novel word embedding method based on the point-wise mutual information criterion (PMIVec). Our method explicitly learns the context vector as the final representation of each word, while discarding the word vector. To avoid damaging the semantic similarity between words, we normalize the word vectors during training. Moreover, this paper uses point-wise mutual information to measure the semantic similarity between words, which is more consistent with human intuition about semantic similarity. Experiments on public data sets show that our PMIVec model consistently outperforms state-of-the-art models.
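The abstract does not spell out how PMIVec itself is trained, but the point-wise mutual information criterion it builds on is standard. The sketch below (a toy illustration, not the paper's implementation; the corpus and sentence-wide co-occurrence window are assumptions for demonstration) computes PMI(w, c) = log(P(w, c) / (P(w) P(c))) from co-occurrence counts, showing why frequently co-occurring words score as more similar:

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; each sentence serves as the co-occurrence window.
corpus = [
    ["cat", "sat", "mat"],
    ["cat", "sat", "log"],
    ["dog", "sat", "log"],
]

word_counts = Counter()
pair_counts = Counter()
for sentence in corpus:
    word_counts.update(sentence)
    # Count unordered co-occurrences within each sentence.
    for w, c in combinations(sentence, 2):
        pair_counts[frozenset((w, c))] += 1

total_words = sum(word_counts.values())
total_pairs = sum(pair_counts.values())

def pmi(w, c):
    """Point-wise mutual information of w and c under the toy counts."""
    p_wc = pair_counts[frozenset((w, c))] / total_pairs
    if p_wc == 0:
        return float("-inf")  # never co-occur: PMI is undefined (-inf)
    p_w = word_counts[w] / total_words
    p_c = word_counts[c] / total_words
    return math.log(p_wc / (p_w * p_c))

# "cat" co-occurs with "sat" twice but with "log" only once,
# so PMI ranks ("cat", "sat") as the more similar pair.
print(pmi("cat", "sat"), pmi("cat", "log"))
```

Under these counts, PMI("cat", "sat") = log 3 ≈ 1.10 exceeds PMI("cat", "log") = log 2.25 ≈ 0.81, matching the intuition that "cat" and "sat" are more strongly associated in this corpus.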