Journal
INTERNATIONAL JOURNAL ON SEMANTIC WEB AND INFORMATION SYSTEMS
Volume 18, Issue 1, Pages -
Publisher
IGI GLOBAL
DOI: 10.4018/IJSWIS.309428
Keywords
Out of Vocabulary (OOV); Social Media; Word Embedding; Word2Vec
Funding
- National Research Foundation of Korea (NRF) - Korean government (MSIP) [NRF-2020R1A2B5B01002207, NRF-2021R1I1A1A01060302]
This chapter proposes a contextual Word2Vec model for understanding out-of-vocabulary (OOV) words. The authors extract OOV words using left-right entropy and point information entropy. They construct a word vector space with Word2Vec and obtain contextual information with CBOW. The results show that the proposed model achieves higher accuracy than Skip-Gram.
In this chapter, the authors propose using a contextual Word2Vec model to understand out-of-vocabulary (OOV) words. OOV words are extracted using left-right entropy and point information entropy. Word2Vec is used to construct the word vector space, and CBOW (continuous bag of words) is used to obtain the contextual information of each word. If an in-vocabulary word has contextual information similar to that of an OOV word, that word can be used to understand the OOV word. The Weibo corpus serves as the dataset for the experiments. The results show that the proposed model achieves 97.10% accuracy, which is better than Skip-Gram by 8.53%.
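The abstract names left-right entropy as one of the extraction criteria but does not give the formulation or thresholds. As a minimal sketch, assuming the common definition (the Shannon entropy of the characters that appear immediately to the left and to the right of a candidate string), it could be computed as follows. The function names `neighbor_entropy` and `left_right_entropy` and the toy string are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def neighbor_entropy(neighbors):
    """Shannon entropy (in nats) of a list of neighbouring characters."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    # An empty list yields 0; a single distinct neighbour also yields 0.
    return sum((c / total) * math.log(total / c) for c in counts.values())


def left_right_entropy(text, candidate):
    """Return (left_entropy, right_entropy) of `candidate` within `text`."""
    left, right = [], []
    start = text.find(candidate)
    while start != -1:
        end = start + len(candidate)
        if start > 0:
            left.append(text[start - 1])   # character just before the match
        if end < len(text):
            right.append(text[end])        # character just after the match
        start = text.find(candidate, start + 1)
    return neighbor_entropy(left), neighbor_entropy(right)


# Toy example (assumed data): "abx" is followed by varied characters,
# while "ab" is always followed by "x".
corpus = "abxcabxdabxe"
full_left, full_right = left_right_entropy(corpus, "abx")
part_left, part_right = left_right_entropy(corpus, "ab")
```

In this toy string, the right entropy of "abx" is high because three different characters follow it, whereas the right entropy of "ab" is zero because it is always followed by "x". Under the assumed definition, a low entropy on either side suggests the candidate is a fragment of a longer unit, so only candidates with high entropy on both sides would be kept as possible OOV words.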