Journal
IEEE ACCESS
Volume 9, Issue -, Pages 99954-99967
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2021.3096238
Keywords
Task analysis; Hidden Markov models; Bit error rate; Data models; Training; Dictionaries; Semantics; Artificial intelligence; deep learning; natural language processing; neural networks; new words; word embedding
Funding
- Basic Science Research Program through the NRF (National Research Foundation of Korea) - MSIT (Ministry of Science and ICT)
- Gachon University [NRF2019R1A2C1008412, GCU-2019-0773]
Abstract
Most embedding models used in natural language processing require retraining the entire model to obtain the embedding value of a new word. In the current system, the amount of data used for learning grows with each round of retraining, so retraining the entire model whenever new words emerge is very inefficient. Moreover, since a language has a huge number of words and its characteristics change continuously over time, it is not easy to embed all words. To solve both problems, we propose a new embedding model, the Mirroring Vector Space (MVS), which enables us to obtain a new word embedding from a previously built word embedding model without retraining it. The MVS embedding model has a convolutional neural network (CNN) structure and presents a novel strategy for obtaining word embeddings: it predicts the embedding value of a word by learning the vector space of an existing embedding model from the explanations of that word. It also provides flexibility for external resources, reusability of training time, and portability, in that it can be used with any existing model. We verify these three attributes and the novelty of our approach in our experiments.
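The core idea of the abstract can be illustrated with a toy sketch: an out-of-vocabulary word's embedding is predicted from the words in its dictionary explanation, reusing a fixed pre-trained space instead of retraining it. The words, vectors, and pooling function below are hypothetical illustrations; MVS itself learns this mapping with a CNN, and mean-pooling is a deliberately simplified stand-in.

```python
import numpy as np

# Toy pre-trained embedding space (stand-in for any existing model).
# All words and vectors here are hypothetical.
pretrained = {
    "feline": np.array([0.9, 0.1, 0.0]),
    "small":  np.array([0.1, 0.8, 0.1]),
    "pet":    np.array([0.2, 0.2, 0.9]),
}

def embed_new_word(explanation_words, space):
    """Predict an embedding for an out-of-vocabulary word from the words
    of its explanation, by pooling their vectors in the existing space.
    (A simplification: MVS replaces this pooling with a trained CNN.)"""
    known = [space[w] for w in explanation_words if w in space]
    if not known:
        raise ValueError("no explanation word found in the existing space")
    return np.mean(known, axis=0)

# "kitten" is absent from the pre-trained model; derive its vector from
# an explanation, without retraining the original embedding model.
kitten_vec = embed_new_word(["small", "feline", "pet"], pretrained)
```

Note that the pre-trained space is never modified, which is the property the paper calls reusability: new words are mapped into the existing vector space rather than triggering retraining.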