4.6 Article

Systematic Comparison of Vectorization Methods in Classification Context

Journal

APPLIED SCIENCES-BASEL
Volume 12, Issue 10, Pages -

Publisher

MDPI
DOI: 10.3390/app12105119

Keywords

natural language processing; text vectorization; Continuous Bag of Words; skip-gram; k-nearest neighbors; Naive Bayesian Classifier


This study compares text vectorization methods used in natural language processing, particularly in text mining, by measuring the classification accuracy each one yields. Two simple classifiers, the Naive Bayesian Classifier (NBC) and k-nearest neighbors (k-NN), were used to limit the influence of the classifier choice on the final result. The experiments provide a basis for further research on automatic text analysis.
Natural language processing has been the subject of numerous studies in the last decade. These studies have focused on the various stages of text processing, from text preparation through vectorization to final text comprehension. The goal of vector space modeling is to project the words of a language corpus into a vector space so that words with similar meanings lie close to each other. Two approaches to vectorization are currently in common use. The first creates word vectors that take the entire linguistic context into account, while the second creates document vectors in the context of the linguistic corpus of the analyzed texts. This paper compares existing text vectorization methods in natural language processing, particularly in text mining. The methods are compared by the classification accuracy they yield. NBC and k-NN, which are among the simplest classification methods, were used in order to limit the influence of the choice of classifier on the final result. The experiments provide a basis for further research on better automatic text analysis.
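The evaluation idea described in the abstract (hold the classifier fixed, swap the vectorizer, compare accuracy) can be sketched in a few lines. This is only an illustrative sketch, not the authors' experimental setup: the toy corpus, the two vectorizers (raw term counts and a smoothed TF-IDF weighting), the cosine-similarity k-NN, and all function names are assumptions introduced here. The CBOW and skip-gram models named in the keywords are not implemented.

```python
import math
from collections import Counter

# Toy labeled corpus (hypothetical data, for illustration only).
TRAIN = [
    ("the team won the football match", "sport"),
    ("the player scored a late goal", "sport"),
    ("the coach praised the team", "sport"),
    ("the bank raised interest rates", "finance"),
    ("the market fell as stocks dropped", "finance"),
    ("investors bought shares in the bank", "finance"),
]
TEST = [
    ("the team scored a goal", "sport"),
    ("stocks and shares fell", "finance"),
]

def count_vector(text):
    """Bag-of-words vector: term -> raw count."""
    return Counter(text.split())

def make_tfidf(train_texts):
    """Build a vectorizer that weights counts by a smoothed inverse document frequency."""
    n = len(train_texts)
    df = Counter(term for t in train_texts for term in set(t.split()))
    def vectorize(text):
        counts = Counter(text.split())
        # Unseen terms have df == 0; the +1 smoothing keeps the weight finite.
        return {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in counts.items()}
    return vectorize

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(vec, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to vec."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(vec, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def accuracy(vectorize, k=3):
    """Fraction of TEST documents classified correctly with the given vectorizer."""
    train_vecs = [vectorize(t) for t, _ in TRAIN]
    train_labels = [y for _, y in TRAIN]
    hits = sum(knn_predict(vectorize(t), train_vecs, train_labels, k) == y
               for t, y in TEST)
    return hits / len(TEST)

# Same classifier, two vectorizers: any accuracy gap is due to the vectorization.
results = {
    "counts": accuracy(count_vector),
    "tfidf": accuracy(make_tfidf([t for t, _ in TRAIN])),
}
print(results)
```

Because the classifier and the train/test split stay fixed, the only thing that varies between the two entries in `results` is the vectorizer, which is the comparison principle the abstract describes. On a corpus this small the numbers carry no statistical meaning; a realistic comparison would need a larger dataset and cross-validation.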
