4.1 Article

Low resource language specific pre-processing and features for sentiment analysis task

Journal

LANGUAGE RESOURCES AND EVALUATION
Volume 55, Issue 4, Pages 947-969

Publisher

SPRINGER
DOI: 10.1007/s10579-021-09541-9

Keywords

Low resource; Pre-processing; Morphology; Sentiment analysis; Machine learning; Deep learning; Ensembled classifier; TF-IDF; BM25; Manipuri

This study conducted sentiment analysis on Manipuri using various machine learning approaches, improving classification results by performing language-specific preprocessing tasks and incorporating additional linguistic features.
Sentiment analysis is a classification task in which the polarity of textual data is identified, i.e., whether a sentence or document expresses a negative, positive, or neutral sentiment. Manipuri is a less privileged, highly agglutinative, tonal language. Although it is a scheduled language of the Indian Constitution, it remains resource-constrained. In this work, we report sentiment analysis for Manipuri using different types of machine learning based approaches. The dataset used in our work was collected from local daily newspapers. The novelty of this work is that we carry out language-specific pre-processing tasks such as transliteration, building a negative-morpheme-based lexicon, and filtering noisy words. Using these as additional linguistic features in our models improves the classification results in terms of precision, recall, and F-score. Ensemble voting over the three best TF-IDF-based classifiers performs better than the BM25-based classifiers and the other stand-alone classifiers. Based on this result, we attempt to classify the sentiment of news articles over a certain period of time. Further, we report the findings of deep learning based approaches on the same dataset.
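
To make the TF-IDF versus BM25 comparison and the ensemble voting step concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state which weighting variants, parameters, or classifiers were used, so the smoothed IDF, the BM25 defaults `k1=1.5` and `b=0.75`, and the function names `tfidf_weights`, `bm25_weights`, and `majority_vote` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term frequency times a smoothed IDF,
    ln((1 + N) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in tf}
        )
    return weights


def bm25_weights(docs, k1=1.5, b=0.75):
    """Return one {term: weight} dict per tokenized document using Okapi BM25.

    k1 controls term-frequency saturation and b controls length
    normalization; both defaults are assumptions, not values from the paper.
    """
    n = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        weights.append(
            {
                t: math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
                * tf[t] * (k1 + 1) / (tf[t] + norm)
                for t in tf
            }
        )
    return weights


def majority_vote(predictions):
    """Hard voting over classifiers.

    `predictions` holds one list of labels per classifier, aligned by
    sample. Ties go to the label that appears first among the tied ones.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]
```

Either weighting function turns a tokenized document into a sparse feature vector that a classifier can consume; `majority_vote` then combines the label lists produced by three such classifiers into one prediction per document.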
