Article

Low resource language specific pre-processing and features for sentiment analysis task

LANGUAGE RESOURCES AND EVALUATION (2021)

Journal

LANGUAGE RESOURCES AND EVALUATION

Volume 55, Issue 4, Pages 947-969

Publisher

SPRINGER

DOI: 10.1007/s10579-021-09541-9

Keywords

Low resource; Pre-processing; Morphology; Sentiment analysis; Machine learning; Deep learning; Ensembled classifier; TF-IDF; BM25; Manipuri

Category

Computer Science, Interdisciplinary Applications

Summary

This study conducted sentiment analysis on Manipuri using various machine learning approaches, improving classification results by performing language-specific preprocessing tasks and incorporating additional linguistic features.

Abstract

Sentiment analysis is a classification task in which the polarity of textual data is identified, i.e., whether a sentence or document expresses a negative, positive or neutral sentiment. Manipuri is a less privileged, highly agglutinative and tonal language. Despite being a scheduled language of the Indian Constitution, it remains a resource-constrained language. In this work, we report sentiment analysis for Manipuri using several machine learning based approaches. The dataset used in our work is collected from a local daily newspaper. The novelty of this work is that we carry out language-specific pre-processing tasks such as transliteration, building a negative-morpheme-based lexicon, and filtering noisy words. Using the outputs of these tasks as additional linguistic features in our models improves the classification results in terms of precision, recall and F-score. Ensemble voting over the three best TF-IDF based classifiers performs better than BM25 based classifiers and other stand-alone classifiers. Based on this result, we attempt to classify the sentiment of news articles published during a certain period of time. Further, we report the findings of deep learning based approaches on the same dataset.
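The abstract contrasts TF-IDF and BM25 term weighting and combines classifiers by voting, but it does not give formulas, parameters or the classifiers used. The following is a minimal sketch of those three ideas under stated assumptions: the IDF smoothing variant, the BM25 defaults `k1=1.5` and `b=0.75`, the function names, and the toy English tokens are all illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_weight(term, doc, docs):
    """Raw term frequency times a smoothed inverse document frequency."""
    tf = Counter(doc)[term]
    df = sum(1 for d in docs if term in d)
    # The smoothing variant below is an assumption, not the paper's formula.
    idf = math.log((len(docs) + 1) / (df + 1)) + 1.0
    return tf * idf


def bm25_weight(term, doc, docs, k1=1.5, b=0.75):
    """Okapi BM25 term weight; k1 and b are common defaults, not the paper's values."""
    tf = Counter(doc)[term]
    df = sum(1 for d in docs if term in d)
    idf = math.log(1.0 + (len(docs) - df + 0.5) / (df + 0.5))
    avg_len = sum(len(d) for d in docs) / len(docs)
    # Longer-than-average documents are penalised through this term.
    length_norm = k1 * (1.0 - b + b * len(doc) / avg_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


def majority_vote(labels):
    """Hard voting: return the label predicted by the most classifiers."""
    return Counter(labels).most_common(1)[0][0]


# Toy tokenised corpus (hypothetical English stand-ins, not Manipuri data).
docs = [["good", "good", "news"], ["bad", "news"], ["good", "day", "today", "again"]]
print(tfidf_weight("good", docs[0], docs))
print(bm25_weight("good", docs[0], docs))
# Three hypothetical classifier outputs for one sentence.
print(majority_vote(["positive", "negative", "positive"]))
```

The main practical difference visible here is that the TF-IDF weight grows linearly with term frequency, whereas the BM25 weight saturates and is adjusted for document length. Language-specific features such as a negative-morpheme lexicon count could be appended to either weighting as extra feature columns, although the abstract does not say how the authors combine them.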
