Article

Improving the accuracy of text classification using stemming method, a case of non-formal Indonesian conversation

JOURNAL OF BIG DATA (2021)

Journal

JOURNAL OF BIG DATA

Volume 8, Issue 1, Pages -

Publisher

SPRINGERNATURE

DOI: 10.1186/s40537-021-00413-1

Keywords

Accuracy; Classification; Indonesian; Stemming; Text processing

Category

Computer Science, Theory & Methods

Funding

Ministry of Education and Culture Republic of Indonesian [07.1/LP/UG/III/2020]

Summary

This study introduces a new stemming method for non-formal Indonesian text processing that improves the accuracy of text classifier models. Experimental results show that the proposed stemming method outperforms existing methods, and the authors suggest developing a corpus for non-formal Indonesian text to enhance stemming.

Abstract

Background: Stemming has long been used in data pre-processing for information retrieval, reducing affixed words to their roots. Existing stemming methods for Indonesian have been studied and shown to achieve high accuracy. However, few stemming methods target non-formal Indonesian text. This study introduces a new stemming method to address problems in pre-processing non-formal Indonesian text data, and it aims to improve the accuracy of text classifier models by strengthening the stemming step. A text classifier model is developed using the Support Vector Machine algorithm, and its accuracy is checked. The experimental evaluation was done by testing 550 datasets in Indonesian using two different stemming methods.

Findings: The results show that the text classifier model achieves higher accuracy with the proposed stemming method than with the existing methods, scoring 0.85 and 0.73, respectively. These results indicate that the proposed stemming method produces a classifier model with a small error rate, so it predicts the class of an object more accurately.

Conclusion: The existing Indonesian stemming methods are still oriented towards formal Indonesian sentences, which limits their use on non-formal Indonesian sentences. This limitation motivates the suggestion to develop a corpus that normalizes non-formal Indonesian into formal Indonesian, to serve as a better stemming method. Using the corpus as a stemming method can improve the accuracy of the classifier model. In the future, the proposed corpus and stemming methods can be used for various purposes including text clustering, summarizing, detecting hate speech, and other text processing applications in Indonesian.
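The corpus-based normalization idea and the accuracy check described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the lookup table `NORMALIZATION_CORPUS`, its five entries, and the function names `normalize` and `accuracy` are hypothetical and do not come from the paper. The Support Vector Machine classifier itself is left out to keep the sketch dependency-free.

```python
import re

# Hypothetical normalization corpus: non-formal token -> formal form.
# These entries are illustrative only and are not taken from the paper.
NORMALIZATION_CORPUS = {
    "gak": "tidak",
    "nggak": "tidak",
    "udah": "sudah",
    "banget": "sangat",
    "makasih": "terima kasih",
}


def normalize(text, corpus=NORMALIZATION_CORPUS):
    """Lowercase the text, keep alphabetic tokens, and replace any token
    found in the corpus with its formal form. Unknown tokens pass through."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(corpus.get(token, token) for token in tokens)


def accuracy(predicted, actual):
    """Fraction of predicted labels that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    if not actual:
        raise ValueError("label lists must not be empty")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    print(normalize("Udah makan? Gak, makasih banget"))
    print(accuracy(["pos", "neg", "pos", "pos"], ["pos", "neg", "neg", "pos"]))
```

In a fuller pipeline, the normalized text would be passed to a formal-Indonesian stemmer and a feature extractor before training the classifier, and `accuracy` would be computed on held-out predictions.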
