☆ 4.6 Article

Class Association and Attribute Relevancy Based Imputation Algorithm to Reduce Twitter Data for Optimal Sentiment Analysis

IEEE ACCESS (2019)

Journal

IEEE ACCESS

Volume 7, Pages 136535-136544

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2019.2942112

Keywords

Classification; class association; dimensionality reduction; imputation; machine learning; preprocessing; Twitter sentiment analysis

Categories

Computer Science, Information Systems; Engineering, Electrical & Electronic; Telecommunications

Abstract

Twitter sentiment analysis is a challenging task that involves several preprocessing steps, including dimensionality reduction. Dimensionality reduction helps keep computational complexity low and improves performance during classification. In Twitter data, each tweet has feature values that may or may not reflect a person's response. As a result, representing tweets as a feature matrix produces a large number of sparse data points, which increases computational overhead and error rates in Twitter sentiment analysis. This study proposes a novel preprocessing technique, the class association and attribute relevancy based imputation algorithm (CAARIA), to reduce the size of Twitter data. CAARIA achieves dimensionality reduction by imputing tweets that belong to the same class and also share useful information. The performance of two classifiers, Naive Bayes and support vector machines, is evaluated on three Twitter datasets in terms of classification accuracy (measured as the area under the curve, AUC) and time efficiency. CAARIA is also compared against two widely used feature selection (dimensionality reduction) techniques, information gain (IG) and Pearson's correlation (PC). The findings show that CAARIA outperforms IG and PC in both classification accuracy and time efficiency, suggesting that it is a robust data preprocessing technique for classification tasks.
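
The abstract names information gain (IG) and Pearson's correlation (PC) as the baseline feature-selection techniques. The sketch below shows the textbook form of both scores applied to a binary bag-of-words tweet matrix. It is not CAARIA, whose individual steps the abstract does not describe. The function names, the toy matrix, and the choice to rank PC by absolute correlation before keeping the top k features are hypothetical illustrations, not details taken from the paper.

```python
import math
from typing import List


def entropy(labels: List[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(column: List[int], labels: List[int]) -> float:
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(column):
        subset = [y for x, y in zip(column, labels) if x == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain


def pearson(column: List[float], labels: List[float]) -> float:
    """Pearson correlation between a feature and numeric labels (0.0 if either is constant)."""
    n = len(labels)
    mx, my = sum(column) / n, sum(labels) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    sx = math.sqrt(sum((x - mx) ** 2 for x in column))
    sy = math.sqrt(sum((y - my) ** 2 for y in labels))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def top_k_features(matrix: List[List[int]], labels: List[int], k: int, score: str = "ig") -> List[int]:
    """Hypothetical helper: rank columns by IG or |PC| and return the k best column indices."""
    scored = []
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        s = information_gain(col, labels) if score == "ig" else abs(pearson(col, labels))
        scored.append((s, j))
    # Highest score first; ties broken by the lower column index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scored[:k]]


# Toy tweets over the vocabulary ["good", "bad", "the"]; label 1 = positive, 0 = negative.
X = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
]
y = [1, 1, 0, 0]

print(top_k_features(X, y, 2, "ig"))  # [0, 1]
print(top_k_features(X, y, 2, "pc"))  # [0, 1]
```

In this toy matrix, "good" and "bad" each separate the classes perfectly (IG = 1 bit, |PC| = 1), while "the" carries no class information (IG = 0, PC = 0), so both rankings discard it.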