Article

Big data analytics for critical information classification in online social networks using classifier chains

Journal

PEER-TO-PEER NETWORKING AND APPLICATIONS
Volume 15, Issue 1, Pages 626-641

Publisher

SPRINGER
DOI: 10.1007/s12083-021-01269-1

Keywords

Big data; Age-group classifier; Gender classifier; Feature selection; Feature transformation; Multi-label classification

Industrial and academic organizations use online social networks for various purposes, and researchers apply sentiment analysis and data mining to evaluate user-generated content. This study builds a novel dataset of the writing characteristics of 160,000 Twitter users and applies machine learning techniques to create classification models; the Random Forest, XGBoost, and Decision Tree algorithms achieve the best performance. The proposed multidimensional learning technique, based on the Classifier Chain transformation, outperforms similar proposals, with these three algorithms all reaching an F1 micro-average of 0.976 in testing.
Industrial and academic organizations use online social networks (OSNs) for different purposes, including social and economic ones. OSNs have become a new means of obtaining information about people's preferences and interests. Because of the large volume of user-generated content, researchers use techniques such as sentiment analysis and data mining to evaluate this information automatically. Although sentiment analysis of OSN content is performed with different methods, obtaining highly reliable results remains difficult, mainly because user profile information such as gender and age is lacking. In this work, a novel dataset is built that contains the writing characteristics of 160,000 users of the Twitter OSN. Before classification models are created with Machine Learning (ML) techniques, feature transformation and feature selection methods are applied to determine the most relevant set of characteristics. To create the models, the Classifier Chain (CC) transformation technique and different ML algorithms are applied to the training set. Simulation results show that the Random Forest, XGBoost, and Decision Tree algorithms obtain the best performance. In the testing phase, these algorithms reached Hamming Loss values of 0.033, 0.033, and 0.034, respectively, and all three reached the same F1 micro-average value of 0.976. Therefore, our proposal, based on a multidimensional learning technique using the CC transformation, outperforms other similar proposals.
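The abstract names the Classifier Chain transformation and two evaluation metrics, Hamming Loss and F1 micro-average, without showing how they fit together. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes binary labels, swaps in a 1-nearest-neighbour base learner in place of the paper's Random Forest, XGBoost, and Decision Tree models, and uses made-up toy data. The names `TinyClassifierChain`, `nearest_label`, `hamming_loss`, and `f1_micro` are hypothetical.

```python
from typing import List, Sequence


def nearest_label(train_x: List[List[float]], train_y: List[int],
                  x: Sequence[float]) -> int:
    """Stand-in base learner (assumption): label of the closest training row."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]


class TinyClassifierChain:
    """Link j sees the input features plus labels 0..j-1."""

    def fit(self, X: List[List[float]], Y: List[List[int]]) -> "TinyClassifierChain":
        self.links = []
        for j in range(len(Y[0])):
            # Training uses the TRUE earlier labels as extra features.
            augmented = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
            self.links.append((augmented, [y[j] for y in Y]))
        return self

    def predict(self, X: List[List[float]]) -> List[List[int]]:
        results = []
        for x in X:
            preds: List[int] = []
            for augmented, target in self.links:
                # Prediction uses the PREDICTED earlier labels instead.
                preds.append(nearest_label(augmented, target, list(x) + preds))
            results.append(preds)
        return results


def hamming_loss(Y_true: List[List[int]], Y_pred: List[List[int]]) -> float:
    """Fraction of individual label slots that are predicted wrongly."""
    total = sum(len(row) for row in Y_true)
    wrong = sum(t != p for yt, yp in zip(Y_true, Y_pred) for t, p in zip(yt, yp))
    return wrong / total


def f1_micro(Y_true: List[List[int]], Y_pred: List[List[int]]) -> float:
    """F1 computed from counts pooled over all labels (binary labels assumed)."""
    tp = fp = fn = 0
    for yt, yp in zip(Y_true, Y_pred):
        for t, p in zip(yt, yp):
            tp += (t == 1 and p == 1)
            fp += (t == 0 and p == 1)
            fn += (t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data (invented for illustration): two features, two binary labels.
    X = [[0.1, 3.0], [0.9, 1.0], [0.2, 2.5], [0.8, 0.5]]
    Y = [[0, 1], [1, 0], [0, 1], [1, 1]]
    chain = TinyClassifierChain().fit(X, Y)
    Y_hat = chain.predict([[0.15, 2.8], [0.85, 0.6]])
    print(Y_hat)
    print(hamming_loss([[0, 1], [1, 1]], Y_hat), f1_micro([[0, 1], [1, 1]], Y_hat))
```

Lower Hamming Loss is better and higher F1 micro-average is better, which is how the values reported in the abstract should be read. The chain order matters in general, since each link can only exploit the labels that come before it.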
