4.6 Article

Role of twitter user profile features in retweet prediction for big data streams

Journal

MULTIMEDIA TOOLS AND APPLICATIONS
Volume 81, Issue 19, Pages 27309-27338

Publisher

SPRINGER
DOI: 10.1007/s11042-022-12815-1

Keywords

Twitter; Social media analysis; Retweet prediction; User behavior; User profiling; Big data analysis

This research explores how numerical features extracted from user profiles influence information sharing on Twitter. The study finds that user profile features predict retweets and user behavior more accurately than tweet content features, and that combining the two feature sets performs better still.
The study of the factors that influence information sharing on Twitter is a very active research area. This paper explores the impact of numerical features extracted from user profiles on retweet prediction from the real-time raw feed of tweets. The originality of this work lies in the fact that the proposed model is based on simple numerical features with minimal computational complexity, which makes it a scalable solution for big data analysis. The work proposes three new features from the tweet author's profile to capture the unique behavioral pattern of the user, namely Author total activity, Author total activity per year, and Author tweets per year. The feature set is tested on a dataset of 100 million random tweets collected through the Twitter API. Regression on binary labels gave an accuracy of 0.98 with user-profile features and 0.99 when these were combined with tweet content features. The regression analysis to predict the retweet count gave an R-squared value of 0.98 with combined features. The multi-label classification gave an accuracy of 0.9 for combined features and 0.89 for user-profile features. The user profile features performed better than the tweet content features, and the combination performed better than either alone. The model is suitable for near real-time analysis of live streaming data coming through the Twitter API and provides a baseline pattern of user behavior based only on the numerical features available from user profiles.
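
The abstract names the three proposed author features but does not define them, so the following Python sketch is only one plausible reading and not the authors' implementation. It assumes that "total activity" is the sum of the profile's `statuses_count` and `favourites_count`, and that the per-year features divide by the account age derived from `created_at` (field names as in the Twitter API v1.1 user object). The function name `author_features` and the output keys are hypothetical.

```python
from datetime import datetime, timezone

# Timestamp format used by the Twitter API v1.1 "created_at" field,
# e.g. "Wed Jan 01 00:00:00 +0000 2020".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def author_features(profile, now):
    """Derive three numerical author features from a user-profile dict.

    ASSUMPTION: these definitions are illustrative guesses; the abstract
    only gives the feature names, not their formulas.
    """
    created = datetime.strptime(profile["created_at"], TWITTER_TIME_FORMAT)
    # Account age in years, floored at one day to avoid division by zero
    # for accounts created very recently.
    age_years = max((now - created).days, 1) / 365.25

    # ASSUMPTION: "activity" = tweets posted + tweets liked.
    total_activity = profile["statuses_count"] + profile["favourites_count"]

    return {
        "author_total_activity": total_activity,
        "author_total_activity_per_year": total_activity / age_years,
        "author_tweets_per_year": profile["statuses_count"] / age_years,
    }


example_profile = {
    "created_at": "Wed Jan 01 00:00:00 +0000 2020",
    "statuses_count": 2000,
    "favourites_count": 1000,
}
features = author_features(
    example_profile, now=datetime(2022, 1, 1, tzinfo=timezone.utc)
)
print(features)
```

Each feature needs only a few arithmetic operations per tweet, which is consistent with the abstract's emphasis on low computational cost for streaming data.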
