4.6 Article

Enhancing Big Social Media Data Quality for Use in Short-Text Topic Modeling

Related References

Note: Only some of the references are listed here; see the original article for the complete reference list.
Article Computer Science, Artificial Intelligence

Short text topic modelling approaches in the context of big data: taxonomy, survey, and analysis

Belal Abdullah Hezam Murshed et al.

Summary: This article examines the state of the art in Short Text Topic Modeling (STTM) algorithms, providing a comprehensive survey, a taxonomy, qualitative and quantitative studies, and a comparative analysis of representative models. It also discusses future research directions and challenges in this field.

ARTIFICIAL INTELLIGENCE REVIEW (2023)

Article Computer Science, Artificial Intelligence

Topic Modeling of Short Texts: A Pseudo-Document View With Word Embedding Enhancement

Yuan Zuo et al.

Summary: Online social media have grown rapidly in recent years, making short texts the predominant form of information on the Internet. However, data sparsity still makes short-text topic modeling a significant challenge. This paper proposes a novel model, the Pseudo-document-based Topic Model (PTM), which addresses data sparsity by introducing pseudo-documents that implicitly aggregate short texts. PTM achieves high accuracy and efficiency by modeling the topic distributions of latent pseudo-documents. A word embedding-enhanced variant (WE-PTM) is further introduced to alleviate data sparsity. Experimental results on real-world datasets demonstrate the effectiveness of the proposed models.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Article Computer Science, Artificial Intelligence

Feature selection techniques in the context of big data: taxonomy and analysis

Hudhaifa Mohammed Abdulwahab et al.

Summary: This article provides a comprehensive review of the latest feature selection (FS) approaches in the context of big data. It categorizes the existing methods based on their nature, search strategy, evaluation process, and feature structure. The article presents qualitative and quantitative analyses of FS methods, as well as experimental comparisons to evaluate their performance. It also highlights the research issues and open challenges in FS, serving as a guide for future research directions.

APPLIED INTELLIGENCE (2022)

Article Computer Science, Artificial Intelligence

Cross-Lingual Text Reuse Detection at sentence level for English-Urdu language pair

Iqra Muneer et al.

Summary: This study introduces a large benchmark sentence-level cross-lingual corpus for the English-Urdu language pair, containing simulated cases of cross-lingual text reuse (X-TR), and evaluates state-of-the-art cross-lingual text reuse detection (X-TRD) techniques on it. The results show promising performance on binary and ternary classification tasks, underscoring the value of this corpus for advancing X-TRD research in under-resourced languages such as Urdu.

COMPUTER SPEECH AND LANGUAGE (2022)

Article Computer Science, Information Systems

DEA-RNN: A Hybrid Deep Learning Approach for Cyberbullying Detection in Twitter Social Media Platform

Belal Abdullah Hezam Murshed et al.

Summary: This paper proposes a hybrid deep learning model, DEA-RNN, to detect cyberbullying on the Twitter platform. The experimental results show that DEA-RNN outperforms the compared algorithms in all evaluated aspects.

IEEE ACCESS (2022)

Article Computer Science, Information Systems

A Novel Auto-Annotation Technique for Aspect Level Sentiment Analysis

Muhammad Aasim Qureshi et al.

Summary: This study proposes a fully automated annotation technique for aspect-level sentiment analysis and validates it on a dataset of YouTube comments. The results show that the proposed technique achieves annotation quality comparable to manual annotation at a significantly lower cost.

CMC-COMPUTERS MATERIALS & CONTINUA (2022)

Article Computer Science, Information Systems

Analyzing the Quality of Twitter Data Streams

Franco Arolfo et al.

Summary: This paper addresses the problem of low and unpredictable data quality in Twitter data streams by redefining quality dimensions and metrics, introducing a software tool for real-time analysis, and performing a thorough data quality analysis. The study reveals that the quality of Twitter streams is higher than expected.

INFORMATION SYSTEMS FRONTIERS (2022)

Article Computer Science, Artificial Intelligence

NSLPCD: Topic based tweets clustering using Node significance based label propagation community detection algorithm

Jagrati Singh et al.

Summary: Social networks such as Twitter and Facebook have become the most widely used platforms for information propagation. This article proposes a novel topic detection approach, the NSLPCD algorithm, which analyzes the keyword frequency distribution and builds a keyword co-occurrence graph to detect topics faster without compromising accuracy. Experimental results show that the proposed method compares favorably with existing methods in both quality and run-time performance.

ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE (2021)

Article Social Sciences, Interdisciplinary

Social Media and Twitter Data Quality for New Social Indicators

Camilla Salvatore et al.

Summary: This paper discusses the quality issues of social media data, focusing on Twitter, and introduces a new data quality framework. It emphasizes the importance of a mixed-methods approach to data quality evaluation.

SOCIAL INDICATORS RESEARCH (2021)

Article Computer Science, Information Systems

An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit

Stephan A. Curiskis et al.

INFORMATION PROCESSING & MANAGEMENT (2020)

Article Computer Science, Artificial Intelligence

Automating fake news detection system using multi-level voting model

Sawinder Kaur et al.

SOFT COMPUTING (2020)

Article Computer Science, Information Systems

User group based emotion detection and topic discovery over short text

Jiachun Feng et al.

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS (2020)

Article Computer Science, Information Systems

Improving biterm topic model with word embeddings

Jiajia Huang et al.

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS (2020)

Article Computer Science, Information Systems

Unsupervised Semantic Approach of Aspect-Based Sentiment Analysis for Large-Scale User Reviews

Sumaia Mohammed Al-Ghuribi et al.

IEEE ACCESS (2020)

Article Computer Science, Information Systems

Collaboratively Modeling and Embedding of Latent Topics for Short Texts

Zheng Liu et al.

IEEE ACCESS (2020)

Article Computer Science, Information Systems

Sentiment analysis of multimodal twitter data

Akshi Kumar et al.

MULTIMEDIA TOOLS AND APPLICATIONS (2019)

Article Mathematics, Interdisciplinary Applications

Nowcasting earthquake damages with Twitter

Marcelo Mendoza et al.

EPJ DATA SCIENCE (2019)

Article Computer Science, Information Systems

Fuzzy topic modeling approach for text mining over short text

Junaid Rashid et al.

INFORMATION PROCESSING & MANAGEMENT (2019)

Article Transportation

Social media as a resource for sentiment analysis of Airport Service Quality (ASQ)

Luis Martin-Domingo et al.

JOURNAL OF AIR TRANSPORT MANAGEMENT (2019)

Article Computer Science, Information Systems

A Novel Hot Topic Detection Framework With Integration of Image and Short Text Information From Twitter

Chengde Zhang et al.

IEEE ACCESS (2019)

Proceedings Paper Computer Science, Theory & Methods

Global Vectors for Node Representations

Robin Brochier et al.

WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019) (2019)

Article Computer Science, Information Systems

A Study of the Effects of Stemming Strategies on Arabic Document Classification

Yousif A. Alhaj et al.

IEEE ACCESS (2019)

Article Computer Science, Information Systems

GLTM: A Global and Local Word Embedding-Based Topic Model for Short Texts

Wenxin Liang et al.

IEEE ACCESS (2018)

Proceedings Paper Computer Science, Artificial Intelligence

A Temporal Topic Model for Noisy Mediums

Rob Churchill et al.

ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT II (2018)

Proceedings Paper Computer Science, Artificial Intelligence

Public Opinion Clustering for Hot Event Based on BR-LDA Model

Ningning Ni et al.

INTELLIGENT INFORMATION PROCESSING IX (2018)

Article Computer Science, Information Systems

Using sentiment analysis to define twitter political users' classes and their homophily during the 2016 American presidential election

Josemar A. Caetano et al.

JOURNAL OF INTERNET SERVICES AND APPLICATIONS (2018)

Article Hospitality, Leisure, Sport & Tourism

Tourist Activity Analysis by Leveraging Mobile Social Media Data

Huy Quan Vu et al.

JOURNAL OF TRAVEL RESEARCH (2018)

Article Computer Science, Artificial Intelligence

Word co-occurrence augmented topic model in short text

Guan-Bin Chen et al.

INTELLIGENT DATA ANALYSIS (2017)

Article Computer Science, Information Systems

Comparison Research on Text Pre-processing Methods on Twitter Sentiment Analysis

Zhao Jianqiang et al.

IEEE ACCESS (2017)

Proceedings Paper Computer Science, Information Systems

Exploring Time-Sensitive Variational Bayesian Inference LDA for Social Media Data

Anjie Fang et al.

ADVANCES IN INFORMATION RETRIEVAL, ECIR 2017 (2017)

Article Computer Science, Information Systems

Enhancing Topic Modeling for Short Texts with Auxiliary Word Embeddings

Chenliang Li et al.

ACM TRANSACTIONS ON INFORMATION SYSTEMS (2017)

Proceedings Paper Physics, Applied

The Effects of Pre-Processing Strategies in Sentiment Analysis of Online Movie Reviews

Harnani Mat Zin et al.

2ND INTERNATIONAL CONFERENCE ON APPLIED SCIENCE AND TECHNOLOGY 2017 (ICAST'17) (2017)

Article Computer Science, Artificial Intelligence

Word network topic model: a simple but general solution for short and imbalanced texts

Yuan Zuo et al.

KNOWLEDGE AND INFORMATION SYSTEMS (2016)

Article Computer Science, Information Systems

Privacy Preserving Social Network Data Publication

Jemal H. Abawajy et al.

IEEE COMMUNICATIONS SURVEYS AND TUTORIALS (2016)

Article Engineering, Electrical & Electronic

Filtering Redundant Data from RFID Data Streams

Hazalila Kamaludin et al.

JOURNAL OF SENSORS (2016)

Article Computer Science, Theory & Methods

Comprehensive analysis of big data variety landscape

Jemal Abawajy

INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS (2015)

Article Computer Science, Artificial Intelligence

BTM: Topic Modeling over Short Texts

Xueqi Cheng et al.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2014)

Article Computer Science, Information Systems

A study of the effects of preprocessing strategies on sentiment analysis for Arabic text

Rehab Duwairi et al.

JOURNAL OF INFORMATION SCIENCE (2014)

Proceedings Paper Computer Science, Artificial Intelligence

Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media

Himabindu Lakkaraju et al.

12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012) (2012)

Article Chemistry, Analytical

An Approach for Removing Redundant Data from RFID Data Streams

Hairulnizam Mahdin et al.

SENSORS (2011)

Article Computer Science, Information Systems

An algorithm for suffix stripping

M. F. Porter

PROGRAM-ELECTRONIC LIBRARY AND INFORMATION SYSTEMS (2006)

Article Multidisciplinary Sciences

Finding scientific topics

T. L. Griffiths et al.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2004)