Article

A Semantic Embedding Enhanced Topic Model For User-Generated Textual Content Modeling In Social Ecosystems

COMPUTER JOURNAL (2022)

Journal

COMPUTER JOURNAL

Volume 65, Issue 11, Pages 2953-2968

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/comjnl/bxac091

Keywords

Social Ecosystems; User-generated Textual Content; Topic Model; Semantic Embedding; Twitter; Weibo

Funding

National Natural Science Foundation of China (NSFC) [61932007, 61902075]

Automated Summary

The development of ICT and Web 2.0 has led to the emergence of diverse social ecosystems. User-generated textual content is the most important type of content in these ecosystems, but current modeling methods have limitations. Therefore, we propose a new model that can accurately model user-generated textual content in social ecosystems.

Abstract

The development of Information and Communication Technologies (ICT) and Web 2.0 has promoted the emergence of diverse social ecosystems such as the social Internet of Things (IoT), social media and online communities. User-generated textual content (UGTC), which consists of unstructured text, is the most important and most common type of user-generated content in social ecosystems. UGTC in social ecosystems is generated according to two types of context information: global context (topics) and local context (semantic regularities). For UGTC modeling, topic models consider only global context and ignore semantic regularities, while semantic embedding models do the opposite. Using only topic models or only semantic embedding models to model UGTC therefore captures just one of the two types of context. To address this problem, we propose a semantic embedding enhanced topic model, named SEE-Twitter-LDA, for accurately modeling UGTC in social ecosystems. The core of SEE-Twitter-LDA is that words are generated according to the mutual semantic information of topics and semantic regularities, so global context and local context are jointly considered for UGTC modeling. Using 553 098 tweets sampled from Twitter and 211 233 posts sampled from Weibo, we validate that SEE-Twitter-LDA outperforms existing related models on perplexity, topic divergence and topic coherence.
