☆ 4.1 Article

Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis

FRONTIERS IN ARTIFICIAL INTELLIGENCE (2020)

Journal

FRONTIERS IN ARTIFICIAL INTELLIGENCE

Volume 3, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA

DOI: 10.3389/frai.2020.00042

Keywords

natural language processing; topic modeling; short text; user-generated content; online social networks

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Abstract

With the growth of online social network platforms and applications, large amounts of textual user-generated content are created daily in the form of comments, reviews, and short-text messages. As a result, users often find it challenging to discover useful information, or to learn more about the topic being discussed, from such content. Machine learning and natural language processing algorithms are used to analyze the massive amount of textual social media data available online, including topic modeling techniques, which have gained popularity in recent years. This paper investigates topic modeling and its common application areas, methods, and tools. We also examine and compare five frequently used topic modeling methods, as applied to short textual social data, to show their practical benefits in detecting important topics. These methods are latent semantic analysis, latent Dirichlet allocation, non-negative matrix factorization, random projection, and principal component analysis. Two textual datasets were selected to evaluate the performance of the included topic modeling methods based on topic quality and standard statistical evaluation metrics such as recall, precision, F-score, and topic coherence. In this evaluation, latent Dirichlet allocation and non-negative matrix factorization delivered more meaningful extracted topics and obtained good results. The paper sheds light on some common topic modeling methods in a short-text context and provides direction for researchers who seek to apply these methods.
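The evaluation metrics named in the abstract can be illustrated with a small sketch. The abstract does not say how the authors computed them, so everything below is an assumption: precision, recall, and F-score are computed set-wise over a topic's extracted terms against a hand-labeled reference list, and coherence uses the UMass formulation, which may differ from the measure used in the paper. The function names and the toy documents are hypothetical.

```python
import math


def precision_recall_f1(predicted, relevant):
    """Set-based precision, recall, and F1 for one topic's extracted terms."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def umass_coherence(top_words, documents):
    """UMass-style coherence (an assumed choice of coherence measure).

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over word pairs with j < i,
    where D counts the documents containing all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            denominator = doc_freq(top_words[j])
            if denominator == 0:
                continue  # skip words that never occur in the corpus
            joint = doc_freq(top_words[i], top_words[j])
            score += math.log((joint + 1) / denominator)
    return score


# Hypothetical toy corpus of tokenized short texts.
toy_docs = [
    ["phone", "battery", "screen"],
    ["battery", "charger"],
    ["pizza", "cheese"],
    ["pizza", "delivery", "cheese"],
]

p, r, f = precision_recall_f1(
    predicted=["battery", "screen", "pizza", "charger"],
    relevant=["battery", "screen", "charger", "phone", "camera"],
)
coherent_topic = umass_coherence(["pizza", "cheese"], toy_docs)
mixed_topic = umass_coherence(["battery", "pizza"], toy_docs)
```

In this toy setup, a topic whose top words co-occur in the same documents scores higher under the coherence function than a topic that mixes unrelated words, which is the intuition behind using coherence to judge topic quality.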
