Article

TopicStriKer: A topic kernels-powered approach for text classification

Journal

RESULTS IN ENGINEERING
Volume 17

Publisher

ELSEVIER
DOI: 10.1016/j.rineng.2023.100949

Keywords

Topic modeling; String kernels; Text classification; String embedding; Topic sequence

TopicStriKer is a model that combines unsupervised topic modeling with supervised string kernels for text classification. It reduces the document corpus to a topic-word sequence using the co-occurring topic words and the per-document topic proportions, then classifies that reduced representation with string kernels, which improves accuracy and reduces training time.
Topic models are unsupervised machine learning techniques that output clusters of topics, each represented as co-occurring words with their associated probability distributions. Topic modeling algorithms find latent themes in large document collections by understanding their context. String kernels, on the other hand, are supervised machine learning techniques that quantify string similarity without explicit string encoding. We propose TopicStriKer, a model that combines the advantages of unsupervised topic modeling with supervised string kernels for text classification tasks. The co-occurring topic words per topic and the topic proportions per document are used to reduce the document corpus to a topic-word sequence. This reduced representation is then used for text classification with the aid of string kernels, significantly improving accuracy and reducing training time. In experiments, bag-of-words kernel-based string embeddings produced by the proposed algorithm outperform traditional text classification approaches. This work extensively compares string kernels with topic modeling on various performance metrics to establish our findings.
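
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the topic words, topic proportions, and the `top_k` reduction rule are hypothetical toy values, chosen only to show how a document might be reduced to a topic-word sequence and then compared with a bag-of-words kernel (the inner product of word-count vectors).

```python
from collections import Counter
from math import sqrt

# Hypothetical topic-model output: top words per topic (toy values, not from the paper).
TOPIC_WORDS = {
    0: ["goal", "team", "match", "league"],
    1: ["stock", "market", "shares", "profit"],
}


def reduce_to_topic_sequence(tokens, topic_proportions, top_k=1):
    """Keep only the tokens that are top words of the document's top_k dominant topics.

    This reduction rule is an assumption made for illustration.
    """
    dominant = sorted(topic_proportions, key=topic_proportions.get, reverse=True)[:top_k]
    vocab = {word for topic in dominant for word in TOPIC_WORDS[topic]}
    return [tok for tok in tokens if tok in vocab]


def bow_kernel(seq_a, seq_b):
    """Bag-of-words kernel: inner product of the two word-count vectors."""
    counts_a, counts_b = Counter(seq_a), Counter(seq_b)
    return sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())


def normalized_bow_kernel(seq_a, seq_b):
    """Cosine-normalized kernel value in [0, 1]; 0.0 if either sequence is empty."""
    denom = sqrt(bow_kernel(seq_a, seq_a) * bow_kernel(seq_b, seq_b))
    return bow_kernel(seq_a, seq_b) / denom if denom else 0.0


# Toy documents with assumed per-document topic proportions.
doc_a = "the team scored a late goal in the match".split()
doc_b = "a goal for the home team decided the league match".split()
doc_c = "shares fell as the market priced in lower profit".split()

seq_a = reduce_to_topic_sequence(doc_a, {0: 0.9, 1: 0.1})
seq_b = reduce_to_topic_sequence(doc_b, {0: 0.8, 1: 0.2})
seq_c = reduce_to_topic_sequence(doc_c, {0: 0.05, 1: 0.95})

print(seq_a, seq_b, seq_c)
print(normalized_bow_kernel(seq_a, seq_b), normalized_bow_kernel(seq_a, seq_c))
```

In a fuller setup, the pairwise kernel values would form a Gram matrix passed to a kernel classifier such as a support vector machine. Comparing the shorter reduced sequences instead of the full documents is one plausible reason for the lower training time the abstract reports.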
