4.6 Article

TextNetTopics: Text Classification Based Word Grouping as Topics and Topics' Scoring

Journal

FRONTIERS IN GENETICS
Volume 13, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fgene.2022.893378

Keywords

text classification; topics detection; grouping; ranking; feature reduction; medical documents; latent Dirichlet allocation (LDA); feature selection

Funding

  1. Zefat Academic College


Medical document classification is a challenging research problem within text classification. TextNetTopics proposes a novel feature selection approach based on Bag-of-topics instead of the traditional Bag-of-words. Following the G-S-M (Grouping, Scoring, and Modeling) method, it scores topics and selects the top-scoring ones for training the classifier, which the authors report leads to improved accuracy.
Medical document classification is an active research problem and among the most challenging within the text classification domain. Medical datasets often contain massive feature sets in which many features are irrelevant, redundant, or noisy, which reduces classification performance. To obtain a more accurate classification model, it is therefore crucial to choose a set of features (terms) that best discriminates between the classes of medical documents. This study proposes TextNetTopics, a novel approach that applies feature selection over a Bag-of-topics (BOT) rather than the traditional Bag-of-words (BOW). The approach thus performs topic selection rather than word selection. TextNetTopics is based on the generic approach called G-S-M (Grouping, Scoring, and Modeling), developed by Yousef and his colleagues and used mainly on biological data. The proposed approach scores topics and selects the top-scoring topics for training the classifier. This study applied TextNetTopics to textual data in response to the CAMDA challenge. TextNetTopics outperforms various feature selection approaches and also performs well when the model is applied to the validation data provided by CAMDA. Additionally, we have applied our algorithm to other textual datasets.
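The grouping, scoring, and selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the topics are hand-written word groups standing in for the output of a topic model such as LDA, each topic is scored by the leave-one-out accuracy of a simple nearest-centroid classifier, and the names `vectorize`, `centroid_predict`, `loo_accuracy`, and `select_top_topics` are hypothetical. The documents and labels are toy data.

```python
from collections import Counter


def vectorize(doc, vocab):
    """Count how often each vocab word occurs in a whitespace-tokenized document."""
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocab]


def centroid_predict(train_X, train_y, x):
    """Nearest-centroid prediction (assumed stand-in for a real classifier)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(docs, labels, vocab):
    """Leave-one-out accuracy using only the words in `vocab` as features."""
    X = [vectorize(d, vocab) for d in docs]
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += centroid_predict(train_X, train_y, X[i]) == labels[i]
    return hits / len(X)


def select_top_topics(docs, labels, topics, k):
    """Score every topic (word group) on its own, then keep the k best."""
    scores = {name: loo_accuracy(docs, labels, words)
              for name, words in topics.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy corpus: label 1 = liver-related, label 0 = study-design text.
docs = [
    "liver enzyme elevated after drug",
    "drug induced liver injury reported",
    "hepatic toxicity and liver failure",
    "patient cohort enrolled in study",
    "study protocol and cohort design",
    "randomized study of cohort outcomes",
]
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical topics; in practice a topic model would produce these groups.
topics = {
    "hepatic": ["liver", "hepatic", "injury", "toxicity"],
    "generic": ["and", "of", "in", "the"],
    "design": ["study", "cohort", "protocol"],
}

top, scores = select_top_topics(docs, labels, topics, k=2)

# The final feature set is the union of words from the selected topics.
selected_vocab = sorted({w for name in top for w in topics[name]})
print(top, selected_vocab)
```

In this sketch the final model would be trained on `selected_vocab` only, so the feature space shrinks from the full vocabulary to the words of the best-scoring topics. Any classifier and any scoring metric could replace the nearest-centroid and leave-one-out choices used here.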
