4.7 Article

Transductive transfer learning based Genetic Programming for balanced and unbalanced document classification using different types of features

Journal

APPLIED SOFT COMPUTING
Volume 103, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2021.107172

Keywords

Genetic Programming; Document classification; Transfer learning


This paper introduces a transductive transfer learning method for document classification that uses two different text feature representations to share knowledge and improve performance. It shows that programs learned from TF and from doc2vec can be used alternately to improve each other. It also addresses the unbalanced dataset problem by considering the unbalanced distributions across categories.
Document classification is one of the predominant tasks in Natural Language Processing. However, some document classification datasets lack ground truth, while other, similar datasets have it. Transfer learning can use similar datasets with ground truth to train effective classifiers for a dataset without ground truth. This paper introduces a transductive transfer learning method for document classification using two different text feature representations: the term frequency (TF) and the semantic feature doc2vec. It makes three main contributions. First, it enables knowledge sharing between a dataset represented with TF and a dataset represented with doc2vec in transductive transfer learning, improving performance. Second, it demonstrates that the partially learned programs from TF and from doc2vec can be used alternately to "label then learn", and that they improve each other. Lastly, it addresses the unbalanced dataset problem by considering the unbalanced distributions across categories when evolving Genetic Programming (GP) programs on the target domains. Experimental results on two popular document datasets show that the proposed technique effectively transfers knowledge from the GP programs evolved on the source domains to the new GP programs on the target domains using TF or doc2vec. The GP programs evolved by the proposed method achieve an improvement of more than 10 percent over the GP programs evolved directly on the source domains. The proposed technique also effectively uses GP programs evolved from unbalanced datasets (on the source and target domains) to evolve new GP programs on the target domains, which balances predictions across categories. © 2021 Elsevier B.V. All rights reserved.
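
As a rough illustration of the "label then learn" alternation described in the abstract, the sketch below labels target documents with a source-trained classifier on one feature view, then repeatedly retrains on the other view using those pseudo-labels. It is not the authors' method: a nearest-centroid classifier stands in for the evolved GP programs, the vectors are made-up toy numbers standing in for TF and doc2vec features, and every function and variable name is hypothetical. Balanced accuracy (mean per-class recall) is used here only as one plausible way to score predictions on unbalanced categories; the abstract does not state the fitness function actually used.

```python
from collections import Counter

def fit_centroids(X, y):
    """Nearest-centroid model: a simple stand-in for an evolved GP program."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(centroids, X):
    """Assign each vector to the class with the closest centroid."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda c: dist(centroids[c], x)) for x in X]

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so a minority class counts as much as a majority one."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def label_then_learn(src_X, src_y, tgt_views, rounds=4):
    """Alternate between two target views: label with one, learn on the other.

    tgt_views[0] must share the source feature space (here: the TF-like view).
    """
    model = fit_centroids(src_X, src_y)      # classifier from the labelled source
    pseudo = predict(model, tgt_views[0])    # initial pseudo-labels for the target
    for r in range(rounds):
        view = tgt_views[(r + 1) % 2]        # switch to the other representation
        model = fit_centroids(view, pseudo)  # learn from the current pseudo-labels
        pseudo = predict(model, view)        # relabel the target documents
    return pseudo

# Toy data (assumed, not from the paper). The source domain has labels.
SRC_X = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
SRC_Y = [0, 0, 1, 1]
# Target domain: the same five documents seen through two feature views.
TGT_TF = [[3.0, 1.0], [2.0, 1.0], [1.0, 1.2], [0.5, 3.0], [1.0, 3.0]]
TGT_D2V = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8]]
TGT_TRUE = [0, 0, 0, 1, 1]  # held out, used only for evaluation

initial = predict(fit_centroids(SRC_X, SRC_Y), TGT_TF)
final = label_then_learn(SRC_X, SRC_Y, [TGT_TF, TGT_D2V])
print("source model only:", balanced_accuracy(TGT_TRUE, initial))
print("after alternation:", balanced_accuracy(TGT_TRUE, final))
```

In this toy setup the source-trained model mislabels one target document in the TF-like view, and the second view separates it cleanly, so the alternation corrects the pseudo-label. Real TF and doc2vec features would not be this well behaved.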

