Article

An Efficient Approach to Select Instances in Self-Training and Co-Training Semi-Supervised Methods

Journal

IEEE ACCESS
Volume 10, Pages 7254-7276

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/ACCESS.2021.3138682

Keywords

Semi-supervised learning; Training; Labeling; Classification algorithms; Prediction algorithms; Machine learning; Supervised learning; Artificial intelligence; self-training semi-supervised method; co-training semi-supervised method

Funding

  1. Federal University of Rio Grande do Norte
  2. Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) [001]


Semi-supervised learning is a machine learning approach that integrates supervised and unsupervised learning mechanisms. In this setting, most labels in the training set are unknown, while a small portion of the data has known labels. Semi-supervised learning is attractive because it can exploit both labeled and unlabeled data to outperform purely supervised learning. This paper presents a study in the field of semi-supervised learning and modifies two well-known semi-supervised algorithms: self-training and co-training. Studies that change the structure of these algorithms are common in the literature; however, none of them proposes automating the labeling process of unlabeled instances, which is the main purpose of this work. To achieve this goal, three methods are proposed: FlexCon-G, FlexCon and FlexCon-C. The main difference among these methods lies in how the confidence rate is calculated and in the strategy used to select a label in each iteration. To evaluate the proposed methods, an empirical analysis is conducted on 30 datasets with different characteristics. The results indicate that all three proposed methods perform better than the original self-training and co-training methods in most of the analysed cases.
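For readers unfamiliar with the baseline the paper builds on, the following is a minimal sketch of standard self-training with a fixed confidence threshold — the labeling process that the proposed FlexCon methods automate and adapt. The classifier, dataset, and threshold value below are illustrative assumptions, not taken from the paper.

```python
# Sketch of baseline self-training with a fixed confidence threshold.
# NOTE: GaussianNB, the synthetic dataset, and threshold=0.95 are
# illustrative choices, not the paper's experimental setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Hide most labels to simulate the semi-supervised setting (-1 = unlabeled).
labels = y.copy()
unlabeled = rng.random(len(y)) > 0.1
labels[unlabeled] = -1

threshold = 0.95  # fixed in baseline self-training; FlexCon-style methods adjust it
for _ in range(10):
    mask = labels != -1
    if mask.all():
        break
    clf = GaussianNB().fit(X[mask], labels[mask])
    proba = clf.predict_proba(X[~mask])
    confident = proba.max(axis=1) >= threshold
    if not confident.any():
        break
    # Add only confidently predicted instances to the labeled set.
    idx = np.flatnonzero(~mask)[confident]
    labels[idx] = clf.predict(X[idx])

print("labeled after self-training:", int((labels != -1).sum()), "of", len(labels))
```

The fixed threshold is exactly the design choice the paper targets: the proposed methods instead recompute the confidence rate each iteration and change how the label for each newly added instance is selected.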

