SRPM-ST: Sequential retraining and pseudo-labeling in mini-batches for self-training

Mukhamediya A., Zollanvari A.
7 November 2024 Elsevier B.V.

Neurocomputing
2024 #605

An impediment to training accurate classifiers in supervised learning is the scarcity of labeled data. Semi-supervised learning can help by using both labeled and unlabeled data. A specific form of semi-supervised learning is self-training (ST). In its basic form, ST trains an initial classifier on the labeled data and uses it to generate pseudo-labels for the unlabeled set. Then either the whole set of pseudo-labeled data, or the subset whose pseudo-labels have high confidence scores, is selected and used to update the initial classifier. Although this process can be repeated to generate new pseudo-labels for the unlabeled data, it is typically tacitly assumed that the classifier is updated only once all pseudo-labels have been generated. We refer to this process as full-batch ST (F-ST), regardless of any confidence score-based subset selection. Here, we show that sequential retraining and pseudo-labeling in mini-batches (SRPM) could potentially improve the performance of the classifier with respect to F-ST. Our empirical results show the existence of a data-dependent mini-batch size for SRPM that is optimal in the sense of achieving the lowest error rate. In practice, this parameter could be treated as a hyperparameter to tune.
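The difference between F-ST and SRPM described above can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier is a stand-in chosen only to keep the example dependency-free, the names `fit_centroids`, `predict`, and `srpm_self_training` are hypothetical, and no confidence-based subset selection is applied. Under these assumptions, setting `batch_size` to the size of the unlabeled set corresponds to a single F-ST pass, while smaller values retrain the classifier after each mini-batch of pseudo-labels.

```python
def fit_centroids(X, y):
    """Stand-in classifier (assumption): one mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))


def srpm_self_training(X_lab, y_lab, X_unlab, batch_size):
    """Pseudo-label X_unlab one mini-batch at a time, retraining after each.

    batch_size == len(X_unlab) gives one full-batch (F-ST style) update.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    X_train, y_train = list(X_lab), list(y_lab)
    model = fit_centroids(X_train, y_train)  # initial classifier, labeled data only
    for start in range(0, len(X_unlab), batch_size):
        batch = X_unlab[start:start + batch_size]
        pseudo = [predict(model, x) for x in batch]  # labels from the current model
        X_train.extend(batch)
        y_train.extend(pseudo)
        model = fit_centroids(X_train, y_train)  # sequential retraining
    return model


# Toy data (made up for illustration only).
X_lab, y_lab = [[0.0, 0.0], [4.0, 4.0]], [0, 1]
X_unlab = [[0.5, 0.2], [3.8, 4.1], [1.0, 0.5], [3.0, 3.5]]

srpm_model = srpm_self_training(X_lab, y_lab, X_unlab, batch_size=2)
fst_model = srpm_self_training(X_lab, y_lab, X_unlab, batch_size=len(X_unlab))
```

Because the abstract treats the mini-batch size as a data-dependent hyperparameter, one plausible use of such a function is to call it for several candidate `batch_size` values and keep the one with the lowest error on a held-out labeled validation set.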

Pseudo-labeling, Self-training, Semi-supervised learning


Department of Electrical and Computer Engineering, School of Engineering and Digital Sciences, Nazarbayev University, Kabanbay batyr 53, Astana, 010000, Kazakhstan

