Entropy-based Optimization on Individual and Global Predictions for Semi-Supervised Learning

ABSTRACT
Pseudo-labelling-based semi-supervised learning (SSL) has demonstrated remarkable success in enhancing model performance by effectively leveraging large amounts of unlabeled data. However, existing studies focus mainly on rectifying the individual prediction (i.e., pseudo-label) for each unlabeled instance while ignoring the overall prediction statistics from a global perspective. This neglect can lead to model collapse and performance degradation in SSL, especially in label-scarce scenarios. In this paper, we emphasize the crucial role of global prediction constraints and propose a new SSL method that applies Entropy-based optimization to both Individual and Global predictions on unlabeled instances, dubbed EntInG. Specifically, we propose two criteria for leveraging unlabeled data in SSL: individual prediction entropy minimization (IPEM) and global distribution entropy maximization (GDEM). On the one hand, we show that current dominant SSL methods can be viewed as an implicit form of IPEM improved by recent augmentation techniques. On the other hand, we construct a new distribution loss to encourage GDEM, which greatly helps produce better pseudo-labels for unlabeled data. Theoretical analysis further shows that both criteria can be derived by enforcing mutual information maximization on unlabeled instances. Despite its simplicity, the proposed method achieves significant accuracy gains on popular SSL classification benchmarks.
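For intuition, the standard decomposition of the mutual information between an input x and its predicted label y, I(x; y) = H(E_x[p(y|x)]) - E_x[H(p(y|x))], shows why maximizing mutual information encourages both criteria: the first term is the entropy of the global (marginal) prediction distribution, matching GDEM, while the second term penalizes the entropy of individual predictions, matching IPEM. Below is a minimal PyTorch sketch of how the two criteria could be instantiated on a batch of unlabeled data; the function name, the confidence threshold tau, and the FixMatch-style confidence masking as a stand-in for the implicit form of IPEM are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def enting_unlabeled_loss(logits_weak, logits_strong, tau=0.95):
    """Hypothetical sketch combining IPEM and GDEM on unlabeled logits.

    logits_weak / logits_strong: model outputs for weakly and strongly
    augmented views of the same unlabeled batch, shape (B, C).
    """
    probs = torch.softmax(logits_weak.detach(), dim=-1)

    # IPEM in its implicit, augmentation-driven form: keep only confident
    # pseudo-labels from the weak view and fit the strong view to them,
    # which drives individual predictions toward low entropy.
    conf, pseudo = probs.max(dim=-1)
    mask = (conf >= tau).float()
    ipem = (F.cross_entropy(logits_strong, pseudo, reduction="none") * mask).mean()

    # GDEM: maximize the entropy of the global (batch-averaged) prediction
    # distribution by minimizing its negative entropy, discouraging the
    # collapse of all predictions onto a few classes.
    p_bar = torch.softmax(logits_weak, dim=-1).mean(dim=0)
    gdem = (p_bar * torch.log(p_bar.clamp_min(1e-8))).sum()  # = -H(p_bar)

    return ipem + gdem
```

In practice the GDEM term would likely be weighted against the IPEM term, and the global distribution could be estimated with a moving average across batches rather than a single batch mean; the single-batch estimate here is only for illustration.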