research-article
DOI: 10.1145/3581783.3612567

Entropy-based Optimization on Individual and Global Predictions for Semi-Supervised Learning

Published: 27 October 2023

ABSTRACT

Pseudo-labelling-based semi-supervised learning (SSL) has achieved remarkable success in enhancing model performance by effectively leveraging large amounts of unlabeled data. However, existing studies focus mainly on rectifying individual predictions (i.e., pseudo-labels) for each unlabeled instance while ignoring overall prediction statistics from a global perspective. This neglect can lead to model collapse and performance degradation in SSL, especially in label-scarce scenarios. In this paper, we emphasize the importance of global prediction constraints and propose a new SSL method that applies Entropy-based optimization to both Individual and Global predictions of unlabeled instances, dubbed EntInG. Specifically, we propose two criteria for leveraging unlabeled data in SSL: individual prediction entropy minimization (IPEM) and global distribution entropy maximization (GDEM). On the one hand, we show that current dominant SSL methods can be viewed as an implicit form of IPEM improved by recent augmentation techniques. On the other hand, we construct a new distribution loss that encourages GDEM, which greatly helps produce better pseudo-labels for unlabeled data. Theoretical analysis further demonstrates that both criteria can be derived by enforcing mutual information maximization on unlabeled instances. Despite its simplicity, our method achieves significant accuracy gains on popular SSL classification benchmarks.
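As a rough illustration of the two criteria named in the abstract (not the authors' implementation; the function name, `lambda_global` weighting, and batch-mean estimate of the global distribution are assumptions), IPEM and GDEM can be combined into a single unsupervised loss, which together lower-bound the mutual information between inputs and predicted labels:

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the class axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def enting_unlabeled_loss(logits, lambda_global=1.0):
    """Sketch of the two entropy criteria on a batch of unlabeled logits (B, C).

    IPEM: minimize the entropy of each individual prediction (sharper pseudo-labels).
    GDEM: maximize the entropy of the batch-averaged global prediction
          (discourages collapse onto a few classes).
    """
    probs = softmax(logits)
    eps = 1e-12  # guard against log(0)
    # IPEM term: mean per-instance entropy, to be minimized
    individual_entropy = -(probs * np.log(probs + eps)).sum(axis=1).mean()
    # GDEM term: entropy of the global (batch-mean) prediction, to be maximized
    global_probs = probs.mean(axis=0)
    global_entropy = -(global_probs * np.log(global_probs + eps)).sum()
    # Minimizing this loss minimizes individual entropy while maximizing global entropy
    return individual_entropy - lambda_global * global_entropy

# Uniform predictions: both terms equal, loss is ~0
print(enting_unlabeled_loss(np.zeros((8, 4))))
# Confident predictions spread evenly over classes: strongly negative loss
print(enting_unlabeled_loss(np.eye(4) * 50.0))
```

With `lambda_global = 1.0` this is the classic mutual-information objective I(x; y) = H(mean prediction) - mean H(prediction), estimated on a mini-batch.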


Published in
        MM '23: Proceedings of the 31st ACM International Conference on Multimedia
        October 2023
        9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783

        Copyright © 2023 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

Overall Acceptance Rate: 995 of 4,171 submissions (24%)
