
Perturbation consistency and mutual information regularization for semi-supervised semantic segmentation

Special Issue Paper, published in Multimedia Systems.

Abstract

Semi-supervised learning has recently attracted much attention in human-centric visual understanding, leveraging the hidden structure of unlabeled data to reduce the number of required labels. Most semi-supervised methods, however, target image classification, and their ideas cannot be applied directly to semantic segmentation. In this paper, we propose a semi-supervised model for semantic segmentation named semi-supervised consistency segmentation (SCSeg). Its performance gain comes from two techniques: perturbation consistency and mutual information regularization. Perturbation consistency enforces output consistency between uncorrupted and perturbed features. Mutual information regularization adopts a mutual information loss to ensure the spatial consistency of adjacent patches on unlabeled data. Experimental results on the Pascal VOC 2012 and Cityscapes datasets, both widely used in visual understanding tasks, demonstrate that the proposed model outperforms current semi-supervised segmentation methods under varying amounts of labeled data, alleviating the annotation burden of semantic segmentation in practical human-centric multimedia applications.
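The two regularizers described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the linear `decode` callable, the Gaussian feature noise, and the IIC-style joint-distribution estimator for patch mutual information are all assumptions made for the sketch.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def perturbation_consistency_loss(feats, decode, noise_std=0.1, rng=None):
    """Penalize disagreement between class predictions computed from the
    clean features and from a noise-perturbed copy (an MSE variant of the
    consistency objective)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clean = softmax(decode(feats))
    noisy = softmax(decode(feats + rng.normal(0.0, noise_std, feats.shape)))
    return float(np.mean((clean - noisy) ** 2))

def patch_mutual_information(p_a, p_b, eps=1e-8):
    """Mutual information between the class assignments of two adjacent
    patches: I(A;B) = sum_ij P_ij * log(P_ij / (P_i * P_j)), with the joint
    P estimated by averaging outer products over corresponding pixels.
    p_a, p_b: (num_pixels, num_classes) softmax probabilities."""
    n = p_a.shape[0]
    joint = (p_a.T @ p_b) / n          # (C, C) empirical joint distribution
    joint = (joint + joint.T) / 2.0    # symmetrize, as in IIC-style losses
    p_i = joint.sum(axis=1, keepdims=True)   # marginal of patch A
    p_j = joint.sum(axis=0, keepdims=True)   # marginal of patch B
    return float(np.sum(joint * (np.log(joint + eps)
                                 - np.log(p_i + eps)
                                 - np.log(p_j + eps))))
```

On unlabeled images, training would minimize the consistency loss while maximizing (i.e., adding the negated) mutual information between adjacent patches, encouraging spatially coherent predictions without ground-truth masks.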





Acknowledgements

This research was funded in part by the National Key Research and Development Program of China (2021YFB2800300), the Shandong Provincial Key Research and Development Program (Major Scientific and Technological Innovation Project) under Grant 2019JZZY010119, the National Natural Science Foundation of China under Grant No. 62001267, and the Future Plan for Young Scholars of Shandong University.

Author information

Correspondence to Lei Chen or Hongchao Zhou.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, Y., Liu, C., Chen, L. et al. Perturbation consistency and mutual information regularization for semi-supervised semantic segmentation. Multimedia Systems 29, 511–523 (2023). https://doi.org/10.1007/s00530-022-00931-9

