Abstract
Image semantic segmentation is a meaningful task that requires both accuracy and efficiency in computer vision. At present, most current deep learning based semantic segmentation methods needs extensive computational resources, and knowledge distillation may reduce such a computational burden due to its model compression ability. In this paper, different from previous knowledge distillation methods that directly transfer the knowledge of the teacher network to the student network, we propose a novel intermediate feature refinement method for semantic segmentation based on adversarial learning, which reduces the error and redundant information contained in the teacher network in the process of knowledge distillation, enhances the correct information contained in the teacher network and transfers it to the student network. Then we improve the conventional discriminator in adversarial learning to help the student network align more correct intermediate features in the teacher network. Our method can make the feature distribution of the student network closer to that of the teacher network, and finally improve the segmentation performance of the student network. Finally, we conducted experiments on three popular benchmarks to verify the effectiveness of our proposed method, including Pascal VOC, Cityscapes and CamVid. Compared with the competitive baseline, our proposed method can improve the performance of the student network by up to 1.43% (the mIOU increases from 67.14% to 68.57% on the Cityscapes val set).
Similar content being viewed by others
References
Dai X, Yuan X, Wei X (2021) Tirnet: object detection in thermal infrared images for autonomous driving. Appl Intell 51(4):1–18
Wang K, Liu M (2021) Yolov3-mt: a yolov3 using multi-target tracking for vehicle visual detection. Applied Intelligence (3)
Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2881–2890
Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision, pp 801–818
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Romera E, Alvarez JM, Bergasa LM, Arroyo R (2017) Erfnet: efficient residual factorized convnet for real-time semantic segmentation. IEEE Trans Intell Transp Syst 19(1):263–272
Han S, Pool J, Tran J, Dally W (2015) Learning both weights and connections for efficient neural network. Adv Neur Inform Process Syst 28:1135–1143
Courbariaux M, Bengio Y, David J-P (2015) Binaryconnect: training deep neural networks with binary weights during propagations. Adv Neur Inform Process Syst 28:3123–3131
Rastegari M, Ordonez V, Redmon J, Farhadi A (2016) Xnor-net: imagenet classification using binary convolutional neural networks. In: European conference on computer vision, pp 525–542. Springer
Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv:1503.02531
Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2014) Fitnets: hints for thin deep nets. arXiv:1412.6550
Zagoruyko S, Komodakis N (2016) Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. arXiv:1612.03928
Michieli U, Zanuttigh P (2019) Incremental learning techniques for semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision workshops, pp 0–0
Huang Z, Hao W, Wang X, Tao M, Huang J, Liu W, Hua X-S (2021) Half-real half-fake distillation for class-incremental semantic segmentation. arXiv:2104.00875
Gülçehre Ç, Bengio Y (2016) Knowledge matters: importance of prior information for optimization. J Mach Learn Res 17(1):226–257
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Lee C-Y, Xie S, Gallagher P, Zhang Z, Tu Z (2015) Deeply-supervised nets. In: Artificial intelligence and statistics, pp 562–570. PMLR
Liu Y, Shu C, Wang J, Shen C (2020) Structured knowledge distillation for dense prediction. IEEE Trans Pattern Anal Mach Intell, 1–1
Xie J, Shuai B, Hu J-F, Lin J, Zheng W-S (2018) Improving fast segmentation with teacher-student learning. arXiv:1810.08476
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neur Inform Process Syst 27:2672–2680
Wang Y, Ye H, Cao F (2022) A novel multi-discriminator deep network for image segmentation. Appl Intell 52(1):1092–1109
Shen K, Quan H, Han J, Wu M (2022) Uro-gan: an untrustworthy region optimization approach for adipose tissue segmentation based on adversarial learning. Appl Intell, 1–23
Tong H, Fang Z, Wei Z, Cai Q, Gao Y (2021) Sat-net: a side attention network for retinal image segmentation. Appl Intell 51(7):5146–5156
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Yuan Y, Wang J (2018) Ocnet: object context network for scene parsing. arXiv:1809.00916
Woo S, Park J, Lee J-Y, So Kweon I (2018) Cbam: convolutional block attention module. In: Proceedings of the European conference on computer vision, pp 3–19
Tao A, Sapra K, Catanzaro B (2020) Hierarchical multi-scale attention for semantic segmentation. arXiv:2005.10821
Paszke A, Chaurasia A, Kim S, Culurciello E (2016) Enet: a deep neural network architecture for real-time semantic segmentation. arXiv:1606.02147
Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495
Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) Bisenet: bilateral segmentation network for real-time semantic segmentation. In: Proceedings of the European conference on computer vision, pp 325–341
Mehta S, Rastegari M, Caspi A, Shapiro L, Hajishirzi H (2018) Espnet: efficient spatial pyramid of dilated convolutions for semantic segmentation. In: Proceedings of the European conference on computer vision, pp 552–568
Michieli U, Zanuttigh P (2021) Continual semantic segmentation via repulsion-attraction of sparse and disentangled latent representations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1114–1124
He T, Shen C, Tian Z, Gong D, Sun C, Yan Y (2019) Knowledge adaptation for efficient semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 578–587
Shu C, Liu Y, Gao J, Xu L, Shen C (2020) Channel-wise distillation for semantic segmentation. arXiv e-prints 2011
Wang Y, Zhou W, Jiang T, Bai X, Xu Y (2020) Intra-class feature variation distillation for semantic segmentation. In: European Conference on computer vision, pp 346–362. Springer
Wang H, Qin Z, Wan T (2018) Text generation based on generative adversarial nets with latent variables. In: Pacific-Asia conference on knowledge discovery and data mining, pp 92–103. Springer
Mirza M, Osindero S (2014) Conditional generative adversarial nets. Computer Science, 2672–2680
Johnson J, Alahi A, Fei-Fei L (2016) Perceptual losses for real-time style transfer and super-resolution. In: European conference on computer vision, pp 694–711. Springer
Liu Y, Qin Z, Wan T, Luo Z (2018) Auto-painter: cartoon image generation from sketch by using conditional wasserstein generative adversarial networks. Neurocomputing 311:78–87
Luc P, Couprie C, Chintala S, Verbeek J (2016) Semantic segmentation using adversarial networks. arXiv:1611.08408
Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville AC (2017) Improved training of wasserstein gans. In: Advances in neural information processing systems, pp 5767–5777
Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434
Zhang H, Goodfellow I, Metaxas D, Odena A (2019) Self-attention generative adversarial networks. In: International conference on machine learning, pp 7354–7363. PMLR
Huang Z, Wang X, Huang L, Huang C, Wei Y, Liu W (2019) Ccnet: criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 603–612
Shen Z, Zhang M, Zhao H, Yi S, Li H (2021) Efficient attention: attention with linear complexities. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 3531–3539
Cao Y, Xu J, Lin S, Wei F, Hu H (2019) Gcnet: non-local networks meet squeeze-excitation networks and beyond. In: 2019 IEEE/CVF International conference on computer vision workshop (ICCVW), pp 0–0
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213–3223
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338
Hariharan B, Arbeláez P, Bourdev L, Maji S, Malik J (2011) Semantic contours from inverse detectors. In: 2011 International conference on computer vision, pp 991–998. IEEE
Brostow GJ, Shotton J, Fauqueur J, Cipolla R (2008) Segmentation and recognition using structure from motion point clouds. In: European conference on computer vision, pp 44–57. Springer
Tan M, Le QV (2019) Efficientnet: rethinking model scaling for convolutional neural networks. arXiv:1905.11946
Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in pytorch 9
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) Inverted residuals and linear bottlenecks: mobile networks for classification detection and segmentation
Zhu Z, Xu M, Bai S, Huang T, Bai X (2019) Asymmetric non-local neural networks for semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 593–602
Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2019) Dual attention network for scene segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3146–3154
Chen L-C, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587
Acknowledgment
This work is supported by the National Key R&D Program of China (2020YFA0713503), the Project of Department of Science and Technology of Hunan Province (2020GK2036), the National Key R&D Program of China (2020YFA0713503), the Project of Shanghai Municipal Science and Technology Commission (19511120900), the National Natural Science Foundation of China (61773330) and Aeronautical Science Foundation of China (20200020114004).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interests
The authors declare that there is no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, D., Yuan, Z., Ouyang, W. et al. Adversarial learning based intermediate feature refinement for semantic segmentation. Appl Intell 53, 14775–14791 (2023). https://doi.org/10.1007/s10489-022-04107-w
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-022-04107-w