
Local structure consistency and pixel-correlation distillation for compact semantic segmentation

Published in: Applied Intelligence

Abstract

Current state-of-the-art semantic segmentation methods usually contain millions of parameters and require high computational resources, which limits their applicability in low-resource settings. Knowledge distillation is a promising way to achieve a good trade-off between performance and efficiency. In this paper, we propose a novel local structure consistency distillation (LSCD) method to improve the segmentation accuracy of compact networks. Unlike previous works, which mainly transfer pixel-level and image-level knowledge, we propose to transfer patch-level knowledge. Specifically, we introduce the local structure consistency as the patch-level knowledge, integrating the structural similarity index measure (SSIM) into our framework to impose local structural constraints between the outputs of the teacher and the student. Furthermore, we propose pixel-correlation distillation to capture the contextual dependencies between any two pixels of the feature maps in a global view. Distilling such pixel correlations from the teacher to the student helps the student mimic the teacher's contextual dependencies more closely and thus improves segmentation accuracy. To validate the effectiveness of the proposed approach, extensive experiments have been conducted on three widely adopted benchmarks: Cityscapes, CamVid, and Pascal VOC 2012. Experimental results show that the proposed approach consistently improves state-of-the-art compact segmentation methods.
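For concreteness, the two distillation terms described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' exact formulation: the local structure term uses an 11x11 window and the standard SSIM stability constants, the pixel-correlation term uses cosine similarities of channel-normalized features, and teacher and student maps are assumed to share the same spatial size.

    # Minimal sketch of the two distillation terms described in the abstract.
    # Window size, stability constants, and the use of cosine similarity are
    # illustrative assumptions, not the authors' exact formulation.
    import torch
    import torch.nn.functional as F

    def local_structure_consistency(student_logits, teacher_logits,
                                    patch=11, C1=0.01 ** 2, C2=0.03 ** 2):
        """Patch-level (SSIM-style) consistency between teacher and student score maps."""
        s = torch.softmax(student_logits, dim=1)
        t = torch.softmax(teacher_logits, dim=1)
        pad = patch // 2  # keep spatial size; borders are zero-padded
        mu_s = F.avg_pool2d(s, patch, stride=1, padding=pad)
        mu_t = F.avg_pool2d(t, patch, stride=1, padding=pad)
        var_s = F.avg_pool2d(s * s, patch, stride=1, padding=pad) - mu_s ** 2
        var_t = F.avg_pool2d(t * t, patch, stride=1, padding=pad) - mu_t ** 2
        cov = F.avg_pool2d(s * t, patch, stride=1, padding=pad) - mu_s * mu_t
        ssim = ((2 * mu_s * mu_t + C1) * (2 * cov + C2)) / (
            (mu_s ** 2 + mu_t ** 2 + C1) * (var_s + var_t + C2))
        return (1.0 - ssim).mean()  # 0 when local structures match exactly

    def pixel_correlation_distillation(student_feat, teacher_feat):
        """Match pairwise pixel-correlation matrices of teacher and student feature maps."""
        def corr(feat):
            f = F.normalize(feat.flatten(2), dim=1)   # (B, C, HW), unit norm per pixel
            return torch.bmm(f.transpose(1, 2), f)    # (B, HW, HW) pixel-pair similarities
        return F.mse_loss(corr(student_feat), corr(teacher_feat))

In training, these terms would typically be added, with small weights, to the usual cross-entropy loss on the student; the weights and the feature layer used for the correlation term are design choices the abstract does not fix.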


Notes

  1. FLOPs are calculated with the PyTorch implementation (see the sketch below).
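The paper does not name the specific counting tool or input resolution; as one possible way to reproduce this kind of count in PyTorch, the third-party thop package can be applied to a segmentation network (the model and input size below are placeholders, not the networks used in the paper).

    # Hypothetical sketch: counting MACs/params with the third-party `thop` package
    # (pip install thop). The paper does not specify its counting tool, network,
    # or input resolution; the torchvision DeepLabV3 model is only a stand-in.
    import torch
    from thop import profile
    from torchvision.models.segmentation import deeplabv3_resnet50

    model = deeplabv3_resnet50().eval()       # placeholder for the actual networks
    dummy = torch.randn(1, 3, 512, 1024)      # assumed Cityscapes-style input
    macs, params = profile(model, inputs=(dummy,))
    print(f"MACs: {macs / 1e9:.1f} G | params: {params / 1e6:.1f} M")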


Acknowledgements

The authors would like to thank Dingding Chen, Yupin Yang, Ganghong Huang, and Yafei Qi for their help with the code and discussions. This research was partially supported by the National Natural Science Foundation of China (62176029 and 61876026), the National Key Research and Development Program of China (2017YFB1402400 and 2017YFB1402401), and the Key Research Program of Chongqing Science and Technology Bureau (cstc2020jscx-msxmX0149, cstc2019jscx-mbdxX0012, and cstc2019jscx-fxyd0142).

Author information


Corresponding author

Correspondence to Jiang Zhong.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, C., Zhong, J., Dai, Q. et al. Local structure consistency and pixel-correlation distillation for compact semantic segmentation. Appl Intell 53, 6307–6323 (2023). https://doi.org/10.1007/s10489-022-03656-4

