Abstract
Current state-of-the-art semantic segmentation methods usually contain millions of parameters and require substantial computational resources, which limits their application in low-resource settings. Knowledge distillation is one promising way to achieve a good trade-off between performance and efficiency. In this paper, we propose a novel local structure consistency distillation (LSCD) method to improve the segmentation accuracy of compact networks. Unlike previous works, which mainly transfer pixel-level and image-level knowledge, we propose to transfer patch-level knowledge. Specifically, we propose local structure consistency as the patch-level knowledge, which integrates the structural similarity index measure into our framework to impose local structural constraints between the outputs of the teacher and the student. Furthermore, we propose pixel-correlation distillation to capture the contextual dependencies between any two pixels of the feature maps in a global view. Distilling such pixel correlations from the teacher to the student helps the student better mimic the teacher's contextual dependencies, and thus improves segmentation accuracy. To validate the effectiveness of the proposed approach, extensive experiments have been conducted on three widely adopted benchmarks: Cityscapes, CamVid, and Pascal VOC 2012. Experimental results show that the proposed approach consistently improves on state-of-the-art methods.
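As a rough illustration of the two distillation terms described in the abstract, the sketch below computes a patch-wise loss based on the structural similarity index measure (SSIM) and a loss on pairwise pixel correlations. It is not the authors' implementation. The function names, the non-overlapping uniform windows, the use of cosine similarity for pixel correlations, and the mean-squared difference between correlation matrices are all assumptions made for illustration. Only the SSIM formula itself, with the usual constants for inputs in [0, 1], follows the standard definition.

```python
import math


def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Standard SSIM between two equally sized patches (flat lists in [0, 1]).

    Assumption: uniform weighting inside the patch instead of the
    Gaussian window used in the original SSIM formulation.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def local_structure_loss(teacher, student, patch=2):
    """Hypothetical patch-level loss: mean (1 - SSIM) over non-overlapping
    patch x patch windows of two 2D score maps (lists of rows)."""
    h, w = len(teacher), len(teacher[0])
    losses = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            rows = range(i, i + patch)
            cols = range(j, j + patch)
            t = [teacher[r][c] for r in rows for c in cols]
            s = [student[r][c] for r in rows for c in cols]
            losses.append(1.0 - ssim(t, s))
    return sum(losses) / len(losses)


def pixel_correlation(features):
    """Cosine similarity between every pair of pixels.

    `features` is a list of per-pixel feature vectors (N pixels, C channels);
    the result is an N x N matrix. Zero vectors are given norm 1 to avoid
    division by zero.
    """
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in features]
    return [
        [sum(a * b for a, b in zip(fi, fj)) / (ni * nj)
         for fj, nj in zip(features, norms)]
        for fi, ni in zip(features, norms)
    ]


def pixel_correlation_loss(teacher_feats, student_feats):
    """Hypothetical loss: mean squared difference between the teacher's and
    the student's N x N pixel-correlation matrices."""
    ct = pixel_correlation(teacher_feats)
    cs = pixel_correlation(student_feats)
    n = len(ct)
    return sum((ct[i][j] - cs[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)
```

Because the correlation matrices are N x N regardless of channel count, this formulation lets a teacher and a student with different feature widths be compared directly, which is one plausible reason to distill correlations rather than raw features.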
Notes
FLOPs are calculated with the PyTorch implementation.
Acknowledgements
The authors would like to thank Dingding Chen, Yupin Yang, Ganghong Huang, and Yafei Qi for their help with the code and discussions. This research was partially supported by the National Natural Science Foundation of China (62176029 and 61876026), the National Key Research and Development Program of China (2017YFB1402400 and 2017YFB1402401), and the Key Research Program of Chongqing Science and Technology Bureau (cstc2020jscx-msxmX0149, cstc2019jscx-mbdxX0012, and cstc2019jscx-fxyd0142).
Cite this article
Wang, C., Zhong, J., Dai, Q. et al. Local structure consistency and pixel-correlation distillation for compact semantic segmentation. Appl Intell 53, 6307–6323 (2023). https://doi.org/10.1007/s10489-022-03656-4
DOI: https://doi.org/10.1007/s10489-022-03656-4