Multi-view knowledge distillation for efficient semantic segmentation

Wang, Chen; Zhong, Jiang; Dai, Qizhu; Qi, Yafei; Shi, Fengyuan; Fang, Bin; Li, Xue

doi:10.1007/s11554-023-01296-6

Original Research Paper
Published: 30 March 2023

Volume 20, article number 39 (2023)

Journal of Real-Time Image Processing

Chen Wang (ORCID: orcid.org/0000-0001-9780-0984)¹,
Jiang Zhong¹,
Qizhu Dai¹,
Yafei Qi²,
Fengyuan Shi³,
Bin Fang¹ &
Xue Li⁴

Abstract

Current state-of-the-art semantic segmentation models achieve remarkable segmentation accuracy. However, their large model size and computational cost restrict their use in low-latency online systems and on devices. Knowledge distillation is a popular solution for compressing large-scale segmentation models: a small student segmentation model is trained under the guidance of a large teacher model. However, the knowledge of a single teacher may not be diverse enough to train an accurate student, and the student may also inherit the teacher's biases. This paper proposes MVKD, a multi-view knowledge distillation framework for efficient semantic segmentation, which aggregates multi-view knowledge from multiple teacher models and transfers it to the student model. In MVKD, we introduce a multi-view co-tuning strategy to achieve uniformity among the multi-view knowledge contained in the features of different teachers. In addition, we propose a multi-view feature distillation loss and a multi-view output distillation loss to transfer the multi-view knowledge in the teachers' features and outputs, respectively, to the student. We evaluate MVKD on three benchmark datasets: Cityscapes, CamVid, and Pascal VOC 2012. Experimental results demonstrate the effectiveness of MVKD in compressing semantic segmentation models.
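The abstract does not give the exact form of MVKD's losses, so the following is only a rough sketch of the general multi-teacher idea, not the authors' method. It assumes uniform averaging of the teachers (MVKD's multi-view co-tuning strategy is not modeled), uses Hinton-style temperature-softened soft targets with a per-pixel KL divergence for the output loss, and uses a mean-squared error against the teachers' mean feature for the feature loss. The function names, the temperature of 4.0, the plain-list tensor layout, and the assumption that student features are already projected to the teachers' channel dimension are all hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical layout for one image, using plain lists for illustration:
#   student logits/features: [num_pixels][num_channels]
#   teacher logits/features: [num_teachers][num_pixels][num_channels]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def multi_teacher_output_kd(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[Sequence[float]]],
    temperature: float = 4.0,
) -> float:
    """Pixel-wise soft-target distillation against the uniform average of several teachers.

    Returns T^2 * mean over pixels of KL(averaged teacher distribution || student distribution).
    Uniform averaging is an assumption, not MVKD's aggregation scheme.
    """
    num_teachers = len(teacher_logits)
    total = 0.0
    for p, s_logits in enumerate(student_logits):
        teacher_probs = [softmax(t[p], temperature) for t in teacher_logits]
        target = [sum(tp[c] for tp in teacher_probs) / num_teachers for c in range(len(s_logits))]
        total += kl_divergence(target, softmax(s_logits, temperature))
    # The T^2 factor follows Hinton et al., who use it to keep gradient
    # magnitudes comparable across temperatures.
    return temperature ** 2 * total / len(student_logits)


def multi_teacher_feature_kd(
    student_feats: Sequence[Sequence[float]],
    teacher_feats: Sequence[Sequence[Sequence[float]]],
) -> float:
    """Mean squared error between student features and the element-wise mean teacher feature.

    Assumes (hypothetically) that student features were already projected to the
    teachers' channel dimension, e.g. by a 1x1 convolution in a real network.
    """
    num_teachers = len(teacher_feats)
    total = 0.0
    for p, s_vec in enumerate(student_feats):
        mean_t = [sum(t[p][d] for t in teacher_feats) / num_teachers for d in range(len(s_vec))]
        total += sum((s - m) ** 2 for s, m in zip(s_vec, mean_t)) / len(s_vec)
    return total / len(student_feats)


if __name__ == "__main__":
    # Toy example: two teachers, two pixels, three classes.
    student = [[0.2, 1.0, -0.5], [1.5, 0.1, 0.0]]
    teachers = [
        [[0.0, 2.0, -1.0], [2.0, 0.0, -1.0]],
        [[0.5, 1.5, -0.5], [1.0, 0.5, 0.0]],
    ]
    print("output KD loss:", multi_teacher_output_kd(student, teachers))
    print("feature KD loss:", multi_teacher_feature_kd(student, teachers))
```

In a real training loop these terms would typically be weighted and added to the ordinary cross-entropy loss on ground-truth labels; the weights and the way MVKD actually combines its feature and output losses are not stated in the abstract.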

Similar content being viewed by others

Intra-class Feature Variation Distillation for Semantic Segmentation

MTED: multiple teachers ensemble distillation for compact semantic segmentation (Article, 03 March 2023)

Local structure consistency and pixel-correlation distillation for compact semantic segmentation (Article, 08 July 2022)

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (62176029 and 61876026), the National Key Research and Development Program of China (2017YFB1402401), and the Key Research Program of Chongqing Science and Technology Bureau (cstc2020jscx-msxmX0149).

Author information

Jiang Zhong
Present address: School of Computer Science, Chongqing University, Chongqing, 400044, China

Authors and Affiliations

School of Computer Science, Chongqing University, Chongqing, 400044, China
Chen Wang, Qizhu Dai & Bin Fang
School of Computer Science and Engineering, Central South University, Changsha, 410083, China
Yafei Qi
School of Information Science and Engineering, Northeastern University, Shenyang, 110819, China
Fengyuan Shi
School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, 4072, Australia
Xue Li

Corresponding author

Correspondence to Chen Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, C., Zhong, J., Dai, Q. et al. Multi-view knowledge distillation for efficient semantic segmentation. J Real-Time Image Proc 20, 39 (2023). https://doi.org/10.1007/s11554-023-01296-6

Received: 13 September 2022
Accepted: 03 March 2023
Published: 30 March 2023
DOI: https://doi.org/10.1007/s11554-023-01296-6
