MTED: multiple teachers ensemble distillation for compact semantic segmentation

  • Original Article
Neural Computing and Applications

Abstract

Current state-of-the-art semantic segmentation models achieve high accuracy, but their large model size and computational cost limit their use in many real-time systems and on mobile devices. Knowledge distillation is a promising way to compress segmentation models. However, the knowledge from a single teacher may be insufficient, and the student may inherit that teacher's bias. This paper proposes MTED, a multi-teacher ensemble distillation framework for semantic segmentation. The key idea is to transfer the comprehensive knowledge of multiple teachers to one student effectively. We present a multi-teacher output-based distillation loss that distills the valuable knowledge in the teachers' output probabilities to the student. We construct an adaptive weight assignment module that dynamically assigns different weights to different teachers at each pixel. In addition, we introduce a multi-teacher feature-based distillation loss to transfer the comprehensive knowledge in the feature maps efficiently. We conduct extensive experiments on three benchmark datasets: Cityscapes, CamVid, and Pascal VOC 2012. The results show that the proposed MTED performs much better than single-teacher methods on all three datasets.
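To make the idea of a pixel-wise weighted, output-based multi-teacher loss more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify how the adaptive weights are computed, so the rule used here (each teacher's weight at a pixel is proportional to the probability it assigns to the ground-truth class) is an assumption, and the function names `teacher_weights` and `multi_teacher_output_loss` are hypothetical. The KL divergence between the weighted teacher ensemble and the student is likewise one common choice of distillation objective, assumed here for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one pixel's class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_weights(teacher_probs, label):
    """Hypothetical per-pixel weighting rule (an assumption, not from the paper):
    a teacher's weight is proportional to the probability it gives the
    ground-truth class at this pixel. Weights sum to 1."""
    scores = [probs[label] for probs in teacher_probs]
    total = sum(scores)
    return [s / total for s in scores]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_teacher_output_loss(student_logits, teachers_logits, labels, temperature=1.0):
    """Mean over pixels of KL(weighted teacher ensemble || student).

    student_logits:  list of per-pixel class-logit lists
    teachers_logits: one such list per teacher
    labels:          ground-truth class index for each pixel
    """
    total = 0.0
    for i, label in enumerate(labels):
        t_probs = [softmax(t[i], temperature) for t in teachers_logits]
        weights = teacher_weights(t_probs, label)
        n_classes = len(student_logits[i])
        # Blend the teachers' distributions with this pixel's weights.
        ensemble = [
            sum(w * probs[c] for w, probs in zip(weights, t_probs))
            for c in range(n_classes)
        ]
        s_probs = softmax(student_logits[i], temperature)
        total += kl_divergence(ensemble, s_probs)
    return total / len(labels)


# Toy usage: two teachers, two pixels, three classes.
teacher_a = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
teacher_b = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.5]]
student = [[0.5, 0.2, 0.1], [0.1, 0.4, 0.2]]
labels = [0, 1]
loss = multi_teacher_output_loss(student, [teacher_a, teacher_b], labels)
```

In this toy setup, `teacher_a` is more confident in the correct class at the first pixel, so under the assumed rule it receives the larger weight there. A practical version would operate on batched tensors in a deep learning framework and would be combined with the feature-based loss and the usual supervised segmentation loss.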

Data availability

Cityscapes: https://www.cityscapes-dataset.com/, CamVid: http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/, Pascal VOC 2012: http://pjreddie.com/media/files/VOC2012test.

Code availability

We will release the code publicly in the future.

Acknowledgements

The authors thank Dingding Chen, Fengyuan Shi, and Ganghong Huang for their great help with code, suggestions, and discussions.

Funding

This work is partially supported by the National Natural Science Foundation of China (62176029 and 61876026), the Graduate Scientific Research and Innovation Foundation of Chongqing, China (CYS21061), the National Key Research and Development Program of China (2017YFB1402401), and the Key Research Program of Chongqing Science and Technology Bureau (cstc2020jscx-msxmX0149).

Author information

Contributions

CW contributed to conceptualization, methodology, software, and original draft writing. JZ contributed to conceptualization, supervision, resources, and funding acquisition. QD contributed to original draft writing and investigation. QY contributed to conceptualization and methodology. YQ contributed to software and validation. BF contributed to conceptualization and review. XL contributed to investigation and review.

Corresponding author

Correspondence to Chen Wang.

Ethics declarations

Conflict of interest

No conflicts to declare.

Ethics approval

Not Applicable.

Consent to participate

Not Applicable.

Consent for publication

Not Applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, C., Zhong, J., Dai, Q. et al. MTED: multiple teachers ensemble distillation for compact semantic segmentation. Neural Comput & Applic 35, 11789–11806 (2023). https://doi.org/10.1007/s00521-023-08321-6

  • DOI: https://doi.org/10.1007/s00521-023-08321-6
