Feature Decoupled Knowledge Distillation via Spatial Pyramid Pooling

  • Conference paper
  • First Online:
Computer Vision – ACCV 2022 (ACCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13846)

Abstract

Knowledge distillation (KD) is an effective and widely used model compression technique that enables the deployment of deep networks in low-memory or fast-execution scenarios. Feature-based knowledge distillation is an important branch of KD that leverages intermediate layers to supervise the training of a student network. Nevertheless, a potential mismatch between intermediate layers may be counterproductive during training. In this paper, we propose a novel distillation framework, termed Decoupled Spatial Pyramid Pooling Knowledge Distillation, which distinguishes the importance of regions in feature maps. Specifically, we reveal that (1) spatial pyramid pooling is an outstanding method for defining the knowledge and (2) the lower-activation regions in feature maps play a more important role in KD. Our experiments on CIFAR-100 and Tiny-ImageNet achieve state-of-the-art results.
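
To make the role of spatial pyramid pooling concrete, the following is a minimal, hypothetical PyTorch sketch of an SPP-based feature distillation loss, not the authors' released implementation: teacher and student feature maps are pooled over several grid resolutions, flattened into descriptors, and matched with an L2 loss. The decoupling of high- and low-activation regions proposed in the paper is omitted here, and all names (spp_descriptor, spp_kd_loss, levels) are illustrative.

    # Hypothetical sketch of an SPP-based feature distillation loss (not the paper's official code).
    import torch
    import torch.nn.functional as F

    def spp_descriptor(feat, levels=(1, 2, 4)):
        """Pool a feature map of shape (N, C, H, W) over a pyramid of grid sizes and concatenate."""
        parts = []
        for g in levels:
            pooled = F.adaptive_avg_pool2d(feat, output_size=g)  # (N, C, g, g)
            parts.append(pooled.flatten(start_dim=1))            # (N, C * g * g)
        return torch.cat(parts, dim=1)

    def spp_kd_loss(f_student, f_teacher, levels=(1, 2, 4)):
        """L2 distance between normalized SPP descriptors of student and teacher features.
        Assumes the channel dimensions already match (e.g. via a 1x1 conv adapter)."""
        s = F.normalize(spp_descriptor(f_student, levels), dim=1)
        t = F.normalize(spp_descriptor(f_teacher, levels), dim=1)
        return F.mse_loss(s, t)

    # Example with dummy intermediate features.
    f_s = torch.randn(8, 256, 8, 8)
    f_t = torch.randn(8, 256, 8, 8)
    loss = spp_kd_loss(f_s, f_t)

In practice such a term would be added to the usual cross-entropy and logit-distillation losses; the pyramid levels are a design choice rather than a value prescribed by the paper.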

Notes

  1. Hints refer to the outputs of a teacher's hidden layers that supervise the student's training.
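
For illustration, hint-based supervision of this kind (in the style of FitNets) can be sketched as below; the class name and the 1x1-conv regressor are assumptions for the example, not the paper's exact formulation.

    # Hypothetical FitNets-style hint loss: a teacher hidden layer ("hint") supervises
    # a student hidden layer through a regressor that matches channel widths.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HintLoss(nn.Module):
        def __init__(self, student_channels, teacher_channels):
            super().__init__()
            # 1x1 convolution aligns the student's channel width with the teacher's.
            self.regressor = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

        def forward(self, f_student, f_teacher):
            # Teacher features are detached so gradients only update the student side.
            return F.mse_loss(self.regressor(f_student), f_teacher.detach())

    # Example with dummy intermediate features of matching spatial size.
    hint = HintLoss(student_channels=128, teacher_channels=256)
    loss = hint(torch.randn(4, 128, 16, 16), torch.randn(4, 256, 16, 16))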


Author information

Corresponding author

Correspondence to Hui Gao.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gao, L., Gao, H. (2023). Feature Decoupled Knowledge Distillation via Spatial Pyramid Pooling. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13846. Springer, Cham. https://doi.org/10.1007/978-3-031-26351-4_44

  • DOI: https://doi.org/10.1007/978-3-031-26351-4_44

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26350-7

  • Online ISBN: 978-3-031-26351-4

  • eBook Packages: Computer Science, Computer Science (R0)
