Lightweight Branching Self-distillation: Be Your Own Teacher

  • Conference paper
Robot Intelligence Technology and Applications 7 (RiTA 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 642)


Abstract

When deep neural networks (DNNs) are deployed in resource-constrained environments, lightweight model techniques are often used to reduce their computational cost. However, some of these techniques struggle to reduce processing costs while maintaining high accuracy. Lightweight branching models add early exits to a model for quicker inference, but these earlier exits can suffer reduced accuracy because the layers preceding them have learned less knowledge. We propose a novel form of knowledge distillation that assists the branched model's training by distilling knowledge from the main classifier exit to the branch exits. This technique, entitled branching self-distillation, combines the student-teacher training of knowledge distillation with the efficiency optimisation of branching to create a lightweight optimisation technique that requires minimal additional training to improve both accuracy and processing costs. We demonstrate the effectiveness of the technique, achieving an average 1.86% increase in accuracy and a 38.02% further reduction in average processing FLOPs per input across a selection of well-known lightweight model architectures. (Source code and further experiment details can be found in this work's repository at https://github.com/SanityLacking/Self-Distillation.)
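The abstract outlines the core mechanism: early-exit branches are trained both on the ground-truth labels and on the softened outputs of the model's own main classifier, which acts as the teacher. The PyTorch sketch below is a minimal illustration of that idea, not the authors' implementation (see the linked repository for the actual code): the toy backbone, the branch placement, the temperature T, the loss weight alpha, and the confidence-threshold exit rule are all our illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BranchedNet(nn.Module):
    """Toy CNN backbone with one early-exit branch plus the main exit."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)
        )
        self.stage2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1)
        )
        # Cheap classifier head attached after stage1 (the early exit).
        self.branch_exit = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes)
        )
        self.main_exit = nn.Sequential(nn.Flatten(), nn.Linear(64, num_classes))

    def forward(self, x):
        h1 = self.stage1(x)
        return self.branch_exit(h1), self.main_exit(self.stage2(h1))


def branching_self_distillation_loss(branch_logits, main_logits, labels,
                                     T: float = 3.0, alpha: float = 0.5):
    """Cross-entropy on both exits plus KD from the main exit to the branch."""
    ce = F.cross_entropy(main_logits, labels) + F.cross_entropy(branch_logits, labels)
    # Soften both distributions with temperature T; detach the teacher so the
    # distillation term only trains the branch, not the main classifier.
    kd = F.kl_div(
        F.log_softmax(branch_logits / T, dim=1),
        F.softmax(main_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce + alpha * kd


@torch.no_grad()
def predict_with_early_exit(model, x, threshold: float = 0.9):
    """Single-sample inference: stop at the branch when it is confident,
    skipping stage2 entirely and saving the corresponding FLOPs."""
    h1 = model.stage1(x)
    probs = F.softmax(model.branch_exit(h1), dim=1)
    conf, pred = probs.max(dim=1)
    if conf.item() >= threshold:
        return pred
    return model.main_exit(model.stage2(h1)).argmax(dim=1)


# One illustrative training step on random data.
model = BranchedNet()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
branch_logits, main_logits = model(x)
loss = branching_self_distillation_loss(branch_logits, main_logits, y)
opt.zero_grad()
loss.backward()
opt.step()
```

Detaching the teacher logits in the distillation term is a common design choice in self-distillation sketches like this one: it keeps the main exit trained purely by its own cross-entropy loss while the branch additionally learns from the main exit's softened predictions.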



Author information

Corresponding author

Correspondence to Cailen Robertson.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Robertson, C., Le, D., Nguyen, T.T., Nguyen, Q.V.H., Jo, J. (2023). Lightweight Branching Self-distillation: Be Your Own Teacher. In: Jo, J., et al. Robot Intelligence Technology and Applications 7. RiTA 2022. Lecture Notes in Networks and Systems, vol 642. Springer, Cham. https://doi.org/10.1007/978-3-031-26889-2_24
