Lightweight Branching Self-distillation: Be Your Own Teacher

  • Conference paper
Robot Intelligence Technology and Applications 7 (RiTA 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 642)


Abstract

When deep neural networks (DNNs) are deployed in resource-constrained environments, lightweight model techniques are often used to reduce their computational cost. However, some of these techniques struggle to reduce processing costs while maintaining high accuracy. Lightweight branching models add early exits to a model for quicker inference, but these earlier exits can suffer reduced accuracy because the layers preceding them have learned less knowledge. We propose a novel form of knowledge distillation that assists the branched model's training by distilling knowledge from the main classifier exit to the branch exits. This technique, entitled branching self-distillation, combines the student-teacher training of knowledge distillation with the efficiency optimisation of branching to create a lightweight optimisation technique that requires minimal additional training to improve both accuracy and processing costs. We demonstrate the effectiveness of the technique, achieving an average 1.86% increase in accuracy and a 38.02% further reduction in average processing FLOPs per input across a selection of well-known lightweight model architectures. (Source code and further experiment details can be found in this work's repository at https://github.com/SanityLacking/Self-Distillation.)
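The abstract outlines the core mechanism: early-exit branches are trained both on the ground-truth labels and on the softened outputs of the model's own main classifier, which acts as the teacher. The PyTorch sketch below is a minimal illustration of that idea, not the authors' implementation (see the linked repository for the actual code): the toy backbone, the branch placement, the temperature T, the loss weight alpha, and the confidence-threshold exit rule are all our illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BranchedNet(nn.Module):
    """Toy CNN backbone with one early-exit branch plus the main exit."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)
        )
        self.stage2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1)
        )
        # Cheap classifier head attached after stage1 (the early exit).
        self.branch_exit = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes)
        )
        self.main_exit = nn.Sequential(nn.Flatten(), nn.Linear(64, num_classes))

    def forward(self, x):
        h1 = self.stage1(x)
        return self.branch_exit(h1), self.main_exit(self.stage2(h1))


def branching_self_distillation_loss(branch_logits, main_logits, labels,
                                     T: float = 3.0, alpha: float = 0.5):
    """Cross-entropy on both exits plus KD from the main exit to the branch."""
    ce = F.cross_entropy(main_logits, labels) + F.cross_entropy(branch_logits, labels)
    # Soften both distributions with temperature T; detach the teacher so the
    # distillation term only trains the branch, not the main classifier.
    kd = F.kl_div(
        F.log_softmax(branch_logits / T, dim=1),
        F.softmax(main_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce + alpha * kd


@torch.no_grad()
def predict_with_early_exit(model, x, threshold: float = 0.9):
    """Single-sample inference: stop at the branch when it is confident,
    skipping stage2 entirely and saving the corresponding FLOPs."""
    h1 = model.stage1(x)
    probs = F.softmax(model.branch_exit(h1), dim=1)
    conf, pred = probs.max(dim=1)
    if conf.item() >= threshold:
        return pred
    return model.main_exit(model.stage2(h1)).argmax(dim=1)


# One illustrative training step on random data.
model = BranchedNet()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
branch_logits, main_logits = model(x)
loss = branching_self_distillation_loss(branch_logits, main_logits, y)
opt.zero_grad()
loss.backward()
opt.step()
```

Detaching the teacher logits in the distillation term is a common design choice in self-distillation sketches like this one: it keeps the main exit trained purely by its own cross-entropy loss while the branch additionally learns from the main exit's softened predictions.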



Author information

Corresponding author

Correspondence to Cailen Robertson.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Robertson, C., Le, D., Nguyen, T.T., Nguyen, Q.V.H., Jo, J. (2023). Lightweight Branching Self-distillation: Be Your Own Teacher. In: Jo, J., et al. Robot Intelligence Technology and Applications 7. RiTA 2022. Lecture Notes in Networks and Systems, vol 642. Springer, Cham. https://doi.org/10.1007/978-3-031-26889-2_24
