Abstract
When deploying deep neural networks (DNNs) in resource-constrained environments, lightweight model techniques are often used to reduce a model's computational cost. However, many of these techniques struggle to balance maintaining high accuracy against reducing processing costs. Lightweight branching models add early exits to a model for faster inference, but these earlier exits can suffer reduced accuracy because the preceding layers have learned less of the model's knowledge. We propose a novel form of knowledge distillation that assists the branched model's training by distilling knowledge from the main classifier exit to the branch exits. This technique, called branching self-distillation, combines the student-teacher training of knowledge distillation with the efficiency gains of branching, yielding a lightweight optimisation technique that requires minimal additional training to improve both accuracy and processing cost. We demonstrate the effectiveness of the technique on a selection of well-known lightweight model architectures, achieving an average 1.86% increase in accuracy and a further 38.02% reduction in average processing FLOPs per input. (Source code and further experiment details are available in this work's repository at https://github.com/SanityLacking/Self-Distillation.)
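To make the idea concrete, the following is a minimal sketch of how knowledge can be distilled from the main classifier exit (acting as teacher) to an early-exit branch (acting as student). This is an illustrative example only, assuming a PyTorch-style setup and a standard soft-label distillation loss; the function name, temperature, and weighting parameter alpha are assumptions and may differ from the loss formulation used in the paper.

import torch
import torch.nn.functional as F

def branch_self_distillation_loss(branch_logits, final_logits, labels,
                                  temperature=3.0, alpha=0.5):
    # Hard-label loss: the branch exit is still trained against ground truth.
    ce = F.cross_entropy(branch_logits, labels)
    # Soft-label loss: the main (final) exit acts as the teacher; its logits
    # are detached so gradients do not flow back into the teacher head.
    soft_teacher = F.softmax(final_logits.detach() / temperature, dim=1)
    log_soft_branch = F.log_softmax(branch_logits / temperature, dim=1)
    kd = F.kl_div(log_soft_branch, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Blend the two terms; alpha is a hypothetical weighting hyperparameter.
    return alpha * ce + (1.0 - alpha) * kd

During joint training, a loss of this form would typically be computed for each branch exit and summed with the main exit's standard cross-entropy loss; the exact weighting scheme used in the paper may differ.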
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Robertson, C., Le, D., Nguyen, T.T., Nguyen, Q.V.H., Jo, J. (2023). Lightweight Branching Self-distillation: Be Your Own Teacher. In: Jo, J., et al. Robot Intelligence Technology and Applications 7. RiTA 2022. Lecture Notes in Networks and Systems, vol 642. Springer, Cham. https://doi.org/10.1007/978-3-031-26889-2_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26888-5
Online ISBN: 978-3-031-26889-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)