Abstract
In model compression, knowledge distillation is the most widely used technique: a large teacher network transfers knowledge to a small student network to improve the student's performance. However, most knowledge distillation algorithms focus only on extracting informative knowledge for transfer and ignore the consistency between the teacher and student networks. In this paper, we propose a new knowledge distillation framework (SNKD) that measures the consistency between the teacher and student networks using siamese networks. Features from the teacher and student networks are fed into the siamese networks, and the discrepancy between them is computed with a contrastive learning loss. By minimizing this loss, the student network is driven to become consistent with the teacher network and to acquire an ability close to the teacher's. We verify the effectiveness of SNKD through experiments on popular datasets; all student networks trained with SNKD reach performance similar to or even better than their teacher networks.
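The abstract describes the core mechanism: teacher and student features pass through a shared siamese embedding, and a contrastive loss penalizes their discrepancy. Since the full method is not reproduced here, the following is only a minimal PyTorch sketch of that idea; the `SiameseHead` architecture, the margin-based contrastive formulation, and all names and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseHead(nn.Module):
    """Shared (weight-tied) embedding head applied to both teacher and
    student features. Hypothetical architecture; the paper's exact head
    is not specified here."""
    def __init__(self, in_dim: int, embed_dim: int = 128):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, embed_dim),
            nn.ReLU(inplace=True),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so distances are comparable across batches.
        return F.normalize(self.proj(x), dim=1)


def contrastive_consistency_loss(student_feat, teacher_feat, head, margin=1.0):
    """Contrastive loss pulling student embeddings toward the matching
    teacher embeddings (positives) and pushing them away from shuffled,
    mismatched teacher embeddings (negatives)."""
    zs = head(student_feat)            # student branch of the siamese net
    zt = head(teacher_feat.detach())   # teacher branch; teacher is frozen

    # Positive pairs: same input through teacher and student.
    pos = (zs - zt).pow(2).sum(dim=1)

    # Negative pairs: student embedding vs. a shuffled teacher embedding,
    # kept at least `margin` apart via a hinge term.
    perm = torch.randperm(zs.size(0))
    neg_dist = (zs - zt[perm]).pow(2).sum(dim=1).sqrt()
    neg = F.relu(margin - neg_dist).pow(2)

    return (pos + neg).mean()


if __name__ == "__main__":
    # Toy usage: in practice this loss would be added to the usual
    # task loss (e.g. cross-entropy) when training the student.
    head = SiameseHead(in_dim=512)
    s = torch.randn(32, 512, requires_grad=True)  # student features
    t = torch.randn(32, 512)                      # teacher features
    loss = contrastive_consistency_loss(s, t, head)
    loss.backward()
    print(loss.item())
```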
Acknowledgement
This work was supported by the Mianyang Science and Technology Program under Grant 2020YFZJ016, the SWUST Doctoral Foundation under Grants 19zx7102 and 21zx7114, and the Sichuan Science and Technology Program under Grant 2020YFS0307.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Tang, J., Yang, X., Cheng, X., Jiang, N., Yu, W., Zhang, P. (2021). Consistent Knowledge Distillation Based on Siamese Networks. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1516. Springer, Cham. https://doi.org/10.1007/978-3-030-92307-5_38
Print ISBN: 978-3-030-92306-8
Online ISBN: 978-3-030-92307-5