Data-Free Knowledge Distillation with Positive-Unlabeled Learning

Tang, Jialiang; Yang, Xiaoyan; Cheng, Xin; Jiang, Ning; Yu, Wenxin; Zhang, Peng

doi:10.1007/978-3-030-92270-2_27

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13109)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

In model compression, knowledge distillation is a popular algorithm that trains a lightweight network (student) by learning the knowledge of a pre-trained, complicated network (teacher). Because this knowledge is obtained by feeding training data to the teacher network, it is essential to have access to the data the teacher was trained on. However, that data is often unavailable due to privacy concerns or storage costs, so existing data-driven knowledge distillation methods cannot be applied in the real world. To solve this problem, we propose a data-free knowledge distillation method called DFPU, which introduces positive-unlabeled (PU) learning. To train a compact neural network without data, a generator is introduced to generate pseudo data under the supervision of the teacher network. The generated data is fed into both the teacher and student networks, and attention features are extracted for knowledge transfer. PU learning then encourages the student network to produce features more similar to those of the teacher network. Without any data, the efficient student network trained by DFPU has only half the parameters and computations of the teacher network and achieves an accuracy similar to the teacher's.
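
The abstract does not give the loss functions, so the following is only a minimal sketch of the two ingredients it names, written under explicit assumptions rather than taken from the paper. It assumes that attention features are computed as channel-summed squared activations (a common attention-transfer formulation), that a discriminator scores teacher features as "positive" and student features as "unlabeled", and that the PU objective is a non-negative PU risk estimate with a sigmoid surrogate loss. The function names and the class prior `prior` are hypothetical.

```python
import math


def attention_map(feature):
    """Collapse a C x H x W feature (nested lists) into a flat,
    L2-normalised spatial attention vector by summing squared
    activations over the channel axis. (Assumed formulation.)"""
    channels = len(feature)
    height = len(feature[0])
    width = len(feature[0][0])
    flat = [
        sum(feature[c][i][j] ** 2 for c in range(channels))
        for i in range(height)
        for j in range(width)
    ]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0  # avoid division by zero
    return [v / norm for v in flat]


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)), y in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk estimate from raw discriminator scores.

    pos_scores: scores for samples assumed positive (here: teacher features).
    unl_scores: scores for unlabeled samples (here: student features).
    prior:      assumed fraction of positives among the unlabeled samples.
    """
    def mean(values):
        return sum(values) / len(values)

    pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])
    # Clamp the estimated negative-class risk at zero so it cannot go negative.
    negative_part = max(0.0, unl_as_neg - prior * pos_as_neg)
    return prior * pos_as_pos + negative_part
```

In a training loop built on this sketch, a generator would produce pseudo images, both networks would be run on them, `attention_map` would be applied to intermediate features, and `nn_pu_risk` would be evaluated on discriminator scores for the two sets of attention vectors. Treating student features as unlabeled rather than strictly negative reflects the idea that some of them may already be as good as the teacher's.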

Acknowledgement

This work was supported by the Mianyang Science and Technology Program (2020YFZJ016), the SWUST Doctoral Foundation under Grants 19zx7102 and 21zx7114, and the Sichuan Science and Technology Program under Grant 2020YFS0307.

Author information

Authors and Affiliations

School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, China
Jialiang Tang, Xiaoyan Yang, Xin Cheng, Ning Jiang & Wenxin Yu
School of Science, Southwest University of Science and Technology, Mianyang, China
Peng Zhang

Corresponding author

Correspondence to Ning Jiang.

Editor information

Editors and Affiliations

Sampoerna University, Jakarta, Indonesia
Teddy Mantoro
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Sampoerna University, Jakarta, Indonesia
Media Anugerah Ayu
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Universitas Indonesia, Depok, Indonesia
Achmad Nizar Hidayanto

About this paper

Cite this paper

Tang, J., Yang, X., Cheng, X., Jiang, N., Yu, W., Zhang, P. (2021). Data-Free Knowledge Distillation with Positive-Unlabeled Learning. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13109. Springer, Cham. https://doi.org/10.1007/978-3-030-92270-2_27

DOI: https://doi.org/10.1007/978-3-030-92270-2_27
Published: 07 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92269-6
Online ISBN: 978-3-030-92270-2
eBook Packages: Computer Science, Computer Science (R0)
