Abstract
Knowledge distillation has been applied to generative models, such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). To distill the knowledge, the synthetic outputs of a teacher generator are used to train a student model. While dark knowledge, i.e., the probabilistic output, is well explored in distilling classifiers, little is known about whether an equivalent dark knowledge exists for generative models and whether it can be extracted. In this paper, we derive the first empirical risk bound for distilling generative models from a Bayesian perspective. Through our analysis, we show the existence of dark knowledge for generative models, i.e., the Bayes probability distribution of a synthetic output given an input, which achieves a lower empirical risk bound than merely using the synthetic outputs of the generators. Furthermore, we propose a Dark Knowledge based Distillation, DKtill, which trains the student generator on the (approximate) dark knowledge. Our extensive evaluation on distilling VAEs, conditional GANs, and translation GANs on the Facades and CelebA datasets shows that the FID of student generators trained by DKtill with dark knowledge is lower than that of student generators trained only on synthetic outputs by up to 42.66% and 78.99%, respectively.
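To make the distinction between output-only distillation and dark-knowledge-based distillation concrete, the following is a minimal sketch, not the paper's actual DKtill implementation. It assumes a PyTorch setup, a toy `SmallGenerator`, and that the teacher's dark knowledge can be approximated by a per-pixel Gaussian output distribution (mean and log-variance); all of these names and the constant placeholder uncertainty are illustrative assumptions.

```python
# Sketch of generator distillation with an approximate dark-knowledge loss.
# Assumption: the teacher's output distribution is modeled as a per-pixel
# Gaussian; the real method in the paper may differ.
import torch
import torch.nn as nn

class SmallGenerator(nn.Module):
    """Toy generator mapping a latent vector to a flat single-channel image."""
    def __init__(self, latent_dim=64, out_dim=32 * 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

def output_only_loss(student_out, teacher_out):
    # Baseline: match only the teacher's synthetic output (a point estimate).
    return ((student_out - teacher_out) ** 2).mean()

def dark_knowledge_loss(student_out, teacher_mean, teacher_logvar):
    # Approximate dark-knowledge objective: Gaussian negative log-likelihood
    # of the student output under the teacher's per-pixel output distribution.
    # Pixels the teacher is confident about (low variance) are weighted more.
    var = teacher_logvar.exp()
    return (((student_out - teacher_mean) ** 2) / var + teacher_logvar).mean()

latent_dim = 64
teacher = SmallGenerator(latent_dim)   # stands in for a pretrained teacher
student = SmallGenerator(latent_dim)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):
    z = torch.randn(16, latent_dim)
    with torch.no_grad():
        teacher_mean = teacher(z)
        # Placeholder uncertainty; a real teacher would provide an approximate
        # output distribution (the dark knowledge), not a constant.
        teacher_logvar = torch.full_like(teacher_mean, -2.0)
    loss = dark_knowledge_loss(student(z), teacher_mean, teacher_logvar)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this sketch the baseline `output_only_loss` corresponds to training a student purely on the teacher's synthetic outputs, while `dark_knowledge_loss` weights the matching term by the teacher's (approximate) output distribution, which is the spirit of distilling with dark knowledge.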
Notes
1. We use the terms teacher and target model/generator interchangeably.
2. The analysis in this paper extends straightforwardly to three-channel images.
Acknowledgements
This work has been supported by the Spoke “FutureHPC & BigData” of the ICSC - Centro Nazionale di Ricerca in “High Performance Computing, Big Data and Quantum Computing”, funded by EU - NextGenerationEU and the EPI project funded by EuroHPC JU under G.A. 101036168.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hong, C., Birke, R., Chen, P.Y., Chen, L.Y. (2024). On Dark Knowledge for Distilling Generators. In: Yang, D.N., Xie, X., Tseng, V.S., Pei, J., Huang, J.W., Lin, J.C.W. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14646. Springer, Singapore. https://doi.org/10.1007/978-981-97-2253-2_19
DOI: https://doi.org/10.1007/978-981-97-2253-2_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2252-5
Online ISBN: 978-981-97-2253-2