
On Dark Knowledge for Distilling Generators

  • Conference paper
  • First Online:
Advances in Knowledge Discovery and Data Mining (PAKDD 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14646)


Abstract

Knowledge distillation has been applied to generative models such as the Variational Autoencoder (VAE) and Generative Adversarial Networks (GANs). To distill the knowledge, the synthetic outputs of a teacher generator are used to train a student model. While the dark knowledge, i.e., the probabilistic output, is well explored in distilling classifiers, little is known about whether an equivalent dark knowledge exists for generative models and whether it can be extracted. In this paper, we derive a first-of-its-kind empirical risk bound for distilling generative models from a Bayesian perspective. Through our analysis, we show that dark knowledge does exist for generative models, namely the Bayes probability distribution of a synthetic output given an input, and that it achieves a lower empirical risk bound than merely using the synthetic output of the generators. Furthermore, we propose a Dark Knowledge based Distillation, DKtill, which trains the student generator on the (approximate) dark knowledge. Our extensive evaluation on distilling VAE, conditional GANs, and translation GANs on the Facades and CelebA datasets shows that the FID of student generators trained by DKtill with dark knowledge is lower than that of student generators trained only on the synthetic outputs by up to 42.66% and 78.99%, respectively.
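To make the distillation setup concrete, below is a minimal sketch of the two training regimes contrasted in the abstract: a student generator fitted only to the teacher's synthetic outputs, versus one fitted to an approximate dark knowledge, here modelled as a per-pixel Gaussian over the teacher's output. The helper `approx_dark_knowledge`, the Gaussian surrogate, and all names are illustrative assumptions for this sketch; they are not taken from the DKtill method itself.

```python
# Hypothetical sketch of distilling a generator with vs. without
# (approximate) dark knowledge. Names (`teacher`, `student`,
# `approx_dark_knowledge`) and the Gaussian surrogate are assumptions
# made for illustration, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def distill_step_outputs_only(student, teacher, x, optimizer):
    """Baseline: the student regresses the teacher's synthetic output."""
    with torch.no_grad():
        y_teacher = teacher(x)              # single synthetic output of the teacher
    y_student = student(x)
    loss = F.l1_loss(y_student, y_teacher)  # pixel-wise reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def distill_step_dark_knowledge(student, teacher, x, optimizer, approx_dark_knowledge):
    """Dark-knowledge variant (sketch): fit the student to an approximate
    per-pixel output distribution p(y | x) instead of a single sample."""
    with torch.no_grad():
        # Hypothetical helper returning mean and variance of the teacher's
        # output distribution for input x (e.g., estimated from several
        # stochastic forward passes of the teacher).
        mu, var = approx_dark_knowledge(teacher, x)
    y_student = student(x)
    # Gaussian negative log-likelihood of the student output under the
    # approximate distribution N(mu, var) of the teacher's output.
    loss = F.gaussian_nll_loss(y_student, mu, var)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under this sketch, the distributional loss down-weights pixels where the approximate output distribution has high variance, which is one intuition for why a distributional target can yield a tighter empirical risk than a single synthetic sample.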


Notes

  1. We use the terms teacher and target model/generator interchangeably.

  2. The analysis of this paper can be straightforwardly extended to three-channel images.


Acknowledgements

This work has been supported by the Spoke “FutureHPC & BigData” of the ICSC - Centro Nazionale di Ricerca in “High Performance Computing, Big Data and Quantum Computing”, funded by EU - NextGenerationEU and the EPI project funded by EuroHPC JU under G.A. 101036168.

Author information

Corresponding author

Correspondence to Lydia Y. Chen.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Hong, C., Birke, R., Chen, P.-Y., Chen, L.Y. (2024). On Dark Knowledge for Distilling Generators. In: Yang, D.-N., Xie, X., Tseng, V.S., Pei, J., Huang, J.-W., Lin, J.C.-W. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14646. Springer, Singapore. https://doi.org/10.1007/978-981-97-2253-2_19


  • DOI: https://doi.org/10.1007/978-981-97-2253-2_19

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2252-5

  • Online ISBN: 978-981-97-2253-2

  • eBook Packages: Computer Science, Computer Science (R0)
