
Privacy Distillation: Reducing Re-identification Risk of Diffusion Models

  • Conference paper
  • In: Deep Generative Models (MICCAI 2023)

Abstract

Knowledge distillation in neural networks refers to compressing a large model or dataset into a smaller version of itself. We introduce Privacy Distillation, a framework that allows a generative model to teach another model without exposing it to identifiable data. Here, we are interested in the privacy issue faced by a data provider who wishes to share their data via a generative model. A question that immediately arises is “How can a data provider ensure that the generative model is not leaking patient identity?”. Our solution consists of (i) training a first diffusion model on real data; (ii) generating a synthetic dataset with this model and filtering it to exclude images that carry a re-identification risk; (iii) training a second diffusion model on the filtered synthetic data only. We show that datasets sampled from models trained with Privacy Distillation can effectively reduce re-identification risk whilst maintaining downstream performance.
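The three-step recipe above hinges on step (ii), the filtering of synthetic samples by re-identification risk. In the paper this filter is a learned patient re-identification model; purely as an illustrative sketch, the snippet below substitutes cosine similarity between embedding vectors (the inputs `synthetic_feats`, `real_feats`, and the threshold `tau` are assumptions, not the authors' implementation) and keeps only synthetic samples whose worst-case similarity to any real image stays below the threshold:

```python
import numpy as np

def filter_synthetic(synthetic_feats, real_feats, tau):
    """Return indices of synthetic samples deemed safe to release.

    A sample is kept only if its maximum cosine similarity to any
    real image embedding falls below the risk threshold tau.
    """
    # L2-normalise both embedding sets so dot products are cosine similarities
    s = synthetic_feats / np.linalg.norm(synthetic_feats, axis=1, keepdims=True)
    r = real_feats / np.linalg.norm(real_feats, axis=1, keepdims=True)
    sim = s @ r.T                   # (n_synthetic, n_real) similarity matrix
    risk = sim.max(axis=1)          # worst-case re-identification score per sample
    return np.where(risk < tau)[0]  # indices of "safe" synthetic samples
```

The second diffusion model would then be trained only on the samples these indices select, so it never sees (even indirectly, through memorised synthetic copies) images that closely match a real patient.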


Notes

  1. As seen in Stable Diffusion’s successful public release, followed by over 6 million downloads of its weights by the community (by March 2023): https://stability.ai/blog/stable-diffusion-public-release.

  2. https://huggingface.co/runwayml/stable-diffusion-v1-5.

  3. We used their code, available at https://github.com/Optimization-AI/LibAUC.


Author information

Corresponding author: Virginia Fernandez.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2604 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Fernandez, V., Sanchez, P., Pinaya, W.H.L., Jacenków, G., Tsaftaris, S.A., Cardoso, M.J. (2024). Privacy Distillation: Reducing Re-identification Risk of Diffusion Models. In: Mukhopadhyay, A., Oksuz, I., Engelhardt, S., Zhu, D., Yuan, Y. (eds) Deep Generative Models. MICCAI 2023. Lecture Notes in Computer Science, vol 14533. Springer, Cham. https://doi.org/10.1007/978-3-031-53767-7_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-53767-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53766-0

  • Online ISBN: 978-3-031-53767-7

