Privacy Distillation: Reducing Re-identification Risk of Diffusion Models

Fernandez, Virginia; Sanchez, Pedro; Pinaya, Walter Hugo Lopez; Jacenków, Grzegorz; Tsaftaris, Sotirios A.; Cardoso, M. Jorge

doi:10.1007/978-3-031-53767-7_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14533)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Knowledge distillation in neural networks refers to compressing a large model or dataset into a smaller version of itself. We introduce Privacy Distillation, a framework that allows a generative model to teach another model without exposing it to identifiable data. Here, we are interested in the privacy issue faced by a data provider who wishes to share their data via a generative model. A question that immediately arises is: “How can a data provider ensure that the generative model is not leaking patient identity?” Our solution consists of (i) training a first diffusion model on real data; (ii) generating a synthetic dataset using this model and filtering it to exclude images with a re-identifiability risk; and (iii) training a second diffusion model on the filtered synthetic data only. We show that datasets sampled from models trained with Privacy Distillation can effectively reduce re-identification risk whilst maintaining downstream performance.
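The three steps listed in the abstract can be sketched as a small pipeline. This is only an illustrative outline under stated assumptions, not the authors' implementation: the abstract does not say how the models are trained, how samples are drawn, or how re-identification risk is scored, so `train`, `sample`, `reid_score`, and `threshold` are hypothetical placeholders supplied by the caller, and the toy stand-ins exist only to make the sketch runnable.

```python
from typing import Any, Callable, List, Sequence, Tuple


def filter_synthetic(
    synthetic: Sequence[Any],
    reid_score: Callable[[Any], float],
    threshold: float,
) -> List[Any]:
    """Keep only samples whose (hypothetical) re-identification score is below threshold."""
    return [item for item in synthetic if reid_score(item) < threshold]


def privacy_distillation(
    real_data: Sequence[Any],
    train: Callable[[Sequence[Any]], Any],
    sample: Callable[[Any, int], List[Any]],
    reid_score: Callable[[Any], float],
    n_samples: int,
    threshold: float,
) -> Tuple[Any, List[Any]]:
    """Run the three steps from the abstract with caller-supplied components."""
    first_model = train(real_data)                 # (i) train on real data
    synthetic = sample(first_model, n_samples)     # (ii) generate a synthetic dataset
    safe = filter_synthetic(synthetic, reid_score, threshold)  # (ii) filter risky samples
    second_model = train(safe)                     # (iii) train on filtered synthetic data only
    return second_model, safe


# Toy stand-ins (assumptions for illustration only): a "model" simply stores its
# training data, sampling cycles through it, and each sample is its own risk score.
def toy_train(data: Sequence[float]) -> List[float]:
    return list(data)


def toy_sample(model: List[float], n: int) -> List[float]:
    return [model[i % len(model)] for i in range(n)]


def toy_score(item: float) -> float:
    return item


second_model, safe = privacy_distillation(
    [0.1, 0.9, 0.4], toy_train, toy_sample, toy_score, n_samples=6, threshold=0.5
)
```

In this sketch the second model only ever sees `safe`, never `real_data`, which mirrors the claim that the second diffusion model is trained on the filtered synthetic data only.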

Notes

1. As seen in Stable Diffusion’s successful public release followed by over 6 million downloads (by March 2023) of its weights by the community https://stability.ai/blog/stable-diffusion-public-release.
2. https://huggingface.co/runwayml/stable-diffusion-v1-5.
3. We used their code, available at https://github.com/Optimization-AI/LibAUC.

Author information

Authors and Affiliations

King’s College London, London, WC2R 2LS, UK
Virginia Fernandez, Walter Hugo Lopez Pinaya & M. Jorge Cardoso
The University of Edinburgh, Edinburgh, EH9 3FG, UK
Pedro Sanchez, Grzegorz Jacenków & Sotirios A. Tsaftaris

Corresponding author

Correspondence to Virginia Fernandez.

Editor information

Editors and Affiliations

TU Darmstadt, Darmstadt, Germany
Anirban Mukhopadhyay
Istanbul Technical University, Istanbul, Türkiye
Ilkay Oksuz
University Hospital Heidelberg, Heidelberg, Germany
Sandy Engelhardt
The University of Texas at Arlington, Arlington, TX, USA
Dajiang Zhu
University of Hong Kong, Hong Kong, Hong Kong
Yixuan Yuan

Electronic supplementary material

Supplementary material 1 (PDF, 2604 KB)

About this paper

Cite this paper

Fernandez, V., Sanchez, P., Pinaya, W.H.L., Jacenków, G., Tsaftaris, S.A., Cardoso, M.J. (2024). Privacy Distillation: Reducing Re-identification Risk of Diffusion Models. In: Mukhopadhyay, A., Oksuz, I., Engelhardt, S., Zhu, D., Yuan, Y. (eds) Deep Generative Models. MICCAI 2023. Lecture Notes in Computer Science, vol 14533. Springer, Cham. https://doi.org/10.1007/978-3-031-53767-7_1

DOI: https://doi.org/10.1007/978-3-031-53767-7_1
Published: 20 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53766-0
Online ISBN: 978-3-031-53767-7
eBook Packages: Computer Science; Computer Science (R0)
