DP-CTGAN: Differentially Private Medical Data Generation Using CTGANs

Fang, Mei Ling; Dhami, Devendra Singh; Kersting, Kristian

doi:10.1007/978-3-031-09342-5_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13263)

Included in the following conference series: International Conference on Artificial Intelligence in Medicine

Abstract

Generative Adversarial Networks (GANs) are an important tool for generating synthetic medical data, which helps overcome limited and difficult access to real data sets and accelerates innovation in the healthcare domain. Despite their promise, GANs are vulnerable to privacy attacks that can reveal information about individuals in the training data. Preserving privacy while maintaining the quality of the generated data remains a challenging problem. We propose DP-CTGAN, which incorporates differential privacy into a conditional tabular generative model. Our experiments demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget on several benchmark data sets. In addition, we combine our method with federated learning, enabling a more secure way of generating synthetic data without the need to upload locally collected data to a central repository.

M. L. Fang and D. S. Dhami—Equal contribution.

Notes

1. We consider only tabular medical data set generation.

Acknowledgements

The authors thank the anonymous reviewers for their valuable feedback. This work was supported by the ICT-48 Network of AI Research Excellence Center “TAILOR” (EU Horizon 2020, GA No 952215) and the Nexplore Collaboration Lab “AI in Construction” (AICO). It benefited from “safeFBDC - Financial Big Data Cluster” (FKZ:01MK21002K), funded by the BMWK as part of the GAIA-X initiative, and the HMWK cluster projects “The Third Wave of AI” and “The Adaptive Mind”.

Author information

Authors and Affiliations

Merck KGaA, Darmstadt, Germany
Mei Ling Fang
Technical University of Darmstadt, Darmstadt, Germany
Mei Ling Fang, Devendra Singh Dhami & Kristian Kersting
Hessian Center for AI (hessian.AI), Darmstadt, Germany
Devendra Singh Dhami & Kristian Kersting

Corresponding author

Correspondence to Devendra Singh Dhami.

Editor information

Editors and Affiliations

University of Minnesota, Minneapolis, MN, USA
Martin Michalowski
Dalhousie University, Halifax, NS, Canada
Syed Sibte Raza Abidi
Dalhousie University, Halifax, NS, Canada
Samina Abidi

About this paper

Cite this paper

Fang, M.L., Dhami, D.S., Kersting, K. (2022). DP-CTGAN: Differentially Private Medical Data Generation Using CTGANs. In: Michalowski, M., Abidi, S.S.R., Abidi, S. (eds) Artificial Intelligence in Medicine. AIME 2022. Lecture Notes in Computer Science, vol 13263. Springer, Cham. https://doi.org/10.1007/978-3-031-09342-5_17

DOI: https://doi.org/10.1007/978-3-031-09342-5_17
Published: 09 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09341-8
Online ISBN: 978-3-031-09342-5
eBook Packages: Computer Science, Computer Science (R0)
