Abstract
Generative Adversarial Networks (GANs) are an important tool for generating synthetic medical data, which helps offset limited and difficult access to real data sets and accelerates innovation in the healthcare domain. Despite their promise, GANs are vulnerable to privacy attacks that can reveal information about individuals in the training data. Preserving privacy while maintaining the quality of the generated data remains a challenging problem. We propose DP-CTGAN, which incorporates differential privacy into a conditional tabular generative model. Our experiments demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget on several benchmark data sets. In addition, we combine our method with federated learning, enabling a more secure way of generating synthetic data without uploading locally collected data to a central repository.
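The abstract does not spell out how differential privacy or federated learning enters training, so the following is only a minimal sketch under stated assumptions rather than the DP-CTGAN procedure itself. It assumes the widely used clip-and-noise treatment of per-example gradients (as in DP-SGD) and a size-weighted average of client parameters (as in federated averaging). All function names and parameters (`clip_by_l2_norm`, `private_mean_gradient`, `federated_average`, `max_norm`, `noise_multiplier`) are hypothetical and chosen for illustration.

```python
import math
import random
from typing import List, Sequence


def clip_by_l2_norm(grad: Sequence[float], max_norm: float) -> List[float]:
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(
    per_example_grads: Sequence[Sequence[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    Assumption: this mirrors the generic DP-SGD mechanism; the paper's
    exact mechanism is not described in the abstract.
    """
    clipped = [clip_by_l2_norm(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise standard deviation scales with the clipping bound.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


def federated_average(
    client_params: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local data set size.

    Assumption: a FedAvg-style aggregation; only parameters, never raw
    records, are sent to the server.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In a sketch like this, the privacy budget would be controlled through `noise_multiplier`, the clipping bound, and the number of training steps. A privacy accountant, omitted here, would be needed to turn those settings into an actual privacy guarantee.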
M. L. Fang and D. S. Dhami—Equal contribution.
Notes
1. We consider only tabular medical data set generation.
Acknowledgements
The authors thank the anonymous reviewers for their valuable feedback. This work was supported by the ICT-48 Network of AI Research Excellence Center “TAILOR” (EU Horizon 2020, GA No 952215) and the Nexplore Collaboration Lab “AI in Construction” (AICO). It benefited from “safeFBDC - Financial Big Data Cluster” (FKZ:01MK21002K), funded by the BMWK as part of the GAIA-X initiative, and the HMWK cluster projects “The Third Wave of AI” and “The Adaptive Mind”.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Fang, M.L., Dhami, D.S., Kersting, K. (2022). DP-CTGAN: Differentially Private Medical Data Generation Using CTGANs. In: Michalowski, M., Abidi, S.S.R., Abidi, S. (eds) Artificial Intelligence in Medicine. AIME 2022. Lecture Notes in Computer Science, vol 13263. Springer, Cham. https://doi.org/10.1007/978-3-031-09342-5_17
DOI: https://doi.org/10.1007/978-3-031-09342-5_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09341-8
Online ISBN: 978-3-031-09342-5
eBook Packages: Computer Science (R0)