Abstract
Preserving data confidentiality is crucial when releasing microdata for public use. Many of the proposed approaches are grounded in traditional probability theory and statistics, and they focus mainly on masking the original data. In practice, such masking techniques obscure only part of the data and risk exposing the remaining sensitive records. In this paper, we instead approach the problem with a deep learning-based generative model that produces simulated data to stand in for the original data. Generating simulated data that preserves the statistical characteristics of the raw data is both the key idea and the main challenge of this study. In particular, we examine how statistically similar the generated data are to the raw data, under the requirement that the two remain practically indistinguishable. We evaluate our results with two statistical metrics, Absolute Relative Residual Values and Hellinger Distance, and conduct extensive experiments on two real-world datasets, the Census Dataset and the Environmental Dataset, to validate our approach.
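The two evaluation metrics named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration using the standard textbook definitions; the function names, and the choice of comparing paired summary statistics (e.g. per-column means), are assumptions for the example, not the paper's exact implementation:

```python
import numpy as np

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    Assumes p and q are histograms over the same bins; both are
    normalized to sum to 1 before comparison. Result lies in [0, 1].
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def absolute_relative_residual(raw, generated):
    """Mean absolute relative residual between paired statistics,
    e.g. per-column means of the raw table vs. the simulated table.
    Assumes the raw statistics are nonzero."""
    raw = np.asarray(raw, dtype=float)
    generated = np.asarray(generated, dtype=float)
    return np.mean(np.abs((raw - generated) / raw))
```

A Hellinger distance near 0 (and a small residual) would indicate that the simulated data reproduces the marginal distributions of the raw data, which is the similarity the paper sets out to measure.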











References
Rubin DB (1993) Discussion: Statistical disclosure limitation. J Off Stat 9
Min CL, Mitra R, Lazaridis E, An CL, Yong KG, Yap WS (2016) Data privacy preserving scheme using generalized linear models. Computers & Security
Gurjar SPS, Pasupuleti SK (2017) A privacy-preserving multi-keyword ranked search scheme over encrypted cloud data using MIR-tree. In: International conference on computing, analytics and security trends, pp 533–538
Andruszkiewicz P (2007) Optimization for mask scheme in privacy preserving data mining for association rules. In: International conference on rough sets and intelligent systems paradigms, pp 465–474
Willenborg L, De Waal T (2001) Elements of statistical disclosure control. Springer
Fienberg SE, McIntyre J (2004) Data swapping: variations on a theme by Dalenius and Reiss. In: International workshop on privacy in statistical databases, pp 14–29
Fuller WA (1993) Masking procedures for microdata disclosure limitation. J Off Stat 9(2)
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Bengio Y (2009) Learning deep architectures for AI. Foundations and Trends in Machine Learning 2(1):1–127
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: International conference on neural information processing systems, pp 2672–2680
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein GAN. arXiv preprint arXiv:1701.07875
Barrow NJ, Campbell NA (1972) Methods of measuring residual value of fertilizers. Australian Journal of Experimental Agriculture 12(58):502–510
Simpson DG (1987) Minimum hellinger distance estimation for the analysis of count data. J Am Stat Assoc 82(399):802–807
Rubin DB (2009) Statistical Disclosure Limitation. Springer US
Li N, Li T, Venkatasubramanian S (2007) T-closeness: privacy beyond k-anonymity and l-diversity. In: Data engineering, 2007. ICDE 2007. IEEE 23rd international conference on. IEEE, pp 106–115
Dwork C (2008) Differential privacy: a survey of results. In: International conference on theory and applications of models of computation. Springer, pp 1–19
Van Tilborg HCA, Jajodia S (2014) Encyclopedia of cryptography and security. Springer Science & Business Media
Yang W, Li T, Jia H (2004) Simulation and experiment of machine vision guidance of agriculture vehicles. Transactions of the Chinese Society of Agricultural Engineering
Dormand JR, Prince PJ (1978) New runge-kutta algorithms for numerical simulation in dynamical astronomy. Celest Mech 18(3):223–232
Stukowski A (2010) Visualization and analysis of atomistic simulation data with OVITO - the Open Visualization Tool. Modelling and Simulation in Materials Science and Engineering 18(1):015012
Devia N, Weber R (2013) Generating crime data using agent-based simulation. Comput Environ Urban Syst 42(7):26–41
Phillips A, Cardelli L (2007) Efficient, correct simulation of biological processes in the stochastic pi-calculus. In: International conference on computational methods in systems biology, pp 184–199
Roe C, Meliopoulos AP, Meisel J, Overbye T (2008) Power system level impacts of plug-in hybrid electric vehicles using simulation data. In: IEEE Energy 2030 conference, pp 1–6
Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434
Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784
Mao X, Li Q, Xie H, Lau RYK, Wang Z, Smolley SP (2016) Least squares generative adversarial networks. arXiv preprint arXiv:1611.04076
Dai F, Zhang D, Li J (2013) Encoder/decoder for privacy protection video with privacy region detection and scrambling. In: International conference on multimedia modeling. Springer, pp 525–527
Psychoula I, Merdivan E, Singh D, Chen L, Chen F, Hanke S, Kropf J, Holzinger A, Geist M (2018) A deep learning approach for privacy preservation in assisted living. arXiv preprint arXiv:1802.09359
Makhzani A, Shlens J, Jaitly N, Goodfellow I (2016) Adversarial autoencoders. In: ICLR
Daskalakis C, Goldberg PW, Papadimitriou CH (2009) The complexity of computing a Nash equilibrium. ACM
Xu B, Wang N, Chen T, Li M (2015) Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15(1):1929–1958
Kingma DP, Welling M (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114
Chen X, Kingma DP, Salimans T, Duan Y, Dhariwal P, Schulman J, Sutskever I, Abbeel P (2016) Variational lossy autoencoder. arXiv preprint arXiv:1611.02731
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning, pp 448–456
Hardt M, Recht B, Singer Y (2015) Train faster, generalize better: stability of stochastic gradient descent. arXiv preprint arXiv:1509.01240
Wilson AC, Roelofs R, Stern M, Srebro N, Recht B (2017) The marginal value of adaptive gradient methods in machine learning. In: Advances in neural information processing systems
Pizer SM, Amburn EP, Austin JD, Cromartie R, Geselowitz A, Greer T, ter Haar Romeny B, Zimmerman JB, Zuiderveld K (1987) Adaptive histogram equalization and its variations. Computer Vision, Graphics, and Image Processing 39(3):355–368
Nowozin S, Cseke B, Tomioka R (2016) f-GAN: training generative neural samplers using variational divergence minimization. In: Advances in neural information processing systems, pp 271–279
Bordes A, Bottou L, Gallinari P (2009) SGD-QN: careful quasi-Newton stochastic gradient descent. J Mach Learn Res 10:1737–1754
Tieleman T, Hinton G (2012) RMSProp: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning
Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 12:2121–2159
Botev A, Lever G, Barber D (2016) Nesterov's accelerated gradient and momentum as approximations to regularised update descent
Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville AC (2017) Improved training of wasserstein gans. In: Advances in neural information processing systems, pp 5767–5777
Acknowledgments
The authors would like to acknowledge the support provided by the National Key R&D Program of China (No.2018YFC1604000).
Cite this article
Li, W., Meng, P., Hong, Y. et al. Using deep learning to preserve data confidentiality. Appl Intell 50, 341–353 (2020). https://doi.org/10.1007/s10489-019-01515-3