Abstract
Current anonymization techniques for statistical databases exhibit significant limitations related to the utility-privacy trade-off, the introduction of artefacts, and the vulnerability to correlation. We propose an anonymization technique based on the whitening/recolouring procedure, which treats the database as an instance of a random population and applies statistical signal processing methods to it. In response to a query, the technique estimates the covariance matrix of the true data and builds a linear transformation of the data, producing an output that has the same statistical characteristics as the true data up to the second order but is not directly linked to single records. The technique is applied to a real database containing the location data of taxi trips in New York. We show that, compared with noise addition, the technique introduces fewer artefacts while preserving the first- and second-order statistical features of the true data (hence maintaining the utility of the query output).
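The whitening/recolouring idea can be illustrated with a minimal sketch. The abstract does not specify the exact transformation used in the paper, so the version below is an assumption: it uses Cholesky factors for both steps, starts from Gaussian noise, and the function name `recolour` and the toy coordinates are hypothetical (they are not the taxi dataset). It shows only that the output reproduces the sample mean and covariance of the input, and makes no claim about the privacy guarantees of the method.

```python
import numpy as np


def recolour(data, rng):
    """Return synthetic records with the same sample mean and covariance as `data`.

    Illustrative assumption: Cholesky-based whitening and recolouring.
    Requires more records than columns and a positive-definite covariance.
    """
    n, d = data.shape
    mean = data.mean(axis=0)
    cov = np.atleast_2d(np.cov(data, rowvar=False))  # d x d sample covariance

    # Start from noise that is unrelated to any single record, then centre it.
    noise = rng.standard_normal((n, d))
    noise -= noise.mean(axis=0)

    # Whitening: force the sample covariance of the noise to the identity.
    l_noise = np.linalg.cholesky(np.atleast_2d(np.cov(noise, rowvar=False)))
    white = np.linalg.solve(l_noise, noise.T).T

    # Recolouring: impose the covariance and mean estimated from the true data.
    l_data = np.linalg.cholesky(cov)
    return white @ l_data.T + mean


# Toy example with made-up, correlated latitude/longitude values.
rng = np.random.default_rng(0)
true_data = rng.multivariate_normal(
    [40.75, -73.98], [[0.01, 0.004], [0.004, 0.02]], size=500
)
synthetic = recolour(true_data, rng)

print(np.allclose(synthetic.mean(axis=0), true_data.mean(axis=0)))
print(np.allclose(np.cov(synthetic, rowvar=False), np.cov(true_data, rowvar=False)))
```

Because the noise is whitened empirically before recolouring, the match of first- and second-order statistics holds for the sample itself, not just in expectation. Other square roots of the covariance matrix (for example a symmetric one) would serve equally well in this sketch.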
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
D’Acquisto, G., Mazzoccoli, A., Ciminelli, F., Naldi, M. (2020). Privacy Through Data Recolouring. In: Antunes, L., Naldi, M., Italiano, G., Rannenberg, K., Drogkaris, P. (eds) Privacy Technologies and Policy. APF 2020. Lecture Notes in Computer Science, vol 12121. Springer, Cham. https://doi.org/10.1007/978-3-030-55196-4_4
DOI: https://doi.org/10.1007/978-3-030-55196-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55195-7
Online ISBN: 978-3-030-55196-4
eBook Packages: Computer Science, Computer Science (R0)