Abstract
Generating synthetic data for the dissemination of individual information in a privacy-preserving way is an approach that is often presented as superior to other statistical disclosure control techniques. The reason for such claim is straightforward at first glance: since all records disseminated are synthetic and not actual observed values, no individual can reasonably claim to face a privacy threat. Thus, and if the synthesizer used is good enough, synthetic data will potentially always offer a high level of information with low disclosure risk attached. Building on recent advances in the literature regarding the conceptualization of an intruder, this paper aims at challenging this claim by reassessing the privacy guarantees of synthetic data. Using the concept of a maximum-knowledge intruder, we demonstrate that synthetic data can in fact be always expressed as a re-arrangement of the original data and that, as a result, they may lead to configurations where disclosure risk may be higher than for non-synthetic disclosure control approaches. We illustrate the application of these results by an empirical example.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Using these notations, \( o_{ij} \) is the rank of attribute j in original record i and \( s_{lj}^{m} \) is the rank of attribute j in synthetic record l of the mth synthetic data set.
- 2.
The two other attributes are not shown here due to space constraints but their reverse-mapped versions can be displayed in exactly the same way.
References
Domingo-Ferrer, J., Muralidhar, K.: New directions in anonymization: permutation paradigm, verifiability by subjects and intruders, transparency to users. Inf. Sci. 337, 11–24 (2016)
Domingo-Ferrer, J., Ricci, S., Soria-Comas, J.: Disclosure risk assessment via record linkage by a maximum-knowledge attacker. In: 13th Annual International Conference on Privacy, Security and Trust-PST 2015, Izmir, Turkey, September 2015
Domingo-Ferrer, J., Sánchez, D., Rufian-Torrell, G.: Anonymization of nominal data based on semantic marginality. Inf. Sci. 242, 35–48 (2013)
Drechsler, J.: Synthetic Datasets for Statistical Disclosure Control. Springer, New York (2011). https://doi.org/10.1007/978-1-4614-0326-5
Drechsler, J., Bender, S., Rässler, S.: Comparing fully and partially synthetic datasets for statistical disclosure control in the German IAB establishment panel. Trans. Data Priv. 1, 105–130 (2008)
Hu, J., Reiter, J.P., Wang, Q.: Disclosure risk evaluation for fully synthetic categorical data. In: Domingo-Ferrer, J. (ed.) PSD 2014. LNCS, vol. 8744, pp. 185–199. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11257-2_15
Hundepool, A., et al.: Statistical Disclosure Control. Wiley, Hoboken (2012)
Muralidhar, K., Domingo-Ferrer, J.: Rank-based record linkage for re-identification risk assessment. In: Domingo-Ferrer, J., Pejić-Bach, M. (eds.) PSD 2016. LNCS, vol. 9867, pp. 225–236. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45381-1_17
Muralidhar, K., Domingo-Ferrer, J.: Microdata masking as permutation. In: UNECE/EUROSTAT Work Session on Statistical Data Confidentiality, Helsinki, Finland, October 2015
Muralidhar, K., Sarathy, R.: A comparison of multiple imputation and data perturbation for masking numerical variables. J. Off. Stat. 22, 507–524 (2006)
Muralidhar, K., Sarathy, R., Domingo-Ferrer, J.: Reverse mapping to preserve the marginal distributions of attributes in masked microdata. In: Domingo-Ferrer, J. (ed.) PSD 2014. LNCS, vol. 8744, pp. 105–116. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11257-2_9
Reiter, J.P., Wang, Q., Zhang, B.: Bayesian estimation of disclosure risks in multiply imputed, synthetic data. J. Priv. Confid. 6(1), 17–33 (2014). Article no. 2
Reiter, J.P.: Satisfying disclosure restrictions with synthetic data sets. J. Off. Stat. 18, 531–544 (2002)
Reiter, J.P.: Releasing multiply imputed, synthetic public use microdata: an illustration and empirical study. J. Roy. Stat. Soc. Ser. A 168, 185–205 (2005)
Rubin, D.B.: Discussion: statistical disclosure control limitation. J. Off. Stat. 9, 462–468 (1993)
Ruiz, N.: On some consequences of the permutation paradigm for data anonymization: centrality of permutation matrices, universal measures of disclosure risk and information loss, evaluation by dominance. Inf. Sci. 430–431, 620–633 (2018)
Ruiz, N.: A general cipher for individual data anonymization. Inf. Sci. (2017, under review). (https://arxiv.org/abs/1712.02557)
Soria-Comas, J., Domingo-Ferrer, J.: A non-parametric model for accurate and provably private synthetic data sets. In: Proceedings of International Conference on Availability, Reliability and Security-ARES 2017, Article no. 3. ACM (2017)
Willenborg, L., De Waal, T.: Elements of Statistical Disclosure Control. Springer, New York (2001). https://doi.org/10.1007/978-1-4613-0121-9
Acknowledgments and Disclaimer
The following funding sources are gratefully acknowledged by the third author: European Commission (project H2020-700540 “CANVAS”), Government of Catalonia (ICREA Acadèmia Prize) and Spanish Government (projects TIN2014-57364-C2-1-R “SmartGlacis” and TIN2015-70054-REDC). The views in this paper are the authors’ own and do not necessarily reflect the views of UNESCO or any of the funders.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ruiz, N., Muralidhar, K., Domingo-Ferrer, J. (2018). On the Privacy Guarantees of Synthetic Data: A Reassessment from the Maximum-Knowledge Attacker Perspective. In: Domingo-Ferrer, J., Montes, F. (eds) Privacy in Statistical Databases. PSD 2018. Lecture Notes in Computer Science(), vol 11126. Springer, Cham. https://doi.org/10.1007/978-3-319-99771-1_5
Download citation
DOI: https://doi.org/10.1007/978-3-319-99771-1_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99770-4
Online ISBN: 978-3-319-99771-1
eBook Packages: Computer ScienceComputer Science (R0)