Abstract
In this paper we propose a method for statistical disclosure limitation of categorical variables, which we call Conditional Group Swapping. The approach is suited to design and strata-defining variables, whose cross-classification forms important groups or subpopulations. These groups are important because, for data analysis, it is desirable to preserve the analytical characteristics within them. In general, data swapping can be quite distorting [13, 16, 20], especially for the relationships between variables, both within the subpopulations and in the overall data. To reduce the damage incurred by swapping, we propose to choose the records for swapping using conditional probabilities that depend on the characteristics of the exchanged records. In particular, our approach uses results from propensity score methodology to compute the swapping probabilities. The experimental results presented in the paper indicate that the method preserves data utility well.
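The abstract gives only the idea, not the exact algorithm, so the following is a minimal illustrative sketch rather than the procedure from the paper. It assumes a binary group label `g`, estimates propensity scores with a plain logistic regression fitted by gradient descent, and picks swap partners with a probability that decays with the distance between propensity scores. The function names, the exponential kernel, and the `swap_rate` and `bandwidth` parameters are all hypothetical choices made for this example.

```python
import numpy as np

def propensity_scores(X, g, steps=500, lr=0.1):
    """Estimate P(g = 1 | X) with a logistic regression fitted by
    gradient descent (an assumed, deliberately simple estimator)."""
    Xb = np.column_stack([np.ones(len(X)), X])  # add intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - g) / len(g)       # gradient of the log-loss
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def conditional_group_swap(X, g, swap_rate=0.2, bandwidth=0.05, seed=0):
    """Swap group labels between pairs of records from different groups,
    favouring pairs whose propensity scores are close."""
    rng = np.random.default_rng(seed)
    g = np.asarray(g)
    e = propensity_scores(X, g)
    g_new = g.copy()
    idx0 = np.flatnonzero(g == 0)
    idx1 = list(np.flatnonzero(g == 1))         # partners still available
    n_swaps = int(swap_rate * min(len(idx0), len(idx1)))
    for i in rng.choice(idx0, size=n_swaps, replace=False):
        cand = np.array(idx1)
        # Hypothetical kernel: swap probability decays with score distance.
        wts = np.exp(-np.abs(e[cand] - e[i]) / bandwidth)
        j = rng.choice(cand, p=wts / wts.sum())
        g_new[i], g_new[j] = g[j], g[i]         # exchange the group labels
        idx1.remove(j)                          # each record swaps at most once
    return g_new
```

Because labels are only exchanged between records, the group sizes are unchanged, and favouring partners with similar propensity scores is one plausible way to limit the distortion of relationships between the group variable and the covariates.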
References
Brand, R.: Microdata protection through noise addition. In: Domingo-Ferrer, J. (ed.) Inference Control in Statistical Databases. LNCS, vol. 2316, pp. 97–116. Springer, Heidelberg (2002)
Dalenius, T., Reiss, S.P.: Data-swapping: A technique for disclosure control. J. Stat. Plann. Infer. 6, 73–85 (1982)
Dandekar, R.A., Cohen, M., Kirkendall, N.: Sensitive micro data protection using Latin hypercube sampling technique. In: Domingo-Ferrer, J. (ed.) Inference Control in Statistical Databases. LNCS, vol. 2316, pp. 117–125. Springer, Heidelberg (2002)
Defays, D., Anwar, N.: Micro-aggregation: a generic method. In: Proceedings of the 2nd International Symposium on Statistical Confidentiality, pp. 69–78. Office for Official Publications of the European Community, Luxembourg (1995)
Drechsler, J.: Synthetic Datasets for Statistical Disclosure Control: Theory and Implementation. Springer, New York (2011)
Elinder, M., Erixson, O.: Gender, social norms, and survival in maritime disasters. Proc. Nat. Acad. Sci. USA 109(33), 13220–13224 (2012)
Fellegi, I.P., Sunter, A.B.: A theory for record linkage. J. Am. Stat. Assoc. 64, 1183–1210 (1969)
Gomatam, S., Karr, A.F., Chunhua, L., Sanil, A.: Data swapping: a risk-utility framework and web service implementation. Technical Report 134, National Institute of Statistical Sciences, Research Triangle Park, NC (2003)
Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Lenz, R., Longhurst, J., Schulte-Nordholt, E., Seri, G., DeWolf, P.-P.: Handbook on Statistical Disclosure Control (version 1.2). ESSNET SDC project (2010). http://neon.vb.cbs.nl/casc
Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Schulte Nordholt, E., Spicer, K., Wolf, P.-P.: Statistical Disclosure Control. Wiley, New York (2012)
Jaro, M.A.: Advances in record-linkage methodology as applied to matching the 1985 Census of Tampa, Florida. J. Am. Stat. Assoc. 84, 414–420 (1989)
Kaggle. The Home of Data Science. http://www.kaggle.com
Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. Am. Stat. 60(3), 224–232 (2006)
Kim, J.J.: A method for limiting disclosure in microdata based on random noise and transformation. In: Proceedings of the ASA Section on Survey Research Methodology, pp. 303–308 (1986)
Lin, Y.-X.: Density approximant based on noise multiplied data. In: Domingo-Ferrer, J. (ed.) PSD 2014. LNCS, vol. 8744, pp. 89–104. Springer, Heidelberg (2014)
Mitra, R., Reiter, J.P.: Adjusting survey weights when altering identifying design variables via synthetic data. In: Domingo-Ferrer, J., Franconi, L. (eds.) PSD 2006. LNCS, vol. 4302, pp. 177–188. Springer, Heidelberg (2006)
Moor, R.: Controlled data swapping techniques for masking public use microdata sets. U.S. Census Bureau (1996)
Muralidhar, K., Sarathy, R.: Data shuffling: a new masking approach for numerical data. Manag. Sci. 52(5), 658–670 (2006)
Oganian, A.: Security and Information Loss in Statistical Database Protection. Ph.D. thesis, Universitat Politecnica de Catalunya (2003)
Oganian, A., Karr, A.F.: Combinations of SDC methods for microdata protection. In: Domingo-Ferrer, J., Franconi, L. (eds.) PSD 2006. LNCS, vol. 4302, pp. 102–113. Springer, Heidelberg (2006)
Oganian, A., Karr, A.F.: Masking methods that preserve positivity constraints in microdata. J. Stat. Plann. Infer. 141(1), 31–41 (2011)
Reiss, S.P., Post, M.J., Dalenius, T.: Non-reversible privacy transformations. In: Proceedings of the ACM Symposium on Principles of Database Systems, 29–31 March, pp. 139–146 (1982)
Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55 (1983)
Takemura, A.: Local recoding and record swapping by maximum weight matching for disclosure control of microdata sets. J. Offic. Stat. 18, 275–289 (2002)
Templ, M.: Statistical disclosure control for microdata using the R-package sdcMicro. Trans. Data Priv. 1(2), 67–85 (2008)
Torra, V.: Microaggregation for categorical variables: a median based approach. In: Domingo-Ferrer, J., Torra, V. (eds.) PSD 2004. LNCS, vol. 3050, pp. 162–174. Springer, Heidelberg (2004)
Valliant, R., Dever, J.A., Kreuter, F.: Package ‘PracTools’: Tools for Designing and Weighting Survey Samples (2015). https://cran.r-project.org/web/packages/PracTools/PracTools.pdf
Woo, M.-J., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1), 111–124 (2009)
Acknowledgments
The authors would like to thank Alan Dorfman and Van Parsons for valuable suggestions and help during the preparation of the paper. The findings and conclusions in this paper are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Oganian, A., Lesaja, G. (2016). Propensity Score Based Conditional Group Swapping for Disclosure Limitation of Strata-Defining Variables. In: Domingo-Ferrer, J., Pejić-Bach, M. (eds) Privacy in Statistical Databases. PSD 2016. Lecture Notes in Computer Science, vol 9867. Springer, Cham. https://doi.org/10.1007/978-3-319-45381-1_6
DOI: https://doi.org/10.1007/978-3-319-45381-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45380-4
Online ISBN: 978-3-319-45381-1
eBook Packages: Computer Science, Computer Science (R0)