Abstract
The rank-based proximity swap has been suggested as a data masking mechanism for numerical data. Recently, more sophisticated procedures for masking numerical data, based on the concept of “shuffling” the data, have been proposed. In this study, we compare and contrast the performance of the swapping and shuffling procedures. The results indicate that the shuffling procedures perform better than data swapping in terms of both data utility and disclosure risk.
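To make the swapping procedure named in the abstract concrete, the following is a minimal sketch of a rank-based proximity swap for a single numeric variable. It is not the authors' implementation. It assumes the common formulation in which each value is exchanged with a randomly chosen, not-yet-swapped value whose rank lies within a fixed window. The function name `rank_proximity_swap` and the parameter `max_rank_gap` are hypothetical and chosen only for illustration.

```python
import random


def rank_proximity_swap(values, max_rank_gap, seed=None):
    """Return a masked copy of `values` (illustrative sketch only).

    Each value is swapped with a randomly chosen, not-yet-swapped value
    whose rank is at most `max_rank_gap` positions higher. A value with
    no free partner left in its window is kept unchanged.
    """
    rng = random.Random(seed)
    n = len(values)
    # order[r] is the index of the record holding the r-th smallest value
    order = sorted(range(n), key=lambda i: values[i])
    masked = list(values)
    swapped = [False] * n  # indexed by rank, not by record position

    for r in range(n):
        if swapped[r]:
            continue
        # Partners are the still-unswapped ranks inside the proximity window
        window_end = min(n, r + max_rank_gap + 1)
        candidates = [s for s in range(r + 1, window_end) if not swapped[s]]
        if not candidates:
            continue
        s = rng.choice(candidates)
        i, j = order[r], order[s]
        # Exchange the two original values between records i and j
        masked[i], masked[j] = values[j], values[i]
        swapped[r] = swapped[s] = True
    return masked


if __name__ == "__main__":
    incomes = [52.0, 18.5, 97.3, 40.1, 61.8, 23.4, 75.0, 33.9]
    print(rank_proximity_swap(incomes, max_rank_gap=2, seed=7))
```

Because values are only exchanged between records, the marginal distribution of the variable is preserved exactly. A smaller `max_rank_gap` keeps each masked value closer to its original, which tends to preserve relationships with other variables but offers less protection against disclosure.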
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Muralidhar, K., Sarathy, R., Dandekar, R. (2006). Why Swap When You Can Shuffle? A Comparison of the Proximity Swap and Data Shuffle for Numeric Data. In: Domingo-Ferrer, J., Franconi, L. (eds) Privacy in Statistical Databases. PSD 2006. Lecture Notes in Computer Science, vol 4302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11930242_15
DOI: https://doi.org/10.1007/11930242_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49330-3
Online ISBN: 978-3-540-49332-7
eBook Packages: Computer Science, Computer Science (R0)