Why Swap When You Can Shuffle? A Comparison of the Proximity Swap and Data Shuffle for Numeric Data

Muralidhar, Krish; Sarathy, Rathindra; Dandekar, Ramesh

doi:10.1007/11930242_15

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4302)

Included in the following conference series:

International Conference on Privacy in Statistical Databases

Abstract

The rank-based proximity swap has been suggested as a data masking mechanism for numerical data. More recently, more sophisticated masking procedures based on the concept of “shuffling” the data have been proposed. In this study, we compare and contrast the performance of the swapping and shuffling procedures. The results indicate that the shuffling procedures perform better than data swapping on both data utility and disclosure risk.
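The abstract does not spell out either procedure. As a rough illustration only, a rank-based proximity swap is commonly described as sorting the values of a variable and exchanging each value with a partner whose rank lies within a small window. The minimal sketch below follows that description; the function name `rank_proximity_swap`, the `window_pct` parameter, and the pairing strategy are assumptions made for this example and are not taken from the paper.

```python
import random


def rank_proximity_swap(values, window_pct=5.0, seed=0):
    """Illustrative rank swap: pair each value with one at most
    window_pct percent of n ranks away and exchange the two.
    Hypothetical helper, not the authors' implementation."""
    rng = random.Random(seed)
    n = len(values)
    max_gap = max(1, int(n * window_pct / 100))
    # Record indices ordered by value, i.e. position r holds the record of rank r.
    order = sorted(range(n), key=lambda i: values[i])
    masked = list(values)
    swapped = [False] * n  # ranks that already have a swap partner
    for r in range(n):
        if swapped[r]:
            continue
        # Candidate partners: unswapped ranks at most max_gap above r.
        upper = min(n, r + max_gap + 1)
        candidates = [s for s in range(r + 1, upper) if not swapped[s]]
        if not candidates:
            continue  # no partner left in the window; value stays in place
        s = rng.choice(candidates)
        i, j = order[r], order[s]
        masked[i], masked[j] = masked[j], masked[i]
        swapped[r] = swapped[s] = True
    return masked
```

Because the masked column is a permutation of the original, its marginal distribution is unchanged; under these assumptions, the window size is the knob that trades utility against disclosure risk.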

Author information

Authors and Affiliations

University of Kentucky, Lexington, KY, 40506, USA
Krish Muralidhar
Oklahoma State University, Stillwater, OK, 74078, USA
Rathindra Sarathy
Department of Energy, Energy Information Administration, Washington, DC, USA
Ramesh Dandekar

Editor information

Editors and Affiliations

Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili, UNESCO Chair in Data Privacy, Av. Països Catalans 26, E-43007, Tarragona, Catalonia
Josep Domingo-Ferrer
Istat, Servizio Progettazione e Supporto Metodologico, nei Processi di Produzione Statistica, Via Cesare Balbo 16, 00184, Roma, Italy
Luisa Franconi

About this paper

Cite this paper

Muralidhar, K., Sarathy, R., Dandekar, R. (2006). Why Swap When You Can Shuffle? A Comparison of the Proximity Swap and Data Shuffle for Numeric Data. In: Domingo-Ferrer, J., Franconi, L. (eds) Privacy in Statistical Databases. PSD 2006. Lecture Notes in Computer Science, vol 4302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11930242_15

DOI: https://doi.org/10.1007/11930242_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49330-3
Online ISBN: 978-3-540-49332-7
eBook Packages: Computer Science (R0)
