Abstract
Disseminating data that contain sensitive information carries an implicit risk of unauthorised disclosure. Several masking methods have been developed to protect such data without losing too much information. One of them, the Post Randomisation Method (PRAM), perturbs the data according to a Markov probability transition matrix. Its drawback is that it is difficult to find an optimal transition matrix, that is, one whose perturbations maximise data utility. In this paper we present a study of data privacy from the point of view of optimisation, using evolutionary algorithms to generate optimal probability transition matrices. Optimality is defined with respect to a pre-defined fitness function that accounts for several data protection properties, such as data utility and disclosure risk. We also provide experimental results on real datasets to illustrate and empirically evaluate the application of this technique.
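As an illustration of the PRAM step described in the abstract, the following minimal sketch replaces each value of a categorical variable with a category drawn from the corresponding row of a Markov transition matrix. The function names, the three categories and the matrix entries are hypothetical choices made for this example and are not taken from the chapter. The evolutionary search and the fitness function are not shown, since the abstract does not specify them.

```python
import random


def is_row_stochastic(matrix, tol=1e-9):
    """Check that every row holds non-negative probabilities summing to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def apply_pram(values, categories, transition, rng):
    """Perturb each value by sampling from its row of the transition matrix.

    transition[i][j] is the probability that categories[i] is published
    as categories[j].
    """
    index = {c: i for i, c in enumerate(categories)}
    return [
        rng.choices(categories, weights=transition[index[v]], k=1)[0]
        for v in values
    ]


# Hypothetical categorical variable and transition matrix (assumptions).
categories = ["low", "medium", "high"]
transition = [
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
]

original = ["low", "medium", "high", "medium", "low"] * 200
rng = random.Random(0)  # fixed seed so the sketch is repeatable
masked = apply_pram(original, categories, transition, rng)

# Share of records whose published value differs from the original one.
changed = sum(o != m for o, m in zip(original, masked)) / len(original)
print(f"fraction of perturbed records: {changed:.3f}")
```

In a setting like the one the abstract describes, an optimiser would treat the entries of `transition` as the quantities to search over, subject to the row-stochastic constraint checked by `is_row_stochastic`.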
Acknowledgments
This work has been partially supported by the Spanish MEC (ARES-CONSOLIDER INGENIO 2010 CSD2007-00004 and COPRIVACY TIN2011-27076-C03-03) and by the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 262608.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Marés, J., Torra, V., Shlomo, N. (2015). Optimisation-Based Study of Data Privacy by Using PRAM. In: Navarro-Arribas, G., Torra, V. (eds) Advanced Research in Data Privacy. Studies in Computational Intelligence, vol 567. Springer, Cham. https://doi.org/10.1007/978-3-319-09885-2_6
DOI: https://doi.org/10.1007/978-3-319-09885-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09884-5
Online ISBN: 978-3-319-09885-2
eBook Packages: Engineering, Engineering (R0)