Abstract
This paper introduces ‘guessing anonymity,’ a definition of privacy for noise perturbation methods that captures the difficulty of linking an identity to a sanitized record using publicly available information. Importantly, this definition leads to analytical expressions that bound data privacy as a function of the noise perturbation parameters. Using these bounds, we formulate optimization problems that describe the feasible tradeoffs between data distortion and privacy, without exhaustively searching the noise parameter space. This work addresses an important shortcoming of noise perturbation methods by providing them with an intuitive definition of privacy, analogous to that used in k-anonymity, together with an analytical means of selecting parameters to achieve a desired level of privacy. At the same time, our work retains the appealing aspects of noise perturbation methods that have made them popular both in practice and as a subject of academic research.
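The linkage scenario underlying guessing anonymity can be illustrated with a small sketch. This is not the paper's formulation; it is an illustrative assumption in which numeric quasi-identifiers are perturbed with additive Gaussian noise, and an adversary links each sanitized record back to the originals by Euclidean distance, guessing candidates in order of proximity. The "guess number" of a record is then the rank at which its true original would be guessed; all names and parameters below are assumptions for illustration.

```python
# Sketch: additive Gaussian noise perturbation and an empirical guessing measure.
import numpy as np

rng = np.random.default_rng(0)

# Original records: rows are individuals, columns are numeric quasi-identifiers.
n, d = 50, 3
X = rng.normal(size=(n, d))

# Noise perturbation: larger sigma means more distortion and more privacy.
sigma = 1.0
Y = X + rng.normal(scale=sigma, size=X.shape)

# Linkage attack: for each sanitized record, rank the original records by
# distance. The guess number of record i is the rank of its own original
# among all candidates (1 = identified on the first guess).
dists = np.linalg.norm(Y[:, None, :] - X[None, :, :], axis=2)  # (n, n)
own = dists[np.arange(n), np.arange(n)]                        # true-pair distances
ranks = (dists < own[:, None]).sum(axis=1) + 1

print("mean guess number:", ranks.mean())
```

Sweeping `sigma` in a sketch like this traces the distortion/privacy tradeoff empirically; the paper's contribution is to bound such quantities analytically, so the noise parameters can be chosen without this kind of exhaustive search.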
© 2009 Springer-Verlag Berlin Heidelberg
Rachlin, Y., Probst, K., Ghani, R. (2009). Maximizing Privacy under Data Distortion Constraints in Noise Perturbation Methods. In: Bonchi, F., Ferrari, E., Jiang, W., Malin, B. (eds) Privacy, Security, and Trust in KDD. PInKDD 2008. Lecture Notes in Computer Science, vol 5456. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01718-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01717-9
Online ISBN: 978-3-642-01718-6