Abstract
Data perturbation is a sanitization method that restricts the disclosure of sensitive information from published data. We present an attack on the privacy of data that has been sanitized using perturbation. The attack employs data mining and fusion to remove some of the noise from the perturbed sensitive values. Our attack is practical: it can be launched by non-expert adversaries who have no background knowledge about the perturbed data and no data mining expertise. Moreover, our attack model also accommodates informed and expert adversaries with background knowledge and/or expertise in data mining and fusion. Extensive experiments were performed on four databases derived from the UCI Adult and IPUMS census-based data sets, sanitized with noise addition that satisfies ε-differential privacy. The experimental results confirm that our attack poses a significant privacy risk to published perturbed data, because the majority of the noise can be effectively removed. A naive adversary is able to remove around 90% of the noise added during perturbation using general-purpose data miners from the Weka software package, and an informed expert adversary is able to remove 91%–99.93% of the added noise. Interestingly, the higher the targeted privacy level, the higher the percentage of noise that can be removed. This suggests that adding more noise does not always increase the actual privacy.
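The setting described in the abstract can be illustrated with a minimal sketch. This is not the chapter's actual attack: it assumes synthetic microdata, the Laplace mechanism with hypothetical sensitivity and ε values, and ordinary least squares as a stand-in for the general-purpose Weka data miners. It shows how a model fitted on the published (perturbed) values alone can still strip most of the added noise when the sensitive attribute correlates with public attributes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic microdata: sensitive value y correlated with two public attributes.
n = 2000
X = rng.uniform(0.0, 1.0, size=(n, 2))
y_true = 3.0 * X[:, 0] + 2.0 * X[:, 1]

# Laplace mechanism: noise scale = sensitivity / epsilon (hypothetical values).
sensitivity, epsilon = 1.0, 0.1
y_pub = y_true + rng.laplace(scale=sensitivity / epsilon, size=n)

# "Data miner" attack: fit a linear model on the published perturbed data only,
# then use its predictions as denoised estimates of the sensitive values.
A = np.column_stack([X, np.ones(n)])          # design matrix with intercept
coef, *_ = np.linalg.lstsq(A, y_pub, rcond=None)
y_hat = A @ coef

# Fraction of the added noise removed by the attack (1 = perfect recovery).
removed = 1.0 - np.mean(np.abs(y_hat - y_true)) / np.mean(np.abs(y_pub - y_true))
print(f"noise removed: {removed:.1%}")
```

Because the Laplace noise is zero-mean, the fitted model averages it out across records, which is why smaller ε (more noise, higher nominal privacy) does not proportionally degrade the attacker's estimates.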
References
Abowd, J.M., Vilhuber, L.: How Protective Are Synthetic Data? In: Domingo-Ferrer, J., Saygın, Y. (eds.) PSD 2008. LNCS, vol. 5262, pp. 239–246. Springer, Heidelberg (2008)
Adam, N.A., Wortman, J.C.: Security-control methods for statistical databases. ACM Computing Surveys 21(4), 515–556 (1989)
Agrawal, R., Srikant, R.: Privacy-Preserving Data Mining. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD 2000, May 16-18, pp. 439–450. ACM Press, New York (2000)
Asuncion, A., Newman, D.J.: UCI Machine Learning Repository (2007), http://www.ics.uci.edu/~mlearn/MLRepository.html
Bleiholder, J., Naumann, F.: Data fusion. ACM Computing Surveys 41(1), 1–41 (2008)
Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: the SuLQ framework. In: Proceedings of the 24th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS 2005, June 13-15, pp. 128–138. ACM Press, Baltimore (2005)
Dalenius, T.: Towards a methodology for statistical disclosure control. Statistisk Tidskrift 15, 429–444 (1977)
Dutta, H., Kargupta, H., Datta, S., Sivakumar, K.: Analysis of privacy preserving random perturbation techniques: further explorations. In: Proceedings of the 2003 ACM Workshop on Privacy in the Electronic Society, WPES 2003, October 30, pp. 31–38. ACM Press, Washington (2003)
Dwork, C.: Differential Privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006)
Dwork, C.: Differential Privacy: A Survey of Results. In: Agrawal, M., Du, D.-Z., Duan, Z., Li, A. (eds.) TAMC 2008. LNCS, vol. 4978, pp. 1–19. Springer, Heidelberg (2008)
Dwork, C.: The Differential Privacy Frontier (Extended Abstract). In: Reingold, O. (ed.) TCC 2009. LNCS, vol. 5444, pp. 496–502. Springer, Heidelberg (2009)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating Noise to Sensitivity in Private Data Analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006)
Dwork, C., Naor, M., Reingold, O., Rothblum, G.N., Vadhan, S.P.: On the complexity of differentially private data release: efficient algorithms and hardness results. In: Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, May 31 - June 2, pp. 381–390. ACM Press, Bethesda (2009)
Ganta, S.R., Acharya, R.: On Breaching Enterprise Data Privacy Through Adversarial Information Fusion. In: Proceedings of the 24th International Conference on Data Engineering Workshops, Workshop on Information Integration Methods, Architectures, and Systems, ICDE-IIMAS 2008, April 7-12, pp. 246–249. IEEE Computer Society Press, Cancun (2008)
Goodman, I.R., Mahler, R.P., Nguyen, H.T.: Mathematics of Data Fusion. Kluwer Academic Publishers, Norwell (1997)
Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002, July 23-26, pp. 279–288. ACM Press, Edmonton (2002)
Kasiviswanathan, S.P., Lee, H.K., Nissim, K., Raskhodnikova, S., Smith, A.: What Can We Learn Privately? In: Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25-28, pp. 531–540. IEEE Computer Society Press, Philadelphia (2008)
Machanavajjhala, A., Kifer, D., Abowd, J.M., Gehrke, J., Vilhuber, L.: Privacy: Theory meets Practice on the Map. In: Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, April 7-12, pp. 277–286. IEEE Computer Society Press, Cancun (2008)
McSherry, F.: Preserving privacy in large-scale data analysis. A presentation at Workshop on Algorithms for Modern Massive Data Sets (MMDS 2006), Stanford, CA, USA, June 21-24 (2006), http://www.stanford.edu/group/mmds/slides/mcsherry-mmds.pdf
Mironov, I., Pandey, O., Reingold, O., Vadhan, S.: Computational Differential Privacy. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 126–142. Springer, Heidelberg (2009)
Muralidhar, K., Sarathy, R.: Differential Privacy for Numeric Data. In: Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, Bilbao, Spain (2009)
Muralidhar, K., Sarathy, R.: Does Differential Privacy Protect Terry Gross' Privacy? In: Domingo-Ferrer, J., Magkos, E. (eds.) PSD 2010. LNCS, vol. 6344, pp. 200–209. Springer, Heidelberg (2010)
Sarathy, R., Muralidhar, K.: Some Additional Insights on Applying Differential Privacy for Numeric Data. In: Domingo-Ferrer, J., Magkos, E. (eds.) PSD 2010. LNCS, vol. 6344, pp. 210–219. Springer, Heidelberg (2010)
Sramka, M.: A Privacy Attack That Removes the Majority of the Noise From Perturbed Data. In: Proceedings of the 2010 International Joint Conference on Neural Networks, IJCNN 2010, as part of the 2010 IEEE World Congress on Computational Intelligence, WCCI 2010, July 18-23. IEEE Computer Society Press, Barcelona (2010)
Sramka, M.: Data mining as a tool in privacy-preserving data publishing. Tatra Mountains Mathematical Publications 45, 151–159 (2010)
Sramka, M., Safavi-Naini, R., Denzinger, J.: An Attack on the Privacy of Sanitized Data That Fuses the Outputs of Multiple Data Miners. In: Proceedings of the 9th IEEE International Conference on Data Mining Workshops, International workshop on Privacy Aspects of Data Mining, ICDM-PADM 2009, December 6, pp. 130–137. IEEE Computer Society Press, Miami Beach (2009)
Sramka, M., Safavi-Naini, R., Denzinger, J., Askari, M.: A Practice-oriented Framework for Measuring Privacy and Utility in Data Sanitization Systems. In: Proceedings of the 12th International Conference on Extending Database Technology Workshops, the 3rd International Workshop on Privacy and Anonymity in the Information Society, EDBT-PAIS 2010, March 22-26. ACM Press, Lausanne (2010)
Sramka, M., Safavi-Naini, R., Denzinger, J., Askari, M., Gao, J.: Utility of Knowledge Extracted from Unsanitized Data when Applied to Sanitized Data. In: Proceedings of the 6th Annual Conference on Privacy, Security and Trust, PST 2008, October 1-3, pp. 227–231. IEEE Computer Society Press, Fredericton (2008)
Torra, V. (ed.): Information Fusion in Data Mining. Studies in Fuzziness and Soft Computing, vol. 123. Springer, Heidelberg (2003)
Torra, V., Narukawa, Y.: Modeling Decisions: Information Fusion and Aggregation Operators. In: Cognitive Technologies. Springer, Heidelberg (2007)
Valls, A., Torra, V., Domingo-Ferrer, J.: Semantic based aggregation for statistical disclosure control. International Journal of Intelligent Systems 18(9), 939–951 (2003)
Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Sramka, M. (2012). Breaching Privacy Using Data Mining: Removing Noise from Perturbed Data. In: Elizondo, D., Solanas, A., Martinez-Balleste, A. (eds) Computational Intelligence for Privacy and Security. Studies in Computational Intelligence, vol 394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25237-2_9
DOI: https://doi.org/10.1007/978-3-642-25237-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25236-5
Online ISBN: 978-3-642-25237-2
eBook Packages: Engineering (R0)