Breaching Privacy Using Data Mining: Removing Noise from Perturbed Data

Chapter in Computational Intelligence for Privacy and Security

Part of the book series: Studies in Computational Intelligence (SCI, volume 394)

Abstract

Data perturbation is a sanitization method that helps restrict the disclosure of sensitive information in published data. We present an attack on the privacy of published data that has been sanitized using data perturbation. The attack employs data mining and fusion to remove some of the noise from the perturbed sensitive values. Our attack is practical: it can be launched by non-expert adversaries who have no background knowledge about the perturbed data and no data mining expertise. Moreover, our attack model also accommodates informed and expert adversaries who have background knowledge and/or expertise in data mining and fusion. We performed extensive experiments on four databases derived from UCI's Adult and IPUMS census-based data sets, sanitized with noise addition that satisfies ε-differential privacy. The results confirm that our attack poses a significant privacy risk to published perturbed data, because the majority of the added noise can be removed effectively. A naive adversary using general-purpose data miners from the Weka software package is able to remove around 90% of the noise added during perturbation, and an informed expert adversary is able to remove between 91% and 99.93% of the added noise. Interestingly, the higher the targeted privacy level, the higher the percentage of noise that can be removed. This suggests that adding more noise does not always increase real privacy.
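
To make the attack idea concrete, the following is a minimal, hypothetical Python sketch, not the chapter's implementation. It perturbs a synthetic numeric sensitive attribute with Laplace noise calibrated to ε-differential privacy (assuming sensitivity 1), then lets the adversary fit several off-the-shelf scikit-learn regressors on the published perturbed table, standing in for the general-purpose Weka miners, and fuse their predictions by simple averaging. The synthetic data, the particular learners, and averaging as the fusion step are all illustrative assumptions.

# Hypothetical sketch of the noise-removal attack (illustrative, not the chapter's code).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Toy table: quasi-identifiers X and a numeric sensitive attribute y that
# correlates with them (stand-ins for Adult/IPUMS-style census attributes).
n = 5000
X = rng.normal(size=(n, 4))
y = 40 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=2.0, size=n)

# Perturbation: Laplace noise with scale = sensitivity / epsilon (sensitivity assumed 1).
eps = 0.1
y_pert = y + rng.laplace(scale=1.0 / eps, size=n)

# Naive adversary: fit general-purpose learners on the *published* data (X, y_pert)
# and fuse their outputs by averaging to estimate the original sensitive values.
# Shallow trees let the forest smooth the noisy targets instead of memorizing them.
miners = [
    RandomForestRegressor(n_estimators=100, max_depth=5, random_state=0),
    LinearRegression(),
    KNeighborsRegressor(n_neighbors=25),
]
preds = np.column_stack([m.fit(X, y_pert).predict(X) for m in miners])
y_est = preds.mean(axis=1)

# Fraction of the added noise removed: compare mean absolute deviation from the
# true sensitive values before (published values) and after (fused estimates) the attack.
before = np.mean(np.abs(y_pert - y))
after = np.mean(np.abs(y_est - y))
print(f"estimated noise removed: {100 * (1 - after / before):.1f}%")

The fused predictions serve as the adversary's estimates of the original sensitive values; the reported percentage measures how much closer they are to the true values than the published perturbed ones.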

Author information

Correspondence to Michal Sramka.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sramka, M. (2012). Breaching Privacy Using Data Mining: Removing Noise from Perturbed Data. In: Elizondo, D., Solanas, A., Martinez-Balleste, A. (eds) Computational Intelligence for Privacy and Security. Studies in Computational Intelligence, vol 394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25237-2_9

  • DOI: https://doi.org/10.1007/978-3-642-25237-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25236-5

  • Online ISBN: 978-3-642-25237-2

  • eBook Packages: Engineering, Engineering (R0)
