Breaching Privacy Using Data Mining: Removing Noise from Perturbed Data

Chapter in Computational Intelligence for Privacy and Security

Part of the book series: Studies in Computational Intelligence (SCI, volume 394)

Abstract

Data perturbation is a sanitization method that helps restrict the disclosure of sensitive information in published data. We present an attack on the privacy of published data that has been sanitized using data perturbation. The attack employs data mining and fusion to remove some of the noise from the perturbed sensitive values. Our attack is practical: it can be launched by non-expert adversaries who have no background knowledge about the perturbed data and no data mining expertise. Moreover, our attack model also accommodates informed and expert adversaries who have background knowledge and/or expertise in data mining and fusion. We performed extensive experiments on four databases derived from UCI's Adult and IPUMS census-based data sets, sanitized with noise addition that satisfies ε-differential privacy. The results confirm that our attack poses a significant privacy risk to published perturbed data, because the majority of the added noise can be removed effectively. A naive adversary using general-purpose data miners from the Weka software package is able to remove around 90% of the noise added during perturbation, and an informed expert adversary is able to remove between 91% and 99.93% of the added noise. Interestingly, the higher the targeted privacy level, the higher the percentage of noise that can be removed. This suggests that adding more noise does not always increase real privacy.
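
To make the attack idea concrete, the following is a minimal, hypothetical Python sketch, not the chapter's implementation. It perturbs a synthetic numeric sensitive attribute with Laplace noise calibrated to ε-differential privacy (assuming sensitivity 1), then lets the adversary fit several off-the-shelf scikit-learn regressors on the published perturbed table, standing in for the general-purpose Weka miners, and fuse their predictions by simple averaging. The synthetic data, the particular learners, and averaging as the fusion step are all illustrative assumptions.

# Hypothetical sketch of the noise-removal attack (illustrative, not the chapter's code).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Toy table: quasi-identifiers X and a numeric sensitive attribute y that
# correlates with them (stand-ins for Adult/IPUMS-style census attributes).
n = 5000
X = rng.normal(size=(n, 4))
y = 40 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=2.0, size=n)

# Perturbation: Laplace noise with scale = sensitivity / epsilon (sensitivity assumed 1).
eps = 0.1
y_pert = y + rng.laplace(scale=1.0 / eps, size=n)

# Naive adversary: fit general-purpose learners on the *published* data (X, y_pert)
# and fuse their outputs by averaging to estimate the original sensitive values.
# Shallow trees let the forest smooth the noisy targets instead of memorizing them.
miners = [
    RandomForestRegressor(n_estimators=100, max_depth=5, random_state=0),
    LinearRegression(),
    KNeighborsRegressor(n_neighbors=25),
]
preds = np.column_stack([m.fit(X, y_pert).predict(X) for m in miners])
y_est = preds.mean(axis=1)

# Fraction of the added noise removed: compare mean absolute deviation from the
# true sensitive values before (published values) and after (fused estimates) the attack.
before = np.mean(np.abs(y_pert - y))
after = np.mean(np.abs(y_est - y))
print(f"estimated noise removed: {100 * (1 - after / before):.1f}%")

The fused predictions serve as the adversary's estimates of the original sensitive values; the reported percentage measures how much closer they are to the true values than the published perturbed ones.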

Author information

Correspondence to Michal Sramka.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sramka, M. (2012). Breaching Privacy Using Data Mining: Removing Noise from Perturbed Data. In: Elizondo, D., Solanas, A., Martinez-Balleste, A. (eds) Computational Intelligence for Privacy and Security. Studies in Computational Intelligence, vol 394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25237-2_9

  • DOI: https://doi.org/10.1007/978-3-642-25237-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25236-5

  • Online ISBN: 978-3-642-25237-2

  • eBook Packages: Engineering, Engineering (R0)
