
Abstract

Despite its enormous benefits and rapid proliferation in recent years, data mining is acknowledged by data owners and researchers alike to revive old threats to individual privacy and to introduce new ones. Many believe that data mining is, and will remain, one of the most significant privacy challenges in the years to come.

We live in an information age in which vast amounts of personal data are routinely collected through bank transactions, credit-card payments, phone calls, reward-card use, doctor visits, and video and car rentals, to mention but a few examples. These data are typically used for data mining and statistical analysis, and are often sold to other companies and organizations.

A breach of privacy occurs when individuals are not aware that their data have been collected in the first place, have been passed on to other companies and organizations, or have been used for purposes other than the one for which they were originally collected. Even when individuals approve of the use of their personal records for data mining and statistical analysis, for example in medical research, it is still assumed that only aggregate values will be made available to researchers and that no individual values will be disclosed.

Various techniques can be employed to ensure the confidentiality of individual records and other sensitive information. One approach is to add noise to the original data, so that disclosing the perturbed data does not necessarily reveal the confidential individual values. Other techniques were developed specifically for mining vertically and/or horizontally partitioned data: each partition belongs to a different party (e.g., a hospital), and no party is willing to share its data, yet all parties have an interest in mining the total data set comprising all of the partitions. Still other techniques focus on protecting the confidentiality of logic rules and patterns discovered from the data.
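As a rough illustration of the noise-addition idea mentioned above, the following minimal sketch perturbs a confidential numeric attribute with independent zero-mean Gaussian noise before release. It is not taken from the chapter: the function name `perturb`, the parameter `noise_std`, and the salary figures are all hypothetical, and simple additive noise is only one of many perturbation schemes.

```python
import random
import statistics


def perturb(values, noise_std, seed=None):
    """Return a copy of `values` with independent zero-mean Gaussian noise added.

    `noise_std` is an assumed tuning knob: larger values hide individual
    records better but distort the released data more.
    """
    rng = random.Random(seed)  # private generator, so results are reproducible
    return [v + rng.gauss(0.0, noise_std) for v in values]


# Hypothetical confidential attribute; the figures are made up for illustration.
salaries = [41_000, 52_500, 38_200, 67_800, 45_300, 59_900, 48_700, 53_100]

# Only the perturbed values would be disclosed to the data miner.
released = perturb(salaries, noise_std=5_000, seed=7)

# Aggregates computed from the released data approximate the true ones,
# because the added noise has zero mean and tends to cancel out.
true_mean = statistics.mean(salaries)
released_mean = statistics.mean(released)

if __name__ == "__main__":
    print(f"true mean:     {true_mean:.1f}")
    print(f"released mean: {released_mean:.1f}")
```

The trade-off sits in `noise_std`: individual released values differ from the originals, while aggregates such as the mean stay close to the true ones as the number of records grows.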

In this chapter we introduce the main issues in privacy-preserving data mining, provide a classification of existing techniques and survey the most important results in this area.




Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Brankovic, L., Islam, M.Z., Giggins, H. (2007). Privacy-Preserving Data Mining. In: Petković, M., Jonker, W. (eds) Security, Privacy, and Trust in Modern Data Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69861-6_11


  • DOI: https://doi.org/10.1007/978-3-540-69861-6_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69860-9

  • Online ISBN: 978-3-540-69861-6

  • eBook Packages: Computer Science, Computer Science (R0)
