A Condensation Approach to Privacy Preserving Data Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2992)

Abstract

In recent years, privacy preserving data mining has become an important problem because of the large amount of personal data tracked by many business applications. In many cases, users are unwilling to provide personal information unless the privacy of sensitive information is guaranteed. In this paper, we propose a new framework for privacy preserving data mining of multi-dimensional data. Previous work on privacy preserving data mining uses a perturbation approach, which reconstructs data distributions in order to perform the mining. Such an approach treats each dimension independently and therefore ignores the correlations between the different dimensions. In addition, it requires the development of a new distribution-based algorithm for each data mining problem, since it takes aggregate distributions of the data as input rather than the multi-dimensional records themselves. This leads to a fundamental redesign of data mining algorithms. We develop a new and flexible approach to privacy preserving data mining that does not require new problem-specific algorithms, since it maps the original data set into a new anonymized data set. This anonymized data closely matches the characteristics of the original data, including the correlations among the different dimensions. We present empirical results illustrating the effectiveness of the method.
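
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea under stated assumptions: records are condensed into groups of at least k neighbours, only each group's mean and covariance are retained, and synthetic records with matching statistics are released in place of the originals. The greedy grouping, the Gaussian draw through a Cholesky factor, and every function name here (`condense`, `anonymize`, `correlation`, and so on) are illustrative choices, not the authors' published algorithm.

```python
import math
import random

def mean_and_cov(group):
    """First- and second-order statistics (mean vector, covariance matrix) of a group."""
    n, d = len(group), len(group[0])
    mean = [sum(r[j] for r in group) / n for j in range(d)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in group) / n
            for b in range(d)] for a in range(d)]
    return mean, cov

def cholesky(cov, jitter=1e-9):
    """Lower-triangular L with L * L^T approximately equal to cov.
    The small jitter keeps the factorization defined for degenerate groups."""
    d = len(cov)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(cov[i][i] + jitter - s, jitter))
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L

def condense(records, k, rng):
    """Assumed grouping rule: pick a random seed record and group it
    with its k-1 nearest remaining neighbours; repeat until fewer than k remain."""
    if len(records) < k:
        raise ValueError("need at least k records")
    remaining = list(records)
    groups = []
    while len(remaining) >= k:
        seed = remaining.pop(rng.randrange(len(remaining)))
        remaining.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(r, seed)))
        groups.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    groups[-1].extend(remaining)  # leftovers join the last group
    return groups

def anonymize(records, k, seed=0):
    """Replace each group by synthetic records drawn from its mean and covariance."""
    rng = random.Random(seed)
    synthetic = []
    for group in condense(records, k, rng):
        mean, cov = mean_and_cov(group)
        L = cholesky(cov)
        d = len(mean)
        for _record in group:
            z = [rng.gauss(0.0, 1.0) for _ in range(d)]
            synthetic.append([mean[i] + sum(L[i][j] * z[j] for j in range(i + 1))
                              for i in range(d)])
    return synthetic

def correlation(records, a=0, b=1):
    """Pearson correlation between dimensions a and b."""
    _, cov = mean_and_cov(records)
    return cov[a][b] / math.sqrt(cov[a][a] * cov[b][b])

# Usage: two strongly correlated dimensions, anonymized with k = 25.
data_rng = random.Random(1)
original = []
for _ in range(500):
    x = data_rng.gauss(0.0, 1.0)
    original.append([x, 0.8 * x + 0.2 * data_rng.gauss(0.0, 1.0)])
released = anonymize(original, k=25)
print(round(correlation(original), 2), round(correlation(released), 2))
```

Because the output is an ordinary table of records, an off-the-shelf mining algorithm can be run on `released` unchanged, which is the property the abstract emphasizes. Keeping the full covariance per group, rather than per-dimension statistics, is what lets the cross-dimension correlation carry over to the released data.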

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aggarwal, C.C., Yu, P.S. (2004). A Condensation Approach to Privacy Preserving Data Mining. In: Bertino, E., et al. Advances in Database Technology - EDBT 2004. EDBT 2004. Lecture Notes in Computer Science, vol 2992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24741-8_12

  • DOI: https://doi.org/10.1007/978-3-540-24741-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21200-3

  • Online ISBN: 978-3-540-24741-8

  • eBook Packages: Springer Book Archive
