
Geometric data perturbation for privacy preserving outsourced data mining

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

Data perturbation is a popular technique in privacy-preserving data mining. A major challenge in data perturbation is to balance privacy protection and data utility, which are normally considered a pair of conflicting factors. We argue that selectively preserving task/model-specific information in perturbation helps achieve a better privacy guarantee and better data utility. One type of such information is multidimensional geometric information, which is implicitly utilized by many data-mining models. To preserve this information in data perturbation, we propose the Geometric Data Perturbation (GDP) method. In this paper, we describe several aspects of the GDP method. First, we show that several types of well-known data-mining models deliver a comparable level of model quality over the geometrically perturbed data set as over the original data set. Second, we discuss the intuition behind the GDP method and compare it with other multidimensional perturbation methods such as random projection perturbation. Third, we propose a multi-column privacy evaluation framework for evaluating the effectiveness of geometric data perturbation with respect to different levels of attacks. Finally, we use this evaluation framework to study a few attacks on geometrically perturbed data sets. Our experimental study also shows that geometric data perturbation can not only provide a satisfactory privacy guarantee but also preserve modeling accuracy well.
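To make the idea of preserving multidimensional geometric information concrete, here is a minimal Python sketch. The abstract does not spell out the transformation, so the sketch assumes a perturbation built from a random orthogonal matrix, a random translation, and optional additive noise. The function names (`random_orthogonal`, `geometric_perturb`, `pairwise_distances`) and all parameter choices are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def random_orthogonal(d, rng):
    """Draw a d x d orthogonal matrix via QR of a Gaussian matrix (assumed scheme)."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Flip column signs so the result does not depend on the QR sign convention.
    return q * np.sign(np.diag(r))

def geometric_perturb(X, rng, noise_std=0.0):
    """Perturb the rows of X (n x d): rotate, translate, optionally add noise."""
    n, d = X.shape
    R = random_orthogonal(d, rng)      # hypothetical secret rotation
    t = rng.uniform(0.0, 1.0, size=d)  # hypothetical secret translation
    noise = rng.normal(0.0, noise_std, size=(n, d)) if noise_std > 0 else 0.0
    return X @ R.T + t + noise

def pairwise_distances(X):
    """Euclidean distance between every pair of rows."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
Y = geometric_perturb(X, rng)  # noise-free case

# Largest change in any pairwise distance; expected to be at floating-point level.
max_gap = np.abs(pairwise_distances(X) - pairwise_distances(Y)).max()
print(max_gap)
```

Without noise, pairwise Euclidean distances are unchanged up to rounding, which is why distance-based models can be expected to behave similarly on the perturbed data. Setting `noise_std` above zero trades some of that exactness for extra obfuscation.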




Corresponding author

Correspondence to Keke Chen.


Cite this article

Chen, K., Liu, L. Geometric data perturbation for privacy preserving outsourced data mining. Knowl Inf Syst 29, 657–695 (2011). https://doi.org/10.1007/s10115-010-0362-4


  • DOI: https://doi.org/10.1007/s10115-010-0362-4
