Geometric data perturbation for privacy preserving outsourced data mining

Chen, Keke; Liu, Ling

doi:10.1007/s10115-010-0362-4

Regular Paper
Published: 23 November 2010

Knowledge and Information Systems, Volume 29, pages 657–695 (2011)

Abstract

Data perturbation is a popular technique in privacy-preserving data mining. A major challenge in data perturbation is balancing privacy protection and data utility, which are normally considered conflicting goals. We argue that selectively preserving task- or model-specific information during perturbation can help achieve both a better privacy guarantee and better data utility. One type of such information is multidimensional geometric information, which many data-mining models use implicitly. To preserve this information in data perturbation, we propose the Geometric Data Perturbation (GDP) method. In this paper, we describe several aspects of the GDP method. First, we show that several well-known types of data-mining models deliver model quality over the geometrically perturbed data set comparable to that over the original data set. Second, we discuss the intuition behind the GDP method and compare it with other multidimensional perturbation methods, such as random projection perturbation. Third, we propose a multi-column privacy evaluation framework for evaluating the effectiveness of geometric data perturbation against different levels of attack. Finally, we use this evaluation framework to study a few attacks on geometrically perturbed data sets. Our experimental study also shows that geometric data perturbation not only provides a satisfactory privacy guarantee but also preserves modeling accuracy well.
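The abstract does not spell out the perturbation itself. As a minimal sketch, assuming the rotation-translation-noise form of geometric perturbation (each record x becomes y = Rx + t + ε, with R a random orthogonal matrix, t a random translation shared by all records, and ε small Gaussian noise), the Python below illustrates what "preserving multidimensional geometric information" means: with zero noise, every pairwise Euclidean distance survives exactly, so distance-based models such as k-nearest-neighbor classifiers or k-means clustering see the same geometry. The names `random_orthogonal` and `geometric_perturb`, the uniform translation range [-1, 1], and the noise parameter `sigma` are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random


def random_orthogonal(d, rng):
    """Return a d x d orthogonal matrix (as a list of rows).

    Uses modified Gram-Schmidt on i.i.d. Gaussian vectors, which yields a
    uniformly (Haar) distributed orthogonal matrix.
    """
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for b in basis:  # remove components along already-chosen rows
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-10:  # reject numerically degenerate draws
            basis.append([vi / norm for vi in v])
    return basis


def geometric_perturb(records, sigma=0.0, seed=None):
    """Hypothetical sketch: map each d-dim record x to R x + t + noise."""
    rng = random.Random(seed)
    d = len(records[0])
    R = random_orthogonal(d, rng)
    t = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # one shared translation
    perturbed = []
    for x in records:
        rx = [sum(R[i][j] * x[j] for j in range(d)) for i in range(d)]
        perturbed.append([rx[i] + t[i] + rng.gauss(0.0, sigma) for i in range(d)])
    return perturbed, R, t


# Usage: with sigma = 0, original and perturbed pairwise distances agree.
data = [[1.0, 2.0, 3.0], [4.0, 0.5, -1.0], [0.0, 0.0, 2.5]]
y, R, t = geometric_perturb(data, sigma=0.0, seed=7)
for i in range(len(data)):
    for j in range(i + 1, len(data)):
        print(i, j, math.dist(data[i], data[j]), math.dist(y[i], y[j]))
```

Each printed pair of distances should agree up to floating-point rounding. With `sigma > 0`, distances are only approximately preserved; the noise term means that an attacker who knows a few original records and their perturbed images can no longer recover R and t exactly by solving a linear system, as would be possible with `sigma = 0`.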

Author information

Authors and Affiliations

Keke Chen: Department of Computer Science and Engineering, Wright State University, Dayton, OH 45435, USA
Ling Liu: College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA

Corresponding author

Correspondence to Keke Chen.

About this article

Cite this article

Chen, K., Liu, L. Geometric data perturbation for privacy preserving outsourced data mining. Knowl Inf Syst 29, 657–695 (2011). https://doi.org/10.1007/s10115-010-0362-4

Received: 10 March 2010
Revised: 03 October 2010
Accepted: 04 November 2010
Published: 23 November 2010
Issue Date: December 2011
DOI: https://doi.org/10.1007/s10115-010-0362-4
