Abstract
Data perturbation is a popular technique in privacy-preserving data mining. A major challenge in data perturbation is balancing privacy protection against data utility, which are normally regarded as conflicting goals. We argue that selectively preserving task- or model-specific information during perturbation helps achieve both a better privacy guarantee and better data utility. One type of such information is multidimensional geometric information, which many data-mining models use implicitly. To preserve this information in data perturbation, we propose the Geometric Data Perturbation (GDP) method. In this paper, we describe several aspects of the GDP method. First, we show that several types of well-known data-mining models deliver a comparable level of model quality on the geometrically perturbed data set as on the original data set. Second, we discuss the intuition behind the GDP method and compare it with other multidimensional perturbation methods such as random projection perturbation. Third, we propose a multi-column privacy evaluation framework for evaluating the effectiveness of geometric data perturbation with respect to different levels of attack. Finally, we use this evaluation framework to study a few attacks on geometrically perturbed data sets. Our experimental study also shows that geometric data perturbation not only provides a satisfactory privacy guarantee but also preserves modeling accuracy well.
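The idea of preserving geometric information can be illustrated with a minimal sketch. The abstract does not spell out the construction, so the code below assumes one plausible form of a geometric perturbation: each record x is mapped to y = R x + t + noise, where R is a random orthogonal matrix, t is a random translation, and the noise is small and Gaussian. The function names (`random_orthogonal`, `perturb`, `distance`), the data, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def random_orthogonal(d, rng):
    """Return a d x d orthogonal matrix (list of rows) via Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows already accepted.
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # skip (rare) nearly dependent draws
            basis.append([x / norm for x in v])
    return basis


def perturb(records, rotation, translation, noise_sd, rng):
    """Apply the assumed map y = R x + t + noise to each record x."""
    out = []
    for x in records:
        rotated = [sum(r * xj for r, xj in zip(row, x)) for row in rotation]
        out.append([
            yi + ti + rng.gauss(0.0, noise_sd)
            for yi, ti in zip(rotated, translation)
        ])
    return out


def distance(a, b):
    """Euclidean distance between two records."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(0)
d = 4
records = [[rng.random() for _ in range(d)] for _ in range(5)]  # toy data
R = random_orthogonal(d, rng)
t = [rng.random() for _ in range(d)]

exact = perturb(records, R, t, 0.0, rng)    # rotation + translation only
noisy = perturb(records, R, t, 0.05, rng)   # with small additive noise

original_dist = distance(records[0], records[1])
exact_dist = distance(exact[0], exact[1])
noisy_dist = distance(noisy[0], noisy[1])
```

Without noise, pairwise Euclidean distances are preserved up to floating-point error, which is why distance-based models would behave the same on the perturbed data. With noise, distances are only approximately preserved, and the noise level sets how far they can drift.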
Cite this article
Chen, K., Liu, L. Geometric data perturbation for privacy preserving outsourced data mining. Knowl Inf Syst 29, 657–695 (2011). https://doi.org/10.1007/s10115-010-0362-4
DOI: https://doi.org/10.1007/s10115-010-0362-4