Abstract
One of the fundamental challenges that the data mining community faces today is privacy. The question “How are we going to do data mining without violating the privacy of individuals?” is still on the table, and research is being conducted to find efficient methods to do that. Data transformation was previously proposed as one efficient method for privacy preserving data mining when a party needs to out-source the data mining task, or when distributed data mining needs to be performed among multiple parties without each party disclosing its actual data. In this paper we study the safety of distance preserving data transformations proposed for privacy preserving data mining. We show that an adversary can recover the original data values with very high confidence via knowledge of mutual distances between data objects together with the probability distribution from which they are drawn. Experiments conducted on real and synthetic data sets demonstrate the effectiveness of the theoretical results.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Aggarwal, C.C.: On randomization, public information and the curse of dimensionality. In: ICDE, pp. 136–145 (2007)
Aggarwal, C.C., Yu, P.S.: A condensation approach to privacy preserving data mining. In: EDBT, pp. 183–199 (2004)
Agrawal, R., Srikant, R.: Privacy-preserving data mining. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, Texas, USA, May 16-18, 2000, pp. 439–450. ACM, New York (2000)
Chen, K.: Geometric Methods for Mining Large and Possibly Private Datasets. PhD thesis, Georgia Institute of Technology (2006)
Chen, K., Sun, G., Liu, L.: Towards attack-resilient geometric data perturbation. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 78–89 (2007)
Kantarcioglu, M., Clifton, C.: Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Trans. Knowl. Data Eng. 16(9), 1026–1037 (2004)
Liu, K., Giannella, C., Kargupta, H.: An attacker’s view of distance preserving maps for privacy preserving data mining. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 297–308. Springer, Heidelberg (2006)
Liu, K., Kargupta, H., Ryan, J.: Random projection-based multiplicative data perturbation for privacy preserving distributed data mining. IEEE Trans. Knowl. Data Eng. 18(1), 92–106 (2006)
Mann, C.C.: Homeland insecurity. The Atlantic Monthly 290(2) (2002)
Muralidhar, K., Sarathy, R.: Security of random data perturbation methods. ACM Trans. Database Syst. 24(4), 487–493 (1999)
Oliveira, S.R.M., Zaïane, O.R.: Privacy preserving clustering by data transformation. In: Proceedings of the 18th Brazilian Symposium on Databases, pp. 304–318 (2003)
Oliveira, S.R.M., Zaïane, O.R.: Privacy-preserving clustering by object similarity-based representation and dimensionality reduction transformation. In: ICDM 2004, pp. 21–30. IEEE Computer Society, Los Alamitos (2004)
UCI Machine Learning Repository, http://mlearn.ics.uci.edu/MLsummary.html
Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: KDD 2003: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, New York, NY, USA, pp. 206–215. ACM Press, New York (2003)
Vijayan, J.: House committee chair wants info on cancelled dhs data-mining programs. Computer World (September 18, 2007)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Turgay, E.O., Pedersen, T.B., Saygın, Y., Savaş, E., Levi, A. (2008). Disclosure Risks of Distance Preserving Data Transformations. In: Ludäscher, B., Mamoulis, N. (eds) Scientific and Statistical Database Management. SSDBM 2008. Lecture Notes in Computer Science, vol 5069. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69497-7_8
Download citation
DOI: https://doi.org/10.1007/978-3-540-69497-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69476-2
Online ISBN: 978-3-540-69497-7
eBook Packages: Computer ScienceComputer Science (R0)