Importance of Data Standardization in Privacy-Preserving K-Means Clustering

Su, Chunhua; Zhan, Justin; Sakurai, Kouichi

doi:10.1007/978-3-642-04205-8_23

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5667)

Abstract

Privacy-preserving k-means clustering assumes that at least two parties take part in a secure interactive computation. However, existing schemes do not consider data standardization, an important step that must be carried out before clustering across different databases. In this paper, we point out that, without data standardization, problems arise in many data mining applications. We also provide a solution for secure data standardization in privacy-preserving k-means clustering.
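To illustrate why standardization matters before distance-based clustering, the following is a minimal sketch and not the protocol proposed in the paper. It assumes two parties holding horizontally partitioned records with the same two features, and it combines their local summaries in the clear, with no cryptography. All function names and the sample records are hypothetical.

```python
import math

def local_stats(rows):
    """Per-feature (count, sums, sums of squares) for one party's rows."""
    dims = len(rows[0])
    sums = [sum(row[j] for row in rows) for j in range(dims)]
    squares = [sum(row[j] ** 2 for row in rows) for j in range(dims)]
    return len(rows), sums, squares

def pooled_mean_std(stats_a, stats_b):
    """Combine two parties' summaries into global per-feature mean and std."""
    n = stats_a[0] + stats_b[0]
    means, stds = [], []
    for j in range(len(stats_a[1])):
        mean = (stats_a[1][j] + stats_b[1][j]) / n
        variance = (stats_a[2][j] + stats_b[2][j]) / n - mean ** 2
        means.append(mean)
        stds.append(math.sqrt(max(variance, 0.0)))
    return means, stds

def standardize(rows, means, stds):
    """Z-score each feature; a constant feature (std == 0) maps to 0."""
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical records: (annual income, age).
party_a = [[30000.0, 25.0], [32000.0, 60.0]]
party_b = [[90000.0, 27.0], [88000.0, 62.0]]

means, stds = pooled_mean_std(local_stats(party_a), local_stats(party_b))
pooled = standardize(party_a + party_b, means, stds)

def age_share(p, q):
    """Fraction of the squared Euclidean distance contributed by age."""
    d_income, d_age = (p[0] - q[0]) ** 2, (p[1] - q[1]) ** 2
    return d_age / (d_income + d_age)

# Compare one record from each party before and after standardization.
raw_share = age_share(party_a[0], party_b[1])
std_share = age_share(pooled[0], pooled[3])
```

On the raw records, income dominates the Euclidean distance and age contributes almost nothing, so k-means would effectively cluster on income alone. After z-scoring with the pooled mean and standard deviation, both features contribute on a comparable scale. In a privacy-preserving setting the pooled statistics would have to be computed without either party revealing its local summaries, which this sketch does not attempt.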

Author information

Authors and Affiliations

Dept. of Computer Science and Communication Engineering, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka, Fukuoka, 819-0395, Japan
Chunhua Su & Kouichi Sakurai
Heinz College & Cylab Japan Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA
Justin Zhan

Editor information

Editors and Affiliations

Hong Kong University of Science and Technology, Hong Kong
Lei Chen
Swinburne University of Technology, Melbourne, Australia
Chengfei Liu
CSIRO, Castray Esplanade, 7000, Hobart, TAS, Australia
Qing Liu
School of Information Technology and Electrical Engineering, The University of Queensland, 4072, Brisbane, QLD, Australia
Ke Deng

About this paper

Cite this paper

Su, C., Zhan, J., Sakurai, K. (2009). Importance of Data Standardization in Privacy-Preserving K-Means Clustering. In: Chen, L., Liu, C., Liu, Q., Deng, K. (eds) Database Systems for Advanced Applications. DASFAA 2009. Lecture Notes in Computer Science, vol 5667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04205-8_23

DOI: https://doi.org/10.1007/978-3-642-04205-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04204-1
Online ISBN: 978-3-642-04205-8
eBook Packages: Computer Science, Computer Science (R0)
