Skip to main content

Importance of Data Standardization in Privacy-Preserving K-Means Clustering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5667))

Abstract

Privacy-preserving k-means clustering assumes that there are at least two parties in the secure interactive computation. However, the existing schemes do not consider the data standardization which is an important task before executing the clustering among the different database. In this paper, we point out without data standardization, some problems will arise from many applications of data mining. Also, we provide a solution for the secure data standardization in the privacy-preserving k-means clustering.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bunn, P., Ostrovsky, R.: Secure two-party k-means clustering. In: Proc. of the 14th ACM conference on Computer and communications security, pp. 486–497 (2007)

    Google Scholar 

  2. Chu, C.W., Holliday, J., Willett, P.: Effect of data standardization on chemical clustering and similarity searching. Journal of Chemical Information and Modeling (2008)

    Google Scholar 

  3. Feigenbaum, J., Ishai, Y., Malkin, T., Nissim, K., Strauss, M., Wright, R.: Secure multiparty computation of approximations. In: Proc. of 28th International Colloquium on Automata, Languages and Programming, pp. 927–938 (2001)

    Google Scholar 

  4. Goldreich, O., Micali, S., Wigderson, A.: How to play any mental game or a completeness theorem for protocols with honest majority. In: Proc. of the Nineteenth Annual ACM Symposium on Theory of Computing, pp. 218–229 (1987)

    Google Scholar 

  5. Jha, S., Kruger, L., McDaniel, P.: Privacy preserving clustering. In: de di Vimercati, S.C., Syverson, P.F., Gollmann, D. (eds.) ESORICS 2005. LNCS, vol. 3679, pp. 397–417. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  6. Jagannathan, G., Pillaipakkamnatt, K., Wright, R.N.: A new privacy-preserving distributed k-clustering algorithm. In: Proc. of the 2006 SIAM International Conference on Data Mining, SDM (2006)

    Google Scholar 

  7. Jagannathan, G., Wright, R.: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: Proc. of the 11th International Conference on Knowledge Discovery and Data Mining, KDD (2005)

    Google Scholar 

  8. Kiltz, E., Leander, G., Malone-Lee, J.: Secure computation of the mean and related statistics. In: Kilian, J. (ed.) TCC 2005. LNCS, vol. 3378, pp. 283–302. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  9. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Berkeley (1967)

    Google Scholar 

  10. Naor, M., Pinkas, B.: Oblivious transfer and polynomial evaluation. In: 31st ACM Symposium on Theory of Computing, pp. 245–254. ACM Press, New York (1999)

    Google Scholar 

  11. Paillier, P.: Public-key cryptosystems based on composite degree residuosity classes. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, p. 223. Springer, Heidelberg (1999)

    Chapter  Google Scholar 

  12. Peng, K., Boyd, C., Dawson, E., Lee, B.: An efficient and verifiable solution to the millionaire problem. In: Park, C.-s., Chee, S. (eds.) ICISC 2004. LNCS, vol. 3506, pp. 51–66. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  13. Rakhlin, A., Caponnetto, A.: Stability of k-means clustering. In: Proc. of Neural Information Processing Systems Conference (2006)

    Google Scholar 

  14. Schaffer, C.M., Green, P.E.: An empirical comparison of variable standardization methods in cluster analysis. Multivariate Behavioral Research 31(2), 149–167 (1996)

    Article  Google Scholar 

  15. Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proc. of the 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, USA (2003)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Su, C., Zhan, J., Sakurai, K. (2009). Importance of Data Standardization in Privacy-Preserving K-Means Clustering. In: Chen, L., Liu, C., Liu, Q., Deng, K. (eds) Database Systems for Advanced Applications. DASFAA 2009. Lecture Notes in Computer Science, vol 5667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04205-8_23

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-04205-8_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04204-1

  • Online ISBN: 978-3-642-04205-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics