Privacy-preserving kernel k-means clustering outsourcing with random transformation

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Clustering organizes data into groups of similar items. Kernel k-means identifies clusters in nonlinearly separable data by applying the kernel trick to standard k-means, so that the data are grouped in the kernel-induced feature space. Because kernel k-means is computationally expensive owing to its quadratic complexity, outsourcing its computation to an external computing service provider can benefit a data owner with limited computing resources. However, data privacy is a critical concern in outsourcing, since the data may contain sensitive information. Existing privacy-preserving outsourcing approaches for general kernel methods rely on distance preservation and offer only weak security. We propose a privacy-preserving outsourcing scheme for kernel k-means based on a random linear transformation and a random perturbation of the kernel matrix. The data sent to the service provider are encrypted, and the service provider solves the kernel k-means problem on the encrypted data. The proposed scheme is much stronger in security than existing works, and the experimental results show that the proposed privacy-preserving kernel k-means method achieves clustering performance similar to that of a standard large-scale kernel k-means algorithm while imposing very little overhead on the data owner.
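As background for the abstract, the following is a minimal sketch of plain (non-private) kernel k-means operating on a precomputed kernel matrix. It is not the outsourcing scheme proposed in the paper, and it omits the random transformation and perturbation entirely. The function names `rbf_kernel` and `kernel_kmeans`, the `init_labels` parameter, and the toy data are assumptions made for illustration. The sketch shows that the algorithm reads only entries of the kernel matrix, and that each iteration sums over blocks of that n × n matrix, which is where the quadratic cost comes from.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negatives from rounding


def kernel_kmeans(K, k, init_labels=None, n_iter=100, seed=0):
    """Lloyd-style kernel k-means that uses only the kernel matrix K.

    Squared feature-space distance from point i to the mean of cluster c:
        K[i, i] - (2 / m) * sum_j K[i, j] + (1 / m^2) * sum_{j, l} K[j, l]
    where j and l range over the m members of cluster c.
    """
    n = K.shape[0]
    if init_labels is None:
        # Random initial assignment (an illustrative choice, not from the paper).
        labels = np.random.default_rng(seed).integers(0, k, size=n)
    else:
        labels = np.asarray(init_labels).copy()
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)  # empty clusters stay at +inf
        for c in range(k):
            members = labels == c
            m = members.sum()
            if m == 0:
                continue
            cross = K[:, members].sum(axis=1) / m
            within = K[np.ix_(members, members)].sum() / m ** 2
            dist[:, c] = diag - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels


# Toy usage with a linear kernel and a fixed starting assignment.
x = np.array([0.0, 0.1, 10.0, 10.1])
K_lin = np.outer(x, x)
labels_lin = kernel_kmeans(K_lin, k=2, init_labels=[0, 1, 0, 1])

# The same routine accepts any kernel matrix, for example an RBF one.
K_rbf = rbf_kernel(x[:, None], gamma=1.0)
labels_rbf = kernel_kmeans(K_rbf, k=2)
```

Like ordinary k-means, this procedure can stop at a poor local optimum depending on the initial assignment, so practical implementations usually restart from several initializations.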

Acknowledgments

This work was supported in part by the Ministry of Science and Technology, Taiwan, under MOST 103-2410-H-110-025-MY2.

Author information

Corresponding author

Correspondence to Keng-Pei Lin.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

About this article

Cite this article

Lin, KP. Privacy-preserving kernel k-means clustering outsourcing with random transformation. Knowl Inf Syst 49, 885–908 (2016). https://doi.org/10.1007/s10115-016-0923-2

  • DOI: https://doi.org/10.1007/s10115-016-0923-2
