Abstract
Clustering is a common task that organizes data into groups of similar items. Kernel k-means identifies clusters in nonlinearly separable data by applying the kernel trick to the widely used k-means algorithm, so that the data are grouped in the kernel-induced feature space. Because kernel k-means is computationally costly owing to its quadratic complexity, a data owner with only limited computing resources can benefit from outsourcing the computation to an external computing service provider. However, data privacy is a critical concern in outsourcing, since the data may contain sensitive information. Existing privacy-preserving outsourcing schemes for general kernel methods, which are based on distance preservation, are weak in security. We propose a privacy-preserving outsourcing scheme for kernel k-means based on random linear transformation and random perturbation of the kernel matrix. The data sent to the service provider are encrypted, and the service provider solves the kernel k-means problem from the encrypted data. The proposed scheme is much stronger in security than existing works, and the experimental results show that the proposed privacy-preserving kernel k-means method achieves clustering performance similar to that of an ordinary large-scale kernel k-means algorithm while imposing very little overhead on the data owner.
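For orientation, the ordinary (non-private) kernel k-means iteration mentioned above can be sketched as follows. This is a minimal illustration in plain Python and not the privacy-preserving scheme proposed in the article; the function names, the choice of a Gaussian kernel, the `gamma` value, and the toy points are all assumptions made for the example. Building the full n x n kernel matrix is where the quadratic cost arises.

```python
import math
import random

def rbf_kernel_matrix(points, gamma=1.0):
    """Return the n x n Gaussian (RBF) kernel matrix (hypothetical helper)."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            K[i][j] = K[j][i] = math.exp(-gamma * d2)
    return K

def kernel_kmeans(K, k, n_iter=50, seed=0):
    """Plain kernel k-means on a precomputed kernel matrix K (list of lists)."""
    n = len(K)
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(n)]  # random initial assignment
    for _ in range(n_iter):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Within-cluster term: (1/|c|^2) * sum over j, l in c of K[j][l]
        within = [
            sum(K[j][l] for j in members for l in members) / len(members) ** 2
            if members else 0.0
            for members in clusters
        ]
        new_labels = []
        for i in range(n):
            best, best_dist = labels[i], math.inf
            for c, members in enumerate(clusters):
                if not members:
                    continue  # an empty cluster cannot attract points
                # ||phi(x_i) - m_c||^2 = K_ii - (2/|c|) * sum_j K_ij + within_c
                cross = sum(K[i][j] for j in members) / len(members)
                dist = K[i][i] - 2.0 * cross + within[c]
                if dist < best_dist:
                    best, best_dist = c, dist
            new_labels.append(best)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels

# Toy usage: two well-separated groups of 2-D points (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
K = rbf_kernel_matrix(points, gamma=0.5)
labels = kernel_kmeans(K, k=2)
```

The algorithm only ever reads entries of `K`, never the raw points, which is why a scheme that transforms or perturbs the kernel matrix can in principle hand the clustering work to another party.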


Notes
Google Prediction API. http://developers.google.com/prediction.
Standard for privacy of individually identifiable health information. http://www.hhs.gov/ocr/privacy.
Acknowledgments
This work was supported in part by the Ministry of Science and Technology, Taiwan, under MOST 103-2410-H-110-025-MY2.
Ethics declarations
Conflict of interest
The author declares that he has no conflict of interest.
Cite this article
Lin, KP. Privacy-preserving kernel k-means clustering outsourcing with random transformation. Knowl Inf Syst 49, 885–908 (2016). https://doi.org/10.1007/s10115-016-0923-2
DOI: https://doi.org/10.1007/s10115-016-0923-2