Privacy-preserving kernel k-means clustering outsourcing with random transformation

Lin, Keng-Pei

doi:10.1007/s10115-016-0923-2

Regular Paper
Published: 13 February 2016

Volume 49, pages 885–908 (2016)

Knowledge and Information Systems

Keng-Pei Lin¹

Abstract

Clustering is a common task that organizes data into groups of similar items. Kernel k-means identifies clusters in data that are not linearly separable by applying the kernel trick to the widely used k-means algorithm, so that the data are grouped in the kernel-induced feature space. Because kernel k-means is computationally costly owing to its quadratic complexity, outsourcing its computation to external computing service providers can benefit a data owner who has only limited computing resources. However, data privacy is a critical concern in outsourcing, since the data may contain sensitive information. Existing privacy-preserving outsourcing schemes for general kernel methods rely on distance preservation and are weak in security. We propose a privacy-preserving outsourcing scheme for kernel k-means based on random linear transformation and random perturbation of the kernel matrix. The data sent to the service provider are encrypted, and the service provider solves the kernel k-means problem from the encrypted data. The proposed scheme is much stronger in security than existing works, and experimental results show that the proposed privacy-preserving kernel k-means method achieves clustering performance similar to that of a normal large-scale kernel k-means algorithm while imposing very little overhead on the data owner.
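To make the kernel trick mentioned in the abstract concrete, the following is a minimal sketch of plain (non-private) kernel k-means in Python with NumPy. It is a textbook formulation, not the privacy-preserving scheme proposed in the article, whose details are not given in the abstract. The function names (rbf_kernel, cluster_distances, kernel_kmeans) and the parameters gamma, n_iter, and seed are illustrative assumptions. The sketch relies on the identity ||phi(x_i) - mu_c||^2 = K_ii - (2/m) sum_j K_ij + (1/m^2) sum_{j,l} K_jl, where the sums run over the m members of cluster c, so only the n-by-n kernel matrix K is needed.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix; gamma is an illustrative parameter."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def cluster_distances(K, labels, k):
    """Squared feature-space distance from every point to every cluster mean.

    Uses only kernel values:
    K_ii - (2/m) * sum_j K_ij + (1/m^2) * sum_{j,l} K_jl,
    with j, l ranging over the m members of the cluster.
    """
    n = K.shape[0]
    dist = np.full((n, k), np.inf)  # empty clusters stay at +inf
    diag = np.diag(K)
    for c in range(k):
        mask = labels == c
        m = mask.sum()
        if m == 0:
            continue
        cross = K[:, mask].sum(axis=1) / m
        within = K[np.ix_(mask, mask)].sum() / (m * m)
        dist[:, c] = diag - 2.0 * cross + within
    return dist


def kernel_kmeans(K, k, n_iter=50, seed=0):
    """Lloyd-style iterations on the kernel matrix, from random initial labels."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=K.shape[0])
    for _ in range(n_iter):
        new_labels = cluster_distances(K, labels, k).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

A call such as kernel_kmeans(rbf_kernel(X, gamma=0.5), k=2) returns one cluster label per row of X. Building and repeatedly scanning the full kernel matrix is what makes the cost quadratic in the number of data points, which is the motivation for outsourcing given in the abstract.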

Notes

Google Prediction API. http://developers.google.com/prediction.
Standard for privacy of individually identifiable health information. http://www.hhs.gov/ocr/privacy.

Acknowledgments

This work was supported in part by the Ministry of Science and Technology, Taiwan, under MOST 103-2410-H-110-025-MY2.

Author information

Authors and Affiliations

Department of Information Management, National Sun Yat-sen University, Kaohsiung, Taiwan
Keng-Pei Lin

Corresponding author

Correspondence to Keng-Pei Lin.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

About this article

Cite this article

Lin, KP. Privacy-preserving kernel k-means clustering outsourcing with random transformation. Knowl Inf Syst 49, 885–908 (2016). https://doi.org/10.1007/s10115-016-0923-2

Received: 05 March 2015
Revised: 19 December 2015
Accepted: 30 January 2016
Published: 13 February 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s10115-016-0923-2
