Abstract
Spectral clustering is one of the most popular clustering methods and is particularly useful for pattern recognition and image analysis. To use it, users must either build their own analysis platform, which demands strong data analytics and machine learning skills, or let a third party access and analyze their data, which may compromise data privacy or security. Traditionally, this problem has been addressed by privacy-preserving data mining based on randomized perturbation or secure multi-party computation. However, existing methods suffer from inaccurate results or high computational requirements on the data owner’s side. To address these problems, we propose a new secure outsourcing data mining (SODM) paradigm in which data owners encrypt their data to ensure data security. After encryption, data owners can outsource the encrypted data to data analytics service providers (i.e., data analytics agents) for knowledge discovery, with a guarantee that neither the data analytics agent nor any other party can compromise data privacy. To allow data mining to be carried out efficiently on encrypted data, we design a secure KD-tree that indexes all the encrypted data. Based on the SODM framework, we propose a secure spectral clustering algorithm. Experiments on real-world datasets demonstrate the effectiveness and efficiency of the system for the secure outsourcing of data mining.
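For readers unfamiliar with the index structure named above, the following is a minimal sketch of a conventional plaintext KD-tree with a nearest-neighbor query. It is not the secure KD-tree proposed in the paper: nothing here is encrypted, and the names `KDNode`, `build_kdtree`, and `nearest` are hypothetical, chosen only for illustration. It shows the kind of neighbor lookup that a similarity-graph construction for spectral clustering typically relies on.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KDNode:
    """One node of a plaintext KD-tree (illustrative only, no encryption)."""

    def __init__(self, point: Point, axis: int,
                 left: "Optional[KDNode]" = None,
                 right: "Optional[KDNode]" = None) -> None:
        self.point = point  # the point stored at this node
        self.axis = axis    # the coordinate used to split at this node
        self.left = left
        self.right = right


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a KD-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(
        ordered[mid],
        axis,
        build_kdtree(ordered[:mid], depth + 1),
        build_kdtree(ordered[mid + 1:], depth + 1),
    )


def nearest(node: Optional[KDNode], query: Point,
            best: Optional[Tuple[float, Point]] = None
            ) -> Optional[Tuple[float, Point]]:
    """Return (distance, point) for the stored point closest to `query`."""
    if node is None:
        return best
    dist = math.dist(node.point, query)
    if best is None or dist < best[0]:
        best = (dist, node.point)
    # Descend first into the side of the splitting plane that holds the query.
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


if __name__ == "__main__":
    pts = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
    tree = build_kdtree(pts)
    print(nearest(tree, (9.0, 2.0)))
```

In this plaintext form the tree compares raw coordinates directly. In the outsourced setting described in the abstract, the agent only holds encrypted data, so these comparisons would have to be carried out by whatever secure protocol the secure KD-tree defines.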
Acknowledgements
This work was supported in part by the Australian Research Council (ARC) Discovery Project under Grant No. DP180100966, the National Key Research and Development Program of China under Grant No. 2017YFB0802704, and the Program of Shanghai Technology Research Leader under Grant No. 16XD1424400.
Cite this article
Liu, B., Chen, L., Zhu, X. et al. Encrypted data indexing for the secure outsourcing of spectral clustering. Knowl Inf Syst 60, 1307–1328 (2019). https://doi.org/10.1007/s10115-018-1262-2
DOI: https://doi.org/10.1007/s10115-018-1262-2