Encrypted data indexing for the secure outsourcing of spectral clustering

  • Regular Paper
Knowledge and Information Systems

Abstract

Spectral clustering is one of the most popular clustering methods and is particularly useful for pattern recognition and image analysis. When using spectral clustering for analysis, users must either implement their own platforms, which requires strong data analytics and machine learning skills, or allow a third party to access and analyze their data, which may compromise data privacy or security. Traditionally, this problem has been addressed by privacy-preserving data mining based on randomization perturbation or secure multi-party computation. However, existing methods suffer from either inaccurate results or high computational requirements on the data owner’s side. To address these problems, we propose a new secure outsourcing data mining (SODM) paradigm, which allows data owners to encrypt their data to ensure maximum data security. After encryption, data owners can outsource their encrypted data to data analytics service providers (i.e., data analytics agents) for knowledge discovery, with a guarantee that neither the data analytics agent nor any other party can compromise data privacy. To allow data mining to be carried out efficiently on encrypted data, we design a secure KD-tree to index all the encrypted data. Based on the SODM framework, we propose a secure spectral clustering algorithm. Experiments on real-world datasets demonstrate the effectiveness and efficiency of the system for the secure outsourcing of data mining.
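The abstract names two building blocks, a KD-tree index and spectral clustering. The sketch below shows only their plaintext counterparts, under stated assumptions: a symmetric k-nearest-neighbour Gaussian affinity graph built by KD-tree search, followed by Ng–Jordan–Weiss-style normalized spectral clustering with a small k-means step. It performs no encryption and is not the paper's secure KD-tree or secure clustering protocol; all function names (`build_kdtree`, `knn`, `simple_kmeans`, `spectral_clustering`) and parameters (`n_neighbors`, `sigma`) are hypothetical choices for illustration.

```python
import heapq
import numpy as np


class KDNode:
    """One node of a plaintext KD-tree: a point index plus its splitting axis."""
    __slots__ = ("idx", "axis", "left", "right")

    def __init__(self, idx, axis, left, right):
        self.idx, self.axis, self.left, self.right = idx, axis, left, right


def build_kdtree(points, indices=None, depth=0):
    """Recursively split on the median coordinate along axis depth % d."""
    if indices is None:
        indices = np.arange(len(points))
    if len(indices) == 0:
        return None
    axis = depth % points.shape[1]
    order = indices[np.argsort(points[indices, axis])]
    mid = len(order) // 2
    return KDNode(order[mid], axis,
                  build_kdtree(points, order[:mid], depth + 1),
                  build_kdtree(points, order[mid + 1:], depth + 1))


def knn(tree, points, q, k):
    """Indices of the k points nearest to q, closest first (branch-and-bound)."""
    heap = []  # max-heap of (-squared distance, index) for the current best k

    def visit(node):
        if node is None:
            return
        p = points[node.idx]
        d2 = float(np.sum((p - q) ** 2))
        if len(heap) < k:
            heapq.heappush(heap, (-d2, node.idx))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, node.idx))
        diff = q[node.axis] - p[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Cross the splitting plane only if it may hide a closer point.
        if len(heap) < k or diff ** 2 < -heap[0][0]:
            visit(far)

    visit(tree)
    return [i for _, i in sorted(heap, reverse=True)]


def simple_kmeans(Y, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        new = np.array([Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(points, n_clusters, n_neighbors=5, sigma=1.0):
    """Normalized spectral clustering on a symmetric k-NN Gaussian affinity graph."""
    n = len(points)
    tree = build_kdtree(points)
    W = np.zeros((n, n))
    for i in range(n):
        # Ask for n_neighbors + 1 because the query point is its own nearest neighbour.
        for j in knn(tree, points, points[i], n_neighbors + 1):
            if j != i:
                w = np.exp(-np.sum((points[i] - points[j]) ** 2) / (2 * sigma ** 2))
                W[i, j] = W[j, i] = w  # symmetrise the k-NN graph
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    X = vecs[:, -n_clusters:]  # eigenvectors of the n_clusters largest eigenvalues
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)  # row-normalise
    return simple_kmeans(Y, n_clusters)


if __name__ == "__main__":
    grid = np.array([[x, y] for x in (0.0, 0.1, 0.2) for y in (0.0, 0.1, 0.2)])
    points = np.vstack([grid, grid + 5.0])  # two well-separated 3x3 grids
    print(spectral_clustering(points, n_clusters=2, n_neighbors=4))
```

On the two well-separated grids in the demo, the k-NN graph has two connected components, so the top two eigenvectors of D^{-1/2} W D^{-1/2} separate them and k-means gives each grid a single label. The distance computations and comparisons inside `knn` are the kind of operations a secure variant would need to carry out on encrypted values; how the paper realizes this is not described in this abstract.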

[Figs. 1–6: images not reproduced]

Acknowledgements

This work was supported, in part, by the Australian Research Council (ARC) Discovery Project under Grant No. DP180100966, the National Key Research and Development Program of China under Grant 2017YFB0802704, and the Shanghai Technology Research Leader Program under Grant 16XD1424400.

Author information

Corresponding author

Correspondence to Bozhong Liu.

About this article

Cite this article

Liu, B., Chen, L., Zhu, X. et al. Encrypted data indexing for the secure outsourcing of spectral clustering. Knowl Inf Syst 60, 1307–1328 (2019). https://doi.org/10.1007/s10115-018-1262-2

  • DOI: https://doi.org/10.1007/s10115-018-1262-2
