Skip to main content
Log in

Practical protocol for Yao’s millionaires problem enables secure multi-party computation of metrics and efficient privacy-preserving k-NN for large data sets

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Finding the nearest k objects to a query object is a fundamental operation for many data mining algorithms. With the recent interest in privacy, it is not surprising that there is strong interest in k-NN queries to enable clustering, classification and outlier-detection tasks. However, previous approaches to privacy-preserving k-NN have been costly and can only be realistically applied to small data sets. In this paper, we provide efficient solutions for k-NN queries for vertically partitioned data. We provide the first solution for the L (or Chessboard) metric as well as detailed privacy-preserving computation of all other Minkowski metrics. We enable privacy-preserving L by providing a practical approach to the Yao’s millionaires problem with more than two parties. This is based on a pragmatic and implementable solution to Yao’s millionaires problem with shares. We also provide privacy-preserving algorithms for combinations of local metrics into a global metric that handles the large dimensionality and diversity of attributes common in vertically partitioned data. To manage very large data sets, we provide a privacy-preserving SASH (a very successful data structure for associative queries in high dimensions). Besides providing a theoretical analysis, we illustrate the efficiency of our approach with an empirical evaluation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Amirbekyan A, Estivill-Castro V (2006) Privacy preserving DBSCAN for vertically partitioned data. In: IEEE international conference on intelligence and security informatics, ISI 2006. Lecture notes in computer science, vol 3975. Springer, San Diego, pp 141–153

  2. Amirbekyan A, Estivill-Castro V (2007) The privacy of k-NN retrieval for horizontal partitioned data—new methods and applications. In: Bailey J, Fekete A (eds) Eighteenth Australasian database conference (ADC2007). Conferences in research and practice in information technology (CRPIT), CORE, Australasian Computer Society, Ballarat, Victoria, Australia, pp 33–42

  3. Ankerst M, Kastenmueller G, Kriegel H, Seidl T (1999) Nearest neighbor classification in 3D protein databases. In: Proceedings of the seventh international conference on intelligent systems for molecular biology (ISMB-99), pp 34–43

  4. Ankerst M, Kriegel H, Seidl T (1998) A multi-step approach for shape similarity in image databases. IEEE Trans Data Eng 10(6): 996–1004

    Article  Google Scholar 

  5. Arya S, Mount DM, Netanyahu NS, Silverman R, Wu AY (1994) An optimal algorithm for approximate nearest neighbor searching. In: 5th annual ACM-SIAM symposium on discrete algorithms, pp 573–582

  6. Avidan S, Butman M (2006) Blind vision. In: Leonardis S, Bischof H, Pinz A (eds) Computer vision—ECCV 2006, 9th European conference on computer vision, Part III’. Lecture notes in computer science, vol 3953. Springer, Graz, pp 1–13

    Google Scholar 

  7. Beimel A, Ishai Y (2005) On the power of nonlinear secret-sharing. SIAM J Discrete Math 19(1): 258–280

    Article  MATH  MathSciNet  Google Scholar 

  8. Bentley J (1975) Multidimensional binary search trees used for associative retrieval. Commun ACM 18(9): 509–517

    Article  MATH  MathSciNet  Google Scholar 

  9. Berchtold S, Keim DA, Kriegel H-P (1996) The X-tree: an index structure for higher dimensional data. In: Proceedings of twentieth VLDB conference, pp 28–39

  10. Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1999) When is nearest neighbor meaningful. In: International conference on database theory, Jerusalem, Israel, pp 217–225

  11. Blundo C, Masucci B, Stinson D, Wei R (2002) Constructions and bounds for unconditionally secure non-interactive commitment schemes. Designs Codes Cryptography 26: 97–110

    Article  MATH  MathSciNet  Google Scholar 

  12. Breunig M, Kriegel H-P, Ng R, Sander J (2000) Lof: Identifying density-based local outliers. In: Chen W, Naughton JF, Bernstein P (eds) Proceedings of the 2000 ACM SIGMOD international conference on management of data. ACM Press, Dallas, pp 93–104

    Chapter  Google Scholar 

  13. Cachin C (1999) Efficient private bidding and auctions with an oblivious third party. In: Proceedings of the sixth ACM conference on computer and communications security, SIGSAC, ACM Press, Singapore, pp 120–127

  14. Cheng X, Dolin R, Neary M, Prabhakar S, Ravikanth K, Wu D, Agrawal D, El Abbadi E, Freeston M, Singh A, Smith T, Su J (1997) Scalable access within the context of digital libraries. In: Proceedings of the international conference on advances in digital libraries. ADL, Washington, DC, pp 70–81

  15. Ciaccia P, Patella M (2000) PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces. In: Proceedings of the international conference on data engineering, San Jose, California, pp 244–255

  16. Damgard I, Pedersen T, Pfitzmann B (1998) Statistical secrecy and multibit commitments. IEEE Trans Inf Theory 44(3): 1143–1151

    Article  MATH  MathSciNet  Google Scholar 

  17. Du W, Atallah M (2001) Privacy-preserving cooperative statistical analysis. In: Proceedings of the seventeenth annual computer security applications conference (ACSAC), ACM SIGSAC. IEEE Computer Society, New Orleans, pp 102–110

  18. Du W, Han Y-S, Chen S (2004) Privacy-preserving multivariate statistical analysis: linear regression and classification. In: BM W, Dayal U, Kamath C, Skillicorn D (eds) 2004 SIAM international conference on data mining. Lake Buena Vista, Florida, pp 222–233

    Google Scholar 

  19. Du W, Zhan Z (2002) Building decision tree classifier on private data. In: Estivill-Castro V, Clifton C (eds) Privacy, security and data mining, IEEE ICDM workshop proceedings, Conferences in research and practice in information technology series, Vol 14. Australian Computer Society, Sydney, pp 1–8

    Google Scholar 

  20. Ester M, Kriegel H, Sander S, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Simoudis E, Han J, Fayyad U (eds) Proceedings of the second international conference on knowledge discovery and data mining (KDD-96), AAAI. AAAI Press, Menlo Park, pp 226–231

    Google Scholar 

  21. Estivill-Castro V (2004) Private representative-based clustering for vertically partitioned data. In: Baeza-Yates R, Marroquin J, Chávez E (eds) Fifth Mexican international conference on computer science (ENC 04), SMCC. IEEE Computer Society Press, Colima, pp 160–167

    Chapter  Google Scholar 

  22. Estivill-Castro V, Yang J (2002) Clustering Web visitors by fast, robust and convergent algorithms. Int J Found Comput Sci 13(4): 497–520

    Article  MATH  Google Scholar 

  23. Fagin R (1996) Combining fuzzy information from multiple systems. In: Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems. ACM Press, Montreal, pp 216–226

  24. Faloutsos C, Ranganathan M, Manolopoulos Y (May 1994) Fast subsequence matching in time-series databases. In: Proceedings of the ACM SIGMOD international conference on management of data, Minneapolis, pp 419–429

  25. Ferhatosmanoglu H, Tuncel E, Agrawal D, El Abbadi A (2002) Approximate nearest neighbor searching in multimedia databases. In: Seventh IEEE international conference on data engineering (ICDE), Germany, pp 503–511

  26. Goethals B, Laur S, Lipmaa H, Mielikäinen (2005) On private scalar product computation for privacy-preserving data mining. In: Park C, Chee S (eds) Information security and cryptology—ICISC 2004. Lecture notes in computer science, vol 3506. Springer, Berlin, pp 104–120

    Google Scholar 

  27. Goldreich O (2004) The foundations of cryptography, vol 2, chapt General cryptographic protocols. Cambridge University Press, London

    Google Scholar 

  28. Goldreich O, Micali S, Wigderson A (1987) How to play any mental game (extended abstract). In: Aho A (eds) Proceedings of the nineteenth ACM annual symposium on theory of computing. ACM Press, New York, pp 218–229

    Chapter  Google Scholar 

  29. Guttmann A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of ACM SIGMOD international conference on management of data, pp 47–57

  30. Houle M (2003a) Navigating massive data sets via local clustering. In: Getoor L, Senator T, Domingos P, Faloutsos C (eds) Ninth ACM SIGKDD conference on knowledge discovery and data mining. ACM Press, Washington, pp 547–552

    Chapter  Google Scholar 

  31. Houle M (2003b) SASH: a spatial approximation sample hierarchy for similarity, Technical Report RT-0517, IBM Tokyo Research Laboratory, p 16

  32. Houle M, Sakuma J (2005) Fast approximate similarity search in extremely high-dimensional data sets. In: Twenty first international conference on data engineering ICDE. IEEE Computer Society, Tokyo, pp 619–630

  33. Ioannidis I, Grama A (2003) An efficient protocol for Yao’s millionaires’ problem. In: Thirty sixth Hawaii international conference on system sciences HICSS. IEEE Computer Society, Big Island, p 205

  34. Kahveci T, Singh AK (2001) An efficient index structure for string databases. In: VLDB conference, Rome, Italy, pp 351–360

  35. Kantarcioğlu M, Clifton C (2004) Privately computing a distributed k-nn classifier. In: Boulicaut J-F (eds) Eighth European conference on principles and practice of knowledge discovery in databases PKDD. Lecture notes in computer science, vol 3202. Springer, Berlin, pp 279–290

    Google Scholar 

  36. Kargupta H, Datta S, Wang Q, Sivakumar K (2005) Random-data perturbation techniques and privacy-preserving data mining. Knowl Inf Syst 7(4): 387–414

    Article  Google Scholar 

  37. Katayama N, Satoh S (1997) The SR-tree: an index structure for high-dimensional nearest neighbour queries. In: ACM SIGMOD conference on management of data, Tucson, USA, pp 369–380

  38. Korn F, Sidropoulos N, Faloutsos C, Siegel E, Proropapas Z (1996) Fast nearest neighbor search in medical image databases. In: VLDB, Mumbai, India, pp 215–226

  39. Lin X, Clifton C, Zhu M (2005) Privacy-preserving clustering with distributed EM mixture modeling. Knowl Inf Syst 8(1): 68–81

    Article  Google Scholar 

  40. Malkhi D, Nisan N, Pinkas B, Sella Y (2004) Fairplay—a secure two-party computation system. In: Proceedings of the thirteenth conference on USENIX security symposium, vol 13, USENIX Association, p 20

  41. Mena J (2003) Investigative data mining for security and criminal detection. Butterworth-Heinemann, London

    Google Scholar 

  42. Morimoto Y, Aono M, Houle M, McCurley K (2003) Extracting spatial knowledge from the Web. In: International symposium on applications and the internet (SAINT), Orlando, USA, pp 326–333

  43. Nielsen J, Schwatzbach M (2007) A domain-specific programming language for secure multiparty computation. In: Conference on programming language design and implementation—Proceedings on the 2007 workshop on programming languages and analysis for security, SIGPLAN, ACM Press, New York, pp 21–30

  44. Peng K, Boyd C, Dawson E, Lee B (2005) An efficient and verifiable solution to the millionaire problem. In: Park C, Chee S (eds) Information security and cryptology—ICISC 2004’. Lecture notes in computer science, vol 3506. Springer, Berlin, pp 51–66

    Google Scholar 

  45. Rivest R (1999) Unconditionally secure commitment and oblivious transfer schemes using concealing channels and a trusted initializer (manuscript)

  46. Roussopoulos N, Kelly S, Vincent F (1995) Nearest neighbor queries. In: Proceedings of the ACM SIGMOD international conference on management of data. ACM Press, San Jose, pp 71–79

  47. Shaneck M, Kim Y, Kumar V (2006) Privacy preserving nearest neighbor search. In: ICDMW ’06: Proceedings of the sixth IEEE international conference on data mining—Workshops. IEEE Computer Society, Washington, DC, pp 541–545

  48. Shannon C (1949) Communication theory of secret systems. Bell Systems Technical Journal 28: 656–715

    MathSciNet  Google Scholar 

  49. Stinson DR (1995) Cryptography: theory and practice. CRC, Boca Raton

    MATH  Google Scholar 

  50. Subrahmanian V (1999) Principles of multimedia database systems. Morgan Kaufmann, San Francisco

    Google Scholar 

  51. Sung S, Liu Y, Xiong H, Ng P (2006) Privacy preservation for data cubes. Knowl Inf Syst 9(1): 38–61

    Article  Google Scholar 

  52. Vaidya J, Clifton CC (2002) Privacy preserving association rule mining in vertically partitioned data. In: The eighth ACM SIGKDD international conference on knowledge discovery and data mining, SIGKDD. ACM Press, Edmonton, pp 639–644

  53. Vaidya J, Clifton CC (2003) Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the SIGKDD-ACM conference of data mining. ACM Press, Washington, DC, pp 206–215

  54. Vaidya J, Clifton C (2004) Privacy-preserving outlier detection. In: Fourth IEEE international conference on data mining (ICDM 2004). IEEE Computer Society, Brighton, pp 233–240

  55. Vaidya J, Clifton C (2005) Privacy-preserving top-k queries. In: Twenty first international conference on data engineering, ICDE 2005. IEEE Computer Society, Tokyo, pp 545–546

  56. Vaidya J, Clifton C, Zhu M (2006) Privacy preserving data mining Advances in information security, vol 19. Springer, New York

    Google Scholar 

  57. Vaidya J, Yu H, Jiang X (2008) Privacy-preserving SVM classification. Knowl Inf Syst 14(2): 161–178

    Article  Google Scholar 

  58. van der Putten P, van Someren M (2000) CoIL challenge 2000: the insurance company case. Technical Report 2000-09, Leiden Institute of Advanced Computer Science, Amsterdam

  59. Wang K, Fung B, Yu P (2007) Handicapping attacker’s confidence: an alternative to k-anonymization. Knowl Inf Syst 11(3): 345–368

    Article  Google Scholar 

  60. Witten I, Frank E (2000) Data mining—practical machine learning tools and technologies with JAVA implementations. Morgan Kaufmann, San Mateo

    Google Scholar 

  61. Wu X, Kumar V, Quinlan J, Ghosh J, Yang Q, Motoda H, McLachlan G, Ng A, Liu B, Yu P, Zhou Z-H, Steinbach M, Hand DJ, Steinberg D (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14(1): 1–37

    Article  Google Scholar 

  62. Xu S, Zhang J, Han D, Wang J (2006) Singular value decomposition based data distortion strategy for privacy protection. Knowl Inf Syst 10(3): 383–397

    Article  Google Scholar 

  63. Yao A (1982) Protocols for secure computation. In: IEEE symposium of foundations of computer science. IEEE Computer Society, pp 160–164

  64. Yu B, Ooi C, Tan K, Jagadish H (2001) Indexing the distance: an efficient method to KNN processing. In: Twenty seventh VLDB conference, Rome, Italy, pp 421–430

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Vladimir Estivill-Castro.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Amirbekyan, A., Estivill-Castro, V. Practical protocol for Yao’s millionaires problem enables secure multi-party computation of metrics and efficient privacy-preserving k-NN for large data sets. Knowl Inf Syst 21, 327–363 (2009). https://doi.org/10.1007/s10115-009-0233-z

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-009-0233-z

Keywords

Navigation