Practical protocol for Yao’s millionaires problem enables secure multi-party computation of metrics and efficient privacy-preserving k-NN for large data sets

Amirbekyan, Artak; Estivill-Castro, Vladimir

doi:10.1007/s10115-009-0233-z

Practical protocol for Yao’s millionaires problem enables secure multi-party computation of metrics and efficient privacy-preserving k-NN for large data sets

Regular Paper
Published: 17 July 2009

Volume 21, pages 327–363, (2009)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Artak Amirbekyan¹ &
Vladimir Estivill-Castro²

209 Accesses
7 Citations
Explore all metrics

Abstract

Finding the nearest k objects to a query object is a fundamental operation for many data mining algorithms. With the recent interest in privacy, it is not surprising that there is strong interest in k-NN queries to enable clustering, classification and outlier-detection tasks. However, previous approaches to privacy-preserving k-NN have been costly and can only be realistically applied to small data sets. In this paper, we provide efficient solutions for k-NN queries for vertically partitioned data. We provide the first solution for the L _∞ (or Chessboard) metric as well as detailed privacy-preserving computation of all other Minkowski metrics. We enable privacy-preserving L _∞ by providing a practical approach to the Yao’s millionaires problem with more than two parties. This is based on a pragmatic and implementable solution to Yao’s millionaires problem with shares. We also provide privacy-preserving algorithms for combinations of local metrics into a global metric that handles the large dimensionality and diversity of attributes common in vertically partitioned data. To manage very large data sets, we provide a privacy-preserving SASH (a very successful data structure for associative queries in high dimensions). Besides providing a theoretical analysis, we illustrate the efficiency of our approach with an empirical evaluation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Stratified random sampling from streaming and stored data

Article 23 October 2020

Big data privacy: a technological perspective and review

Article Open access 26 November 2016

Big Data Security and Privacy

References

Amirbekyan A, Estivill-Castro V (2006) Privacy preserving DBSCAN for vertically partitioned data. In: IEEE international conference on intelligence and security informatics, ISI 2006. Lecture notes in computer science, vol 3975. Springer, San Diego, pp 141–153
Amirbekyan A, Estivill-Castro V (2007) The privacy of k-NN retrieval for horizontal partitioned data—new methods and applications. In: Bailey J, Fekete A (eds) Eighteenth Australasian database conference (ADC2007). Conferences in research and practice in information technology (CRPIT), CORE, Australasian Computer Society, Ballarat, Victoria, Australia, pp 33–42
Ankerst M, Kastenmueller G, Kriegel H, Seidl T (1999) Nearest neighbor classification in 3D protein databases. In: Proceedings of the seventh international conference on intelligent systems for molecular biology (ISMB-99), pp 34–43
Ankerst M, Kriegel H, Seidl T (1998) A multi-step approach for shape similarity in image databases. IEEE Trans Data Eng 10(6): 996–1004
Article Google Scholar
Arya S, Mount DM, Netanyahu NS, Silverman R, Wu AY (1994) An optimal algorithm for approximate nearest neighbor searching. In: 5th annual ACM-SIAM symposium on discrete algorithms, pp 573–582
Avidan S, Butman M (2006) Blind vision. In: Leonardis S, Bischof H, Pinz A (eds) Computer vision—ECCV 2006, 9th European conference on computer vision, Part III’. Lecture notes in computer science, vol 3953. Springer, Graz, pp 1–13
Google Scholar
Beimel A, Ishai Y (2005) On the power of nonlinear secret-sharing. SIAM J Discrete Math 19(1): 258–280
Article MATH MathSciNet Google Scholar
Bentley J (1975) Multidimensional binary search trees used for associative retrieval. Commun ACM 18(9): 509–517
Article MATH MathSciNet Google Scholar
Berchtold S, Keim DA, Kriegel H-P (1996) The X-tree: an index structure for higher dimensional data. In: Proceedings of twentieth VLDB conference, pp 28–39
Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1999) When is nearest neighbor meaningful. In: International conference on database theory, Jerusalem, Israel, pp 217–225
Blundo C, Masucci B, Stinson D, Wei R (2002) Constructions and bounds for unconditionally secure non-interactive commitment schemes. Designs Codes Cryptography 26: 97–110
Article MATH MathSciNet Google Scholar
Breunig M, Kriegel H-P, Ng R, Sander J (2000) Lof: Identifying density-based local outliers. In: Chen W, Naughton JF, Bernstein P (eds) Proceedings of the 2000 ACM SIGMOD international conference on management of data. ACM Press, Dallas, pp 93–104
Chapter Google Scholar
Cachin C (1999) Efficient private bidding and auctions with an oblivious third party. In: Proceedings of the sixth ACM conference on computer and communications security, SIGSAC, ACM Press, Singapore, pp 120–127
Cheng X, Dolin R, Neary M, Prabhakar S, Ravikanth K, Wu D, Agrawal D, El Abbadi E, Freeston M, Singh A, Smith T, Su J (1997) Scalable access within the context of digital libraries. In: Proceedings of the international conference on advances in digital libraries. ADL, Washington, DC, pp 70–81
Ciaccia P, Patella M (2000) PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces. In: Proceedings of the international conference on data engineering, San Jose, California, pp 244–255
Damgard I, Pedersen T, Pfitzmann B (1998) Statistical secrecy and multibit commitments. IEEE Trans Inf Theory 44(3): 1143–1151
Article MATH MathSciNet Google Scholar
Du W, Atallah M (2001) Privacy-preserving cooperative statistical analysis. In: Proceedings of the seventeenth annual computer security applications conference (ACSAC), ACM SIGSAC. IEEE Computer Society, New Orleans, pp 102–110
Du W, Han Y-S, Chen S (2004) Privacy-preserving multivariate statistical analysis: linear regression and classification. In: BM W, Dayal U, Kamath C, Skillicorn D (eds) 2004 SIAM international conference on data mining. Lake Buena Vista, Florida, pp 222–233
Google Scholar
Du W, Zhan Z (2002) Building decision tree classifier on private data. In: Estivill-Castro V, Clifton C (eds) Privacy, security and data mining, IEEE ICDM workshop proceedings, Conferences in research and practice in information technology series, Vol 14. Australian Computer Society, Sydney, pp 1–8
Google Scholar
Ester M, Kriegel H, Sander S, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Simoudis E, Han J, Fayyad U (eds) Proceedings of the second international conference on knowledge discovery and data mining (KDD-96), AAAI. AAAI Press, Menlo Park, pp 226–231
Google Scholar
Estivill-Castro V (2004) Private representative-based clustering for vertically partitioned data. In: Baeza-Yates R, Marroquin J, Chávez E (eds) Fifth Mexican international conference on computer science (ENC 04), SMCC. IEEE Computer Society Press, Colima, pp 160–167
Chapter Google Scholar
Estivill-Castro V, Yang J (2002) Clustering Web visitors by fast, robust and convergent algorithms. Int J Found Comput Sci 13(4): 497–520
Article MATH Google Scholar
Fagin R (1996) Combining fuzzy information from multiple systems. In: Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems. ACM Press, Montreal, pp 216–226
Faloutsos C, Ranganathan M, Manolopoulos Y (May 1994) Fast subsequence matching in time-series databases. In: Proceedings of the ACM SIGMOD international conference on management of data, Minneapolis, pp 419–429
Ferhatosmanoglu H, Tuncel E, Agrawal D, El Abbadi A (2002) Approximate nearest neighbor searching in multimedia databases. In: Seventh IEEE international conference on data engineering (ICDE), Germany, pp 503–511
Goethals B, Laur S, Lipmaa H, Mielikäinen (2005) On private scalar product computation for privacy-preserving data mining. In: Park C, Chee S (eds) Information security and cryptology—ICISC 2004. Lecture notes in computer science, vol 3506. Springer, Berlin, pp 104–120
Google Scholar
Goldreich O (2004) The foundations of cryptography, vol 2, chapt General cryptographic protocols. Cambridge University Press, London
Google Scholar
Goldreich O, Micali S, Wigderson A (1987) How to play any mental game (extended abstract). In: Aho A (eds) Proceedings of the nineteenth ACM annual symposium on theory of computing. ACM Press, New York, pp 218–229
Chapter Google Scholar
Guttmann A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of ACM SIGMOD international conference on management of data, pp 47–57
Houle M (2003a) Navigating massive data sets via local clustering. In: Getoor L, Senator T, Domingos P, Faloutsos C (eds) Ninth ACM SIGKDD conference on knowledge discovery and data mining. ACM Press, Washington, pp 547–552
Chapter Google Scholar
Houle M (2003b) SASH: a spatial approximation sample hierarchy for similarity, Technical Report RT-0517, IBM Tokyo Research Laboratory, p 16
Houle M, Sakuma J (2005) Fast approximate similarity search in extremely high-dimensional data sets. In: Twenty first international conference on data engineering ICDE. IEEE Computer Society, Tokyo, pp 619–630
Ioannidis I, Grama A (2003) An efficient protocol for Yao’s millionaires’ problem. In: Thirty sixth Hawaii international conference on system sciences HICSS. IEEE Computer Society, Big Island, p 205
Kahveci T, Singh AK (2001) An efficient index structure for string databases. In: VLDB conference, Rome, Italy, pp 351–360
Kantarcioğlu M, Clifton C (2004) Privately computing a distributed k-nn classifier. In: Boulicaut J-F (eds) Eighth European conference on principles and practice of knowledge discovery in databases PKDD. Lecture notes in computer science, vol 3202. Springer, Berlin, pp 279–290
Google Scholar
Kargupta H, Datta S, Wang Q, Sivakumar K (2005) Random-data perturbation techniques and privacy-preserving data mining. Knowl Inf Syst 7(4): 387–414
Article Google Scholar
Katayama N, Satoh S (1997) The SR-tree: an index structure for high-dimensional nearest neighbour queries. In: ACM SIGMOD conference on management of data, Tucson, USA, pp 369–380
Korn F, Sidropoulos N, Faloutsos C, Siegel E, Proropapas Z (1996) Fast nearest neighbor search in medical image databases. In: VLDB, Mumbai, India, pp 215–226
Lin X, Clifton C, Zhu M (2005) Privacy-preserving clustering with distributed EM mixture modeling. Knowl Inf Syst 8(1): 68–81
Article Google Scholar
Malkhi D, Nisan N, Pinkas B, Sella Y (2004) Fairplay—a secure two-party computation system. In: Proceedings of the thirteenth conference on USENIX security symposium, vol 13, USENIX Association, p 20
Mena J (2003) Investigative data mining for security and criminal detection. Butterworth-Heinemann, London
Google Scholar
Morimoto Y, Aono M, Houle M, McCurley K (2003) Extracting spatial knowledge from the Web. In: International symposium on applications and the internet (SAINT), Orlando, USA, pp 326–333
Nielsen J, Schwatzbach M (2007) A domain-specific programming language for secure multiparty computation. In: Conference on programming language design and implementation—Proceedings on the 2007 workshop on programming languages and analysis for security, SIGPLAN, ACM Press, New York, pp 21–30
Peng K, Boyd C, Dawson E, Lee B (2005) An efficient and verifiable solution to the millionaire problem. In: Park C, Chee S (eds) Information security and cryptology—ICISC 2004’. Lecture notes in computer science, vol 3506. Springer, Berlin, pp 51–66
Google Scholar
Rivest R (1999) Unconditionally secure commitment and oblivious transfer schemes using concealing channels and a trusted initializer (manuscript)
Roussopoulos N, Kelly S, Vincent F (1995) Nearest neighbor queries. In: Proceedings of the ACM SIGMOD international conference on management of data. ACM Press, San Jose, pp 71–79
Shaneck M, Kim Y, Kumar V (2006) Privacy preserving nearest neighbor search. In: ICDMW ’06: Proceedings of the sixth IEEE international conference on data mining—Workshops. IEEE Computer Society, Washington, DC, pp 541–545
Shannon C (1949) Communication theory of secret systems. Bell Systems Technical Journal 28: 656–715
MathSciNet Google Scholar
Stinson DR (1995) Cryptography: theory and practice. CRC, Boca Raton
MATH Google Scholar
Subrahmanian V (1999) Principles of multimedia database systems. Morgan Kaufmann, San Francisco
Google Scholar
Sung S, Liu Y, Xiong H, Ng P (2006) Privacy preservation for data cubes. Knowl Inf Syst 9(1): 38–61
Article Google Scholar
Vaidya J, Clifton CC (2002) Privacy preserving association rule mining in vertically partitioned data. In: The eighth ACM SIGKDD international conference on knowledge discovery and data mining, SIGKDD. ACM Press, Edmonton, pp 639–644
Vaidya J, Clifton CC (2003) Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the SIGKDD-ACM conference of data mining. ACM Press, Washington, DC, pp 206–215
Vaidya J, Clifton C (2004) Privacy-preserving outlier detection. In: Fourth IEEE international conference on data mining (ICDM 2004). IEEE Computer Society, Brighton, pp 233–240
Vaidya J, Clifton C (2005) Privacy-preserving top-k queries. In: Twenty first international conference on data engineering, ICDE 2005. IEEE Computer Society, Tokyo, pp 545–546
Vaidya J, Clifton C, Zhu M (2006) Privacy preserving data mining Advances in information security, vol 19. Springer, New York
Google Scholar
Vaidya J, Yu H, Jiang X (2008) Privacy-preserving SVM classification. Knowl Inf Syst 14(2): 161–178
Article Google Scholar
van der Putten P, van Someren M (2000) CoIL challenge 2000: the insurance company case. Technical Report 2000-09, Leiden Institute of Advanced Computer Science, Amsterdam
Wang K, Fung B, Yu P (2007) Handicapping attacker’s confidence: an alternative to k-anonymization. Knowl Inf Syst 11(3): 345–368
Article Google Scholar
Witten I, Frank E (2000) Data mining—practical machine learning tools and technologies with JAVA implementations. Morgan Kaufmann, San Mateo
Google Scholar
Wu X, Kumar V, Quinlan J, Ghosh J, Yang Q, Motoda H, McLachlan G, Ng A, Liu B, Yu P, Zhou Z-H, Steinbach M, Hand DJ, Steinberg D (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14(1): 1–37
Article Google Scholar
Xu S, Zhang J, Han D, Wang J (2006) Singular value decomposition based data distortion strategy for privacy protection. Knowl Inf Syst 10(3): 383–397
Article Google Scholar
Yao A (1982) Protocols for secure computation. In: IEEE symposium of foundations of computer science. IEEE Computer Society, pp 160–164
Yu B, Ooi C, Tan K, Jagadish H (2001) Indexing the distance: an efficient method to KNN processing. In: Twenty seventh VLDB conference, Rome, Italy, pp 421–430

Download references

Author information

Authors and Affiliations

Earth Systems Science Computational Centre, The University of Queensland, Brisbane, QLD, 4072, Australia
Artak Amirbekyan
School of ICT, Griffith University, Brisbane, QLD, 4111, Australia
Vladimir Estivill-Castro

Authors

Artak Amirbekyan
View author publications
You can also search for this author in PubMed Google Scholar
Vladimir Estivill-Castro
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Vladimir Estivill-Castro.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Amirbekyan, A., Estivill-Castro, V. Practical protocol for Yao’s millionaires problem enables secure multi-party computation of metrics and efficient privacy-preserving k-NN for large data sets. Knowl Inf Syst 21, 327–363 (2009). https://doi.org/10.1007/s10115-009-0233-z

Download citation

Received: 06 September 2007
Revised: 18 March 2009
Accepted: 20 June 2009
Published: 17 July 2009
Issue Date: December 2009
DOI: https://doi.org/10.1007/s10115-009-0233-z

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Practical protocol for Yao’s millionaires problem enables secure multi-party computation of metrics and efficient privacy-preserving k-NN for large data sets

Abstract

Access this article

Similar content being viewed by others

Stratified random sampling from streaming and stored data

Big data privacy: a technological perspective and review

Big Data Security and Privacy

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Practical protocol for Yao’s millionaires problem enables secure multi-party computation of metrics and efficient privacy-preserving k-NN for large data sets

Abstract

Access this article

Similar content being viewed by others

Stratified random sampling from streaming and stored data

Big data privacy: a technological perspective and review

Big Data Security and Privacy

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation