Skip to main content

A Plug-n-Play Framework for Scaling Private Set Intersection to Billion-Sized Sets

  • Conference paper
  • First Online:
Cryptology and Network Security (CANS 2023)

Abstract

Motivated by the recent advances in practical secure computation, we design and implement a framework for scaling solutions for the problem of private set intersection (PSI) into the realm of big data. A protocol for PSI enables two parties each holding a set of elements to jointly compute the intersection of these sets without revealing the elements that are not in the intersection. Following a long line of research, recent protocols for PSI only have \({\approx }5\times \) computation and communication overhead over an insecure set intersection. However, this performance is typically demonstrated for set sizes in the order of ten million. In this work, we aim to scale these protocols to efficiently handle set sizes of one billion elements or more.

We achieve this via a careful application of a binning approach that enables parallelizing any arbitrary PSI protocol. Building on this idea, we designed and implemented a framework which takes a pair of PSI executables (i.e., for each of the two parties) that typically works for million-sized sets, and then scales it to billion-sized sets (and beyond). For example, our framework can perform a join of billion-sized sets in 83 min compared to 2000 min of Pinkas et al. (ACM TPS 2018), an improvement of \(25\times \). Furthermore, we present an end-to-end Spark application where two enterprises, each possessing private databases, can perform a restricted class of database join operations (specifically, join operations with only an on clause which is a conjunction of equality checks involving attributes from both parties, followed by a where clause which can be split into conjunctive clauses where each conjunction is a function of a single table) without revealing any data that is not part of the output.

This work was done while all authors were at Visa Research.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 69.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 89.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    This does not necessarily apply to the setting of unbalanced PSI where the set sizes can be orders of magnitude apart [3, 7, 11, 27, 48]. For instance, [14] do unbalanced PSI with \(2^{28}\) elements on one side and say 1024 elements on the other side.

  2. 2.

    For a further breakdown of this number, [51] note that 30.0 h (88%) are for simple hashing (cuckoo hashing runs in parallel and requires 16.3 h), 3 h (9%) are for computing the OTs, and 1.2 h (4%) are for computing the plaintext intersection.

  3. 3.

    Using \(m \approx n\) in our self-reduction would incur an unacceptable overhead due to padding. Please see Sect. 3.1 on how to choose the optimum value of m.

  4. 4.

    In that Section, they also analyze the choice of m for PSI with unbalanced sets.

  5. 5.

    This is a standard technique to capture protocols in cryptography, for example while designing zero-knowledge compilers that transform a semi-honest secure protocol into a maliciously secure protocol.

  6. 6.

    Most PSI protocols have very few rounds (exceptions include circuit PSI protocols that rely on the GMW compiler).

  7. 7.

    We support any PSI protocol irrespective of the underlying cryptographic assumptions or algorithmic techniques.

  8. 8.

    If we have k worker nodes on each side, then we can run k instances of \(\varPi \) in parallel, and repeat this \(m/k\) times to complete the PSI portion of the execution.

  9. 9.

    Restricting clauses this way enables us to reduce the above problem to the PSI problem. We note that the restriction above can be lifted if we use more sophisticated PSI protocols that can keep the PSI output in secret shared form without revealing it. We leave this for future work.

  10. 10.

    This corresponds to \(\delta _0 = 0.019\) for a bin size of \(\approx \)500K (cf. Sect. 3.1).

References

  1. Abadi, A., Terzis, S., Dong, C.: O-PSI: delegated private set intersection on outsourced datasets. In: Federrath, H., Gollmann, D. (eds.) SEC 2015. IAICT, vol. 455, pp. 3–17. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18467-8_1

    Chapter  Google Scholar 

  2. Abadi, A., Terzis, S., Dong, C.: VD-PSI: verifiable delegated private set intersection on outsourced private datasets. In: Grossklags, J., Preneel, B. (eds.) FC 2016. LNCS, vol. 9603, pp. 149–168. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54970-4_9

    Chapter  Google Scholar 

  3. Kiss, A., Liu, J., Schneider, T., Asokan, N., Pinkas, B.: Private set intersection for unequal set sizes with mobile applications. In: Proceedings on Privacy Enhancing Technologies, no. 4, pp. 177–197 (2017)

    Google Scholar 

  4. Ateniese, G., De Cristofaro, E., Tsudik, G.: (If) size matters: size-hiding private set intersection. In: Catalano, D., Fazio, N., Gennaro, R., Nicolosi, A. (eds.) PKC 2011. LNCS, vol. 6571, pp. 156–173. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19379-8_10

    Chapter  Google Scholar 

  5. Badrinarayanan, S., et al.: A plug-n-play framework for scaling private set intersection to billion-sized sets. Cryptology ePrint Archive, Paper 2022/294 (2022)

    Google Scholar 

  6. Badrinarayanan, S., Miao, P., Rindal, P.: Multi-party threshold private set intersection with sublinear communication. IACR Cryptology ePrint Archive 2020, 600 (2020). https://eprint.iacr.org/2020/600

  7. Pinkas, B., Schneider, T., Tkachenko, O., Yanai, A.: Efficient circuit-based PSI with linear communication. In: Ishai, Y., Rijmen, V. (eds.) EUROCRYPT 2019. LNCS, vol. 11478, pp. 122–153. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17659-4_5

    Chapter  Google Scholar 

  8. Blanton, M., Aguiar, E.: Private and oblivious set and multiset operations. Int. J. Inf. Sec. 15(5), 493–518 (2016). https://doi.org/10.1007/s10207-015-0301-1

    Article  Google Scholar 

  9. Brickell, J., Porter, D.E., Shmatikov, V., Witchel, E.: Privacy-preserving remote diagnostics. In: CCS (2007)

    Google Scholar 

  10. Chase, M., Miao, P.: Private set intersection in the internet setting from lightweight oblivious PRF. In: Micciancio, D., Ristenpart, T. (eds.) CRYPTO 2020. LNCS, vol. 12172, pp. 34–63. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-56877-1_2

    Chapter  Google Scholar 

  11. Chen, H., Laine, K., Rindal, P.: Fast private set intersection from homomorphic encryption. In: Thuraisingham, B.M., Evans, D., Malkin, T., Xu, D. (eds.) Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, 30 October–03 November 2017, pp. 1243–1255. ACM (2017). https://doi.org/10.1145/3133956.3134061

  12. De Cristofaro, E., Gasti, P., Tsudik, G.: Fast and private computation of cardinality of set intersection and union. In: Pieprzyk, J., Sadeghi, A.-R., Manulis, M. (eds.) CANS 2012. LNCS, vol. 7712, pp. 218–231. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35404-5_17

    Chapter  Google Scholar 

  13. De Cristofaro, E., Kim, J., Tsudik, G.: Linear-complexity private set intersection protocols secure in malicious model. In: Abe, M. (ed.) ASIACRYPT 2010. LNCS, vol. 6477, pp. 213–231. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17373-8_13

    Chapter  Google Scholar 

  14. Kales, D., Rechberger, C., Schneider, T., Senker, M., Weinert, C.: Mobile private contact discovery at scale. In: USENIX Annual Technical Conference, pp. 1447–1464 (2019)

    Google Scholar 

  15. Dave, A., Leung, C., Popa, R.A., Gonzalez, J.E., Stoica, I.: Oblivious coopetitive analytics using hardware enclaves. In: Proceedings of the Fifteenth European Conference on Computer Systems, pp. 1–17 (2020)

    Google Scholar 

  16. Davidson, A., Cid, C.: An efficient toolkit for computing private set operations. In: Pieprzyk, J., Suriadi, S. (eds.) ACISP 2017. LNCS, vol. 10343, pp. 261–278. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59870-3_15

    Chapter  Google Scholar 

  17. De Cristofaro, E., Tsudik, G.: Practical private set intersection protocols with linear complexity. In: Sion, R. (ed.) FC 2010. LNCS, vol. 6052, pp. 143–159. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14577-3_13

    Chapter  Google Scholar 

  18. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Sixth Symposium on Operating System Design and Implementation, OSDI 2004, San Francisco, CA, pp. 137–150 (2004)

    Google Scholar 

  19. Demmler, D., Rindal, P., Rosulek, M., Trieu, N.: PIR-PSI: scaling private contact discovery. Proc. Priv. Enhancing Technol. 2018(4), 159–178 (2018). https://doi.org/10.1515/popets-2018-0037

    Article  Google Scholar 

  20. Dong, C., Chen, L., Wen, Z.: When private set intersection meets big data: an efficient and scalable protocol. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pp. 789–800 (2013)

    Google Scholar 

  21. Falk, B.H., Noble, D., Ostrovsky, R.: Private set intersection with linear communication from general assumptions. In: Cavallaro, L., Kinder, J., Domingo-Ferrer, J. (eds.) Proceedings of the 18th ACM Workshop on Privacy in the Electronic Society, WPES@CCS 2019, London, UK, 11 November 2019, pp. 14–25. ACM (2019). https://doi.org/10.1145/3338498.3358645

  22. Freedman, M.J., Hazay, C., Nissim, K., Pinkas, B.: Efficient set intersection with simulation-based security. J. Cryptol. 29(1), 115–155 (2016). https://doi.org/10.1007/s00145-014-9190-0

    Article  MathSciNet  Google Scholar 

  23. Freedman, M.J., Nissim, K., Pinkas, B.: Efficient private matching and set intersection. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004. LNCS, vol. 3027, pp. 1–19. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24676-3_1

    Chapter  Google Scholar 

  24. Ghosh, S., Simkin, M.: The communication complexity of threshold private set intersection. In: Boldyreva, A., Micciancio, D. (eds.) CRYPTO 2019. LNCS, vol. 11693, pp. 3–29. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26951-7_1

    Chapter  Google Scholar 

  25. Asharov, G., Lindell, Y., Schneider, T., Zohner, M.: More efficient oblivious transfer and extensions for faster secure computation. In: CCS, pp. 535–548 (2013)

    Google Scholar 

  26. Hallgren, P.A., Orlandi, C., Sabelfeld, A.: PrivatePool: privacy-preserving ridesharing. In: CSF (2017)

    Google Scholar 

  27. Chen, H., Huang, Z., Laine, K., Rindal, P.: Labeled PSI from fully homomorphic encryption with malicious security. In: CCS, pp. 1223–1237 (2018)

    Google Scholar 

  28. Hazay, C., Nissim, K.: Efficient set operations in the presence of malicious adversaries. In: Nguyen, P.Q., Pointcheval, D. (eds.) PKC 2010. LNCS, vol. 6056, pp. 312–331. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13013-7_19

    Chapter  Google Scholar 

  29. Hazay, C., Venkitasubramaniam, M.: Scalable multi-party private set-intersection. In: Fehr, S. (ed.) PKC 2017. LNCS, vol. 10174, pp. 175–203. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54365-8_8

    Chapter  Google Scholar 

  30. Huang, Y., Evans, D., Katz, J., Malka, L.: Faster secure two-party computation using garbled circuits. In: 20th USENIX Security Symposium, San Francisco, CA, USA, 8–12 August 2011, Proceedings. USENIX Association (2011). http://static.usenix.org/events/sec11/tech/full_papers/Huang.pdf

  31. Huberman, B.A., Franklin, M.K., Hogg, T.: Enhancing privacy and trust in electronic communities. In: Feldman, S.I., Wellman, M.P. (eds.) Proceedings of the First ACM Conference on Electronic Commerce (EC-99), Denver, CO, USA, 3–5 November 1999, pp. 78–86. ACM (1999). https://doi.org/10.1145/336992.337012

  32. Ion, M., et al.: On deploying secure computing commercially: private intersection-sum protocols and their business applications. IACR Cryptology ePrint Archive 2019, 723 (2019). https://eprint.iacr.org/2019/723

  33. Ion, M., et al.: Private intersection-sum protocol with applications to attributing aggregate ad conversions (2017). ia.cr/2017/735

    Google Scholar 

  34. Kissner, L., Song, D.: Privacy-preserving set operations. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 241–257. Springer, Heidelberg (2005). https://doi.org/10.1007/11535218_15

    Chapter  Google Scholar 

  35. Kolesnikov, V., Kumaresan, R., Rosulek, M., Trieu, N.: Efficient batched oblivious PRF with applications to private set intersection. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 818–829 (2016)

    Google Scholar 

  36. Kolesnikov, V., Matania, N., Pinkas, B., Rosulek, M., Trieu, N.: Practical multi-party private set intersection from symmetric-key techniques. In: CCS (2017)

    Google Scholar 

  37. Kolesnikov, V., Rosulek, M., Trieu, N., Wang, X.: Scalable private set union from symmetric-key techniques. In: Galbraith, S.D., Moriai, S. (eds.) ASIACRYPT 2019. LNCS, vol. 11922, pp. 636–666. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34621-8_23

    Chapter  Google Scholar 

  38. Livy, A.: Apache Livy (2017). https://livy.apache.org/

  39. Meadows, C.A.: A more efficient cryptographic matchmaking protocol for use in the absence of a continuously available third party. In: Proceedings of the 1986 IEEE Symposium on Security and Privacy, Oakland, California, USA, 7–9 April 1986, pp. 134–137. IEEE Computer Society (1986). https://doi.org/10.1109/SP.1986.10022

  40. Ciampi, M., Orlandi, C.: Combining private set-intersection with secure two-party computation. In: Catalano, D., De Prisco, R. (eds.) SCN 2018. LNCS, vol. 11035, pp. 464–482. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98113-0_25

    Chapter  Google Scholar 

  41. Nagaraja, S., Mittal, P., Hong, C.Y., Caesar, M., Borisov, N.: BotGrep: finding P2P bots with structured graph analysis. In: USENIX Security Symposium (2010)

    Google Scholar 

  42. Narayanan, A., Thiagarajan, N., Lakhani, M., Hamburg, M., Boneh, D.: Location privacy via private proximity testing. In: Proceedings of the Network and Distributed System Security Symposium, NDSS 2011, San Diego, California, USA, 6th February–9th February 2011. The Internet Society (2011). https://www.ndss-symposium.org/ndss2011/privacy-private-proximity-testing-paper

  43. Orrù, M., Orsini, E., Scholl, P.: Actively secure 1-out-of-N OT extension with application to private set intersection. In: Handschuh, H. (ed.) CT-RSA 2017. LNCS, vol. 10159, pp. 381–396. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-52153-4_22

    Chapter  Google Scholar 

  44. Papadimitriou, A., et al.: Big data analytics over encrypted datasets with seabed. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2016), pp. 587–602 (2016)

    Google Scholar 

  45. Pinkas, B., Rosulek, M., Trieu, N., Yanai, A.: SpOT-light: lightweight private set intersection from sparse OT extension. In: Boldyreva, A., Micciancio, D. (eds.) CRYPTO 2019. LNCS, vol. 11694, pp. 401–431. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26954-8_13

    Chapter  Google Scholar 

  46. Pinkas, B., Rosulek, M., Trieu, N., Yanai, A.: PSI from PaXoS: fast, malicious private set intersection. In: Canteaut, A., Ishai, Y. (eds.) EUROCRYPT 2020. LNCS, vol. 12106, pp. 739–767. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45724-2_25

    Chapter  Google Scholar 

  47. Pinkas, B., Schneider, T., Segev, G., Zohner, M.: Phasing: private set intersection using permutation-based hashing. In: USENIX (2015)

    Google Scholar 

  48. Pinkas, B., Schneider, T., Weinert, C., Wieder, U.: Efficient circuit-based PSI via cuckoo hashing. In: Nielsen, J.B., Rijmen, V. (eds.) EUROCRYPT 2018. LNCS, vol. 10822, pp. 125–157. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78372-7_5

    Chapter  Google Scholar 

  49. Pinkas, B., Schneider, T., Zohner, M.: Faster private set intersection based on OT extension. In: USENIX (2014)

    Google Scholar 

  50. Pinkas, B., Schneider, T., Zohner, M.: Scalable private set intersection based on OT extension. IACR Cryptology ePrint Archive 2016, 930 (2016). http://eprint.iacr.org/2016/930

  51. Pinkas, B., Schneider, T., Zohner, M.: Scalable private set intersection based on OT extension. ACM Trans. Priv. Secur. 21(2), 7:1–7:35 (2018). https://doi.org/10.1145/3154794

  52. Popa, R.A., Redfield, C.M., Zeldovich, N., Balakrishnan, H.: CryptDB: protecting confidentiality with encrypted query processing. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pp. 85–100 (2011)

    Google Scholar 

  53. Resende, A.C.D., Aranha, D.F.: Unbalanced approximate private set intersection. IACR Cryptology ePrint Archive 2017, 677 (2017). http://eprint.iacr.org/2017/677

  54. Rindal, P.: libPSI: an efficient, portable, and easy to use Private Set Intersection Library. https://github.com/osu-crypto/libPSI

  55. Rindal, P., Rosulek, M.: Improved private set intersection against malicious adversaries. In: Coron, J.-S., Nielsen, J.B. (eds.) EUROCRYPT 2017. LNCS, vol. 10210, pp. 235–259. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56620-7_9

    Chapter  Google Scholar 

  56. Rindal, P., Rosulek, M.: Malicious-secure private set intersection via dual execution. In: CCS (2017)

    Google Scholar 

  57. Poddar, R., Kalra, S., Yanai, A., Deng, R., Popa, R.A., Hellerstein, J.M.: Senate: a maliciously-secure MPC platform for collaborative analytics. IACR Cryptology ePrint Archive 2020, 1350 (2020)

    Google Scholar 

  58. Kamara, S., Mohassel, P., Raykova, M., Sadeghian, S.: Scaling private set intersection to billion-element sets. In: Financial Cryptography and Data Security, pp. 195–215 (2014)

    Google Scholar 

  59. Kolesnikov, V., Kumaresan, R.: Improved OT extension for transferring short secrets. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8043, pp. 54–70. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40084-1_4

    Chapter  Google Scholar 

  60. Wikipedia: Java native interface - Wikipedia (2020). https://en.wikipedia.org/wiki/Java_Native_Interface

  61. Sun, Y., Hua, Y., Jiang, S., Li, Q., Cao, S., Zuo, P.: SmartCuckoo: a fast and cost-efficient hashing index scheme for cloud storage systems. In: USENIX Annual Technical Conference, pp. 553–565 (2017)

    Google Scholar 

  62. Ishai, Y., Kilian, J., Nissim, K., Petrank, E.: Extending oblivious transfers efficiently. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 145–161. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45146-4_9

    Chapter  Google Scholar 

  63. Zaharia, M., et al.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Presented as Part of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2012), pp. 15–28 (2012)

    Google Scholar 

  64. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud 2010, USA, p. 10. USENIX Association (2010)

    Google Scholar 

  65. Zheng, W., Dave, A., Beekman, J.G., Popa, R.A., Gonzalez, J.E., Stoica, I.: Opaque: an oblivious and encrypted distributed analytics platform. In: 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2017), pp. 283–298 (2017)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Srinivasan Raghuraman .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Badrinarayanan, S. et al. (2023). A Plug-n-Play Framework for Scaling Private Set Intersection to Billion-Sized Sets. In: Deng, J., Kolesnikov, V., Schwarzmann, A.A. (eds) Cryptology and Network Security. CANS 2023. Lecture Notes in Computer Science, vol 14342. Springer, Singapore. https://doi.org/10.1007/978-981-99-7563-1_20

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-7563-1_20

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7562-4

  • Online ISBN: 978-981-99-7563-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics