Abstract
Motivated by the recent advances in practical secure computation, we design and implement a framework for scaling solutions for the problem of private set intersection (PSI) into the realm of big data. A protocol for PSI enables two parties, each holding a set of elements, to jointly compute the intersection of these sets without revealing the elements that are not in the intersection. Following a long line of research, recent protocols for PSI only have \({\approx }5\times \) computation and communication overhead over an insecure set intersection. However, this performance is typically demonstrated for set sizes on the order of ten million. In this work, we aim to scale these protocols to efficiently handle set sizes of one billion elements or more.
We achieve this via a careful application of a binning approach that enables parallelizing any arbitrary PSI protocol. Building on this idea, we designed and implemented a framework which takes a pair of PSI executables (i.e., one for each of the two parties) that typically work for million-sized sets, and then scales them to billion-sized sets (and beyond). For example, our framework can perform a join of billion-sized sets in 83 min compared to 2000 min of Pinkas et al. (ACM TOPS 2018), an improvement of \(25\times \). Furthermore, we present an end-to-end Spark application where two enterprises, each possessing private databases, can perform a restricted class of database join operations (specifically, join operations whose on clause is a conjunction of equality checks involving attributes from both parties, followed by a where clause that can be split into conjunctive clauses, each of which is a function of a single table) without revealing any data that is not part of the output.
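The binning idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `bin_index`, `partition`, `toy_psi`, and `binned_psi`, the use of SHA-256 to pick a bin, and the dummy-padding format are all assumptions made for illustration, and `toy_psi` is an insecure plaintext stand-in for a real PSI executable.

```python
import hashlib

# Assumption: real elements never start with this prefix.
DUMMY_PREFIX = "\x00dummy"


def bin_index(element: str, num_bins: int) -> int:
    """Map an element to a bin; both parties must use the same mapping."""
    digest = hashlib.sha256(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_bins


def partition(elements, num_bins: int, bin_size: int, party: str):
    """Split a set into num_bins bins and pad each bin to exactly bin_size."""
    bins = [[] for _ in range(num_bins)]
    for element in sorted(set(elements)):
        bins[bin_index(element, num_bins)].append(element)
    for i, contents in enumerate(bins):
        if len(contents) > bin_size:
            raise ValueError(f"bin {i} overflows the chosen bin size")
        # Party-specific dummies hide the true load and never match each other.
        missing = bin_size - len(contents)
        contents.extend(f"{DUMMY_PREFIX}:{party}:{i}:{j}" for j in range(missing))
    return bins


def toy_psi(xs, ys):
    """Insecure stand-in for one run of an arbitrary two-party PSI protocol."""
    return set(xs) & set(ys)


def binned_psi(set_a, set_b, num_bins: int, bin_size: int, psi=toy_psi):
    """Run one small PSI per bin pair and take the union of the results."""
    bins_a = partition(set_a, num_bins, bin_size, "A")
    bins_b = partition(set_b, num_bins, bin_size, "B")
    result = set()
    for bin_a, bin_b in zip(bins_a, bins_b):  # each pair is independent
        result |= psi(bin_a, bin_b)
    return result


common = binned_psi({"a", "b", "c", "d"}, {"c", "d", "e"}, num_bins=4, bin_size=4)
```

Because equal elements always land in the same bin, the union of the per-bin intersections equals the intersection of the full sets, and the per-bin runs can be distributed across workers.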
This work was done while all authors were at Visa Research.
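As a rough illustration of why the restricted join class in the abstract reduces to PSI, the sketch below applies each party's single-table where predicate locally, builds one PSI input per row from the on-clause attributes, and intersects the resulting keys. The table contents, the attribute names, the `\x1f` separator, and the functions `join_keys` and `restricted_join` are hypothetical, and the plaintext intersection is an insecure placeholder for a real PSI protocol. How matched rows are exchanged afterwards is outside this sketch.

```python
SEP = "\x1f"  # assumption: attribute values never contain this separator


def join_keys(rows, on_attrs, where):
    """Filter one table locally, then key the surviving rows by the ON attributes."""
    keyed = {}
    for row in rows:
        if where(row):  # the WHERE conjunct that depends on this table only
            key = SEP.join(str(row[attr]) for attr in on_attrs)
            keyed.setdefault(key, []).append(row)
    return keyed


def restricted_join(rows_a, on_a, where_a, rows_b, on_b, where_b):
    """Return the matched keys plus each party's own rows for those keys."""
    keyed_a = join_keys(rows_a, on_a, where_a)
    keyed_b = join_keys(rows_b, on_b, where_b)
    # Insecure placeholder: a real deployment would run a PSI protocol here.
    matched = set(keyed_a) & set(keyed_b)
    own_a = [row for key in sorted(matched) for row in keyed_a[key]]
    own_b = [row for key in sorted(matched) for row in keyed_b[key]]
    return matched, own_a, own_b


# Hypothetical tables held by the two parties.
table_a = [
    {"name": "ann", "dob": "1990-01-01", "balance": 50},
    {"name": "bob", "dob": "1985-05-05", "balance": 500},
    {"name": "cy", "dob": "1970-07-07", "balance": 900},
]
table_b = [
    {"customer": "bob", "birth": "1985-05-05", "country": "US"},
    {"customer": "cy", "birth": "1970-07-07", "country": "FR"},
    {"customer": "ann", "birth": "1990-01-01", "country": "US"},
]

# ON a.name = b.customer AND a.dob = b.birth WHERE a.balance > 100 AND b.country = 'US'
matched, own_a, own_b = restricted_join(
    table_a, ["name", "dob"], lambda r: r["balance"] > 100,
    table_b, ["customer", "birth"], lambda r: r["country"] == "US",
)
```

Only the row for "bob" satisfies both local predicates and agrees on every on-clause attribute, so it is the single matched key.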
Notes
- 1.
- 2.
For a further breakdown of this number, [51] note that 30.0 h (88%) are for simple hashing (cuckoo hashing runs in parallel and requires 16.3 h), 3 h (9%) are for computing the OTs, and 1.2 h (4%) are for computing the plaintext intersection.
- 3.
Using \(m \approx n\) in our self-reduction would incur an unacceptable overhead due to padding. See Sect. 3.1 for how to choose the optimal value of m.
- 4.
In that section, they also analyze the choice of m for PSI with unbalanced sets.
- 5.
This is a standard technique to capture protocols in cryptography, for example while designing zero-knowledge compilers that transform a semi-honest secure protocol into a maliciously secure protocol.
- 6.
Most PSI protocols have very few rounds (exceptions include circuit PSI protocols that rely on the GMW compiler).
- 7.
We support any PSI protocol irrespective of the underlying cryptographic assumptions or algorithmic techniques.
- 8.
If we have k worker nodes on each side, then we can run k instances of \(\varPi \) in parallel, and repeat this \(m/k\) times to complete the PSI portion of the execution.
- 9.
Restricting clauses this way enables us to reduce the above problem to the PSI problem. We note that the restriction above can be lifted if we use more sophisticated PSI protocols that can keep the PSI output in secret shared form without revealing it. We leave this for future work.
- 10.
This corresponds to \(\delta _0 = 0.019\) for a bin size of \(\approx \)500K (cf. Sect. 3.1).
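Notes 3 and 8 can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not the analysis of Sect. 3.1: it throws n elements into m bins uniformly at random and uses the observed maximum load as a stand-in for the padded bin size (a real deployment would fix the bin size in advance from a tail bound). The helper names `padding_overhead` and `psi_rounds` are hypothetical, and rounding m/k up is an assumption for the case where k does not divide m.

```python
import math
import random
from collections import Counter


def padding_overhead(n: int, m: int, seed: int = 0):
    """Return (max_load, padded_total / n) for n elements hashed into m bins."""
    rng = random.Random(seed)
    loads = Counter(rng.randrange(m) for _ in range(n))
    max_load = max(loads.values())
    # Every bin is padded up to max_load, so m * max_load items are processed.
    return max_load, (m * max_load) / n


def psi_rounds(m: int, k: int) -> int:
    """Sequential rounds needed to process m bins with k workers per side."""
    return math.ceil(m / k)


n = 100_000
_, overhead_few_bins = padding_overhead(n, m=16)    # large, well-filled bins
_, overhead_many_bins = padding_overhead(n, m=n)    # m ≈ n: about one item per bin

print(f"m = 16: padded size is {overhead_few_bins:.3f} x n")
print(f"m = n:  padded size is {overhead_many_bins:.3f} x n")
print(f"2048 bins on 64 workers take {psi_rounds(2048, 64)} rounds")
```

With few large bins the loads concentrate around n/m, so padding adds little. With m close to n the average load is about one element while the fullest bin holds several, so padding multiplies the total input size.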
References
Abadi, A., Terzis, S., Dong, C.: O-PSI: delegated private set intersection on outsourced datasets. In: Federrath, H., Gollmann, D. (eds.) SEC 2015. IAICT, vol. 455, pp. 3–17. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18467-8_1
Abadi, A., Terzis, S., Dong, C.: VD-PSI: verifiable delegated private set intersection on outsourced private datasets. In: Grossklags, J., Preneel, B. (eds.) FC 2016. LNCS, vol. 9603, pp. 149–168. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54970-4_9
Kiss, A., Liu, J., Schneider, T., Asokan, N., Pinkas, B.: Private set intersection for unequal set sizes with mobile applications. In: Proceedings on Privacy Enhancing Technologies, no. 4, pp. 177–197 (2017)
Ateniese, G., De Cristofaro, E., Tsudik, G.: (If) size matters: size-hiding private set intersection. In: Catalano, D., Fazio, N., Gennaro, R., Nicolosi, A. (eds.) PKC 2011. LNCS, vol. 6571, pp. 156–173. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19379-8_10
Badrinarayanan, S., et al.: A plug-n-play framework for scaling private set intersection to billion-sized sets. Cryptology ePrint Archive, Paper 2022/294 (2022)
Badrinarayanan, S., Miao, P., Rindal, P.: Multi-party threshold private set intersection with sublinear communication. IACR Cryptology ePrint Archive 2020, 600 (2020). https://eprint.iacr.org/2020/600
Pinkas, B., Schneider, T., Tkachenko, O., Yanai, A.: Efficient circuit-based PSI with linear communication. In: Ishai, Y., Rijmen, V. (eds.) EUROCRYPT 2019. LNCS, vol. 11478, pp. 122–153. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17659-4_5
Blanton, M., Aguiar, E.: Private and oblivious set and multiset operations. Int. J. Inf. Sec. 15(5), 493–518 (2016). https://doi.org/10.1007/s10207-015-0301-1
Brickell, J., Porter, D.E., Shmatikov, V., Witchel, E.: Privacy-preserving remote diagnostics. In: CCS (2007)
Chase, M., Miao, P.: Private set intersection in the internet setting from lightweight oblivious PRF. In: Micciancio, D., Ristenpart, T. (eds.) CRYPTO 2020. LNCS, vol. 12172, pp. 34–63. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-56877-1_2
Chen, H., Laine, K., Rindal, P.: Fast private set intersection from homomorphic encryption. In: Thuraisingham, B.M., Evans, D., Malkin, T., Xu, D. (eds.) Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, 30 October–03 November 2017, pp. 1243–1255. ACM (2017). https://doi.org/10.1145/3133956.3134061
De Cristofaro, E., Gasti, P., Tsudik, G.: Fast and private computation of cardinality of set intersection and union. In: Pieprzyk, J., Sadeghi, A.-R., Manulis, M. (eds.) CANS 2012. LNCS, vol. 7712, pp. 218–231. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35404-5_17
De Cristofaro, E., Kim, J., Tsudik, G.: Linear-complexity private set intersection protocols secure in malicious model. In: Abe, M. (ed.) ASIACRYPT 2010. LNCS, vol. 6477, pp. 213–231. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17373-8_13
Kales, D., Rechberger, C., Schneider, T., Senker, M., Weinert, C.: Mobile private contact discovery at scale. In: USENIX Annual Technical Conference, pp. 1447–1464 (2019)
Dave, A., Leung, C., Popa, R.A., Gonzalez, J.E., Stoica, I.: Oblivious coopetitive analytics using hardware enclaves. In: Proceedings of the Fifteenth European Conference on Computer Systems, pp. 1–17 (2020)
Davidson, A., Cid, C.: An efficient toolkit for computing private set operations. In: Pieprzyk, J., Suriadi, S. (eds.) ACISP 2017. LNCS, vol. 10343, pp. 261–278. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59870-3_15
De Cristofaro, E., Tsudik, G.: Practical private set intersection protocols with linear complexity. In: Sion, R. (ed.) FC 2010. LNCS, vol. 6052, pp. 143–159. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14577-3_13
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Sixth Symposium on Operating System Design and Implementation, OSDI 2004, San Francisco, CA, pp. 137–150 (2004)
Demmler, D., Rindal, P., Rosulek, M., Trieu, N.: PIR-PSI: scaling private contact discovery. Proc. Priv. Enhancing Technol. 2018(4), 159–178 (2018). https://doi.org/10.1515/popets-2018-0037
Dong, C., Chen, L., Wen, Z.: When private set intersection meets big data: an efficient and scalable protocol. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pp. 789–800 (2013)
Falk, B.H., Noble, D., Ostrovsky, R.: Private set intersection with linear communication from general assumptions. In: Cavallaro, L., Kinder, J., Domingo-Ferrer, J. (eds.) Proceedings of the 18th ACM Workshop on Privacy in the Electronic Society, WPES@CCS 2019, London, UK, 11 November 2019, pp. 14–25. ACM (2019). https://doi.org/10.1145/3338498.3358645
Freedman, M.J., Hazay, C., Nissim, K., Pinkas, B.: Efficient set intersection with simulation-based security. J. Cryptol. 29(1), 115–155 (2016). https://doi.org/10.1007/s00145-014-9190-0
Freedman, M.J., Nissim, K., Pinkas, B.: Efficient private matching and set intersection. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004. LNCS, vol. 3027, pp. 1–19. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24676-3_1
Ghosh, S., Simkin, M.: The communication complexity of threshold private set intersection. In: Boldyreva, A., Micciancio, D. (eds.) CRYPTO 2019. LNCS, vol. 11693, pp. 3–29. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26951-7_1
Asharov, G., Lindell, Y., Schneider, T., Zohner, M.: More efficient oblivious transfer and extensions for faster secure computation. In: CCS, pp. 535–548 (2013)
Hallgren, P.A., Orlandi, C., Sabelfeld, A.: PrivatePool: privacy-preserving ridesharing. In: CSF (2017)
Chen, H., Huang, Z., Laine, K., Rindal, P.: Labeled PSI from fully homomorphic encryption with malicious security. In: CCS, pp. 1223–1237 (2018)
Hazay, C., Nissim, K.: Efficient set operations in the presence of malicious adversaries. In: Nguyen, P.Q., Pointcheval, D. (eds.) PKC 2010. LNCS, vol. 6056, pp. 312–331. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13013-7_19
Hazay, C., Venkitasubramaniam, M.: Scalable multi-party private set-intersection. In: Fehr, S. (ed.) PKC 2017. LNCS, vol. 10174, pp. 175–203. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54365-8_8
Huang, Y., Evans, D., Katz, J., Malka, L.: Faster secure two-party computation using garbled circuits. In: 20th USENIX Security Symposium, San Francisco, CA, USA, 8–12 August 2011, Proceedings. USENIX Association (2011). http://static.usenix.org/events/sec11/tech/full_papers/Huang.pdf
Huberman, B.A., Franklin, M.K., Hogg, T.: Enhancing privacy and trust in electronic communities. In: Feldman, S.I., Wellman, M.P. (eds.) Proceedings of the First ACM Conference on Electronic Commerce (EC-99), Denver, CO, USA, 3–5 November 1999, pp. 78–86. ACM (1999). https://doi.org/10.1145/336992.337012
Ion, M., et al.: On deploying secure computing commercially: private intersection-sum protocols and their business applications. IACR Cryptology ePrint Archive 2019, 723 (2019). https://eprint.iacr.org/2019/723
Ion, M., et al.: Private intersection-sum protocol with applications to attributing aggregate ad conversions (2017). ia.cr/2017/735
Kissner, L., Song, D.: Privacy-preserving set operations. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 241–257. Springer, Heidelberg (2005). https://doi.org/10.1007/11535218_15
Kolesnikov, V., Kumaresan, R., Rosulek, M., Trieu, N.: Efficient batched oblivious PRF with applications to private set intersection. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 818–829 (2016)
Kolesnikov, V., Matania, N., Pinkas, B., Rosulek, M., Trieu, N.: Practical multi-party private set intersection from symmetric-key techniques. In: CCS (2017)
Kolesnikov, V., Rosulek, M., Trieu, N., Wang, X.: Scalable private set union from symmetric-key techniques. In: Galbraith, S.D., Moriai, S. (eds.) ASIACRYPT 2019. LNCS, vol. 11922, pp. 636–666. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34621-8_23
Livy, A.: Apache Livy (2017). https://livy.apache.org/
Meadows, C.A.: A more efficient cryptographic matchmaking protocol for use in the absence of a continuously available third party. In: Proceedings of the 1986 IEEE Symposium on Security and Privacy, Oakland, California, USA, 7–9 April 1986, pp. 134–137. IEEE Computer Society (1986). https://doi.org/10.1109/SP.1986.10022
Ciampi, M., Orlandi, C.: Combining private set-intersection with secure two-party computation. In: Catalano, D., De Prisco, R. (eds.) SCN 2018. LNCS, vol. 11035, pp. 464–482. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98113-0_25
Nagaraja, S., Mittal, P., Hong, C.Y., Caesar, M., Borisov, N.: BotGrep: finding P2P bots with structured graph analysis. In: USENIX Security Symposium (2010)
Narayanan, A., Thiagarajan, N., Lakhani, M., Hamburg, M., Boneh, D.: Location privacy via private proximity testing. In: Proceedings of the Network and Distributed System Security Symposium, NDSS 2011, San Diego, California, USA, 6th February–9th February 2011. The Internet Society (2011). https://www.ndss-symposium.org/ndss2011/privacy-private-proximity-testing-paper
Orrù, M., Orsini, E., Scholl, P.: Actively secure 1-out-of-N OT extension with application to private set intersection. In: Handschuh, H. (ed.) CT-RSA 2017. LNCS, vol. 10159, pp. 381–396. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-52153-4_22
Papadimitriou, A., et al.: Big data analytics over encrypted datasets with seabed. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2016), pp. 587–602 (2016)
Pinkas, B., Rosulek, M., Trieu, N., Yanai, A.: SpOT-light: lightweight private set intersection from sparse OT extension. In: Boldyreva, A., Micciancio, D. (eds.) CRYPTO 2019. LNCS, vol. 11694, pp. 401–431. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26954-8_13
Pinkas, B., Rosulek, M., Trieu, N., Yanai, A.: PSI from PaXoS: fast, malicious private set intersection. In: Canteaut, A., Ishai, Y. (eds.) EUROCRYPT 2020. LNCS, vol. 12106, pp. 739–767. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45724-2_25
Pinkas, B., Schneider, T., Segev, G., Zohner, M.: Phasing: private set intersection using permutation-based hashing. In: USENIX (2015)
Pinkas, B., Schneider, T., Weinert, C., Wieder, U.: Efficient circuit-based PSI via cuckoo hashing. In: Nielsen, J.B., Rijmen, V. (eds.) EUROCRYPT 2018. LNCS, vol. 10822, pp. 125–157. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78372-7_5
Pinkas, B., Schneider, T., Zohner, M.: Faster private set intersection based on OT extension. In: USENIX (2014)
Pinkas, B., Schneider, T., Zohner, M.: Scalable private set intersection based on OT extension. IACR Cryptology ePrint Archive 2016, 930 (2016). http://eprint.iacr.org/2016/930
Pinkas, B., Schneider, T., Zohner, M.: Scalable private set intersection based on OT extension. ACM Trans. Priv. Secur. 21(2), 7:1–7:35 (2018). https://doi.org/10.1145/3154794
Popa, R.A., Redfield, C.M., Zeldovich, N., Balakrishnan, H.: CryptDB: protecting confidentiality with encrypted query processing. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pp. 85–100 (2011)
Resende, A.C.D., Aranha, D.F.: Unbalanced approximate private set intersection. IACR Cryptology ePrint Archive 2017, 677 (2017). http://eprint.iacr.org/2017/677
Rindal, P.: libPSI: an efficient, portable, and easy to use Private Set Intersection Library. https://github.com/osu-crypto/libPSI
Rindal, P., Rosulek, M.: Improved private set intersection against malicious adversaries. In: Coron, J.-S., Nielsen, J.B. (eds.) EUROCRYPT 2017. LNCS, vol. 10210, pp. 235–259. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56620-7_9
Rindal, P., Rosulek, M.: Malicious-secure private set intersection via dual execution. In: CCS (2017)
Poddar, R., Kalra, S., Yanai, A., Deng, R., Popa, R.A., Hellerstein, J.M.: Senate: a maliciously-secure MPC platform for collaborative analytics. IACR Cryptology ePrint Archive 2020, 1350 (2020)
Kamara, S., Mohassel, P., Raykova, M., Sadeghian, S.: Scaling private set intersection to billion-element sets. In: Financial Cryptography and Data Security, pp. 195–215 (2014)
Kolesnikov, V., Kumaresan, R.: Improved OT extension for transferring short secrets. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8043, pp. 54–70. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40084-1_4
Wikipedia: Java native interface - Wikipedia (2020). https://en.wikipedia.org/wiki/Java_Native_Interface
Sun, Y., Hua, Y., Jiang, S., Li, Q., Cao, S., Zuo, P.: SmartCuckoo: a fast and cost-efficient hashing index scheme for cloud storage systems. In: USENIX Annual Technical Conference, pp. 553–565 (2017)
Ishai, Y., Kilian, J., Nissim, K., Petrank, E.: Extending oblivious transfers efficiently. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 145–161. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45146-4_9
Zaharia, M., et al.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Presented as Part of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2012), pp. 15–28 (2012)
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud 2010, USA, p. 10. USENIX Association (2010)
Zheng, W., Dave, A., Beekman, J.G., Popa, R.A., Gonzalez, J.E., Stoica, I.: Opaque: an oblivious and encrypted distributed analytics platform. In: 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2017), pp. 283–298 (2017)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Badrinarayanan, S. et al. (2023). A Plug-n-Play Framework for Scaling Private Set Intersection to Billion-Sized Sets. In: Deng, J., Kolesnikov, V., Schwarzmann, A.A. (eds) Cryptology and Network Security. CANS 2023. Lecture Notes in Computer Science, vol 14342. Springer, Singapore. https://doi.org/10.1007/978-981-99-7563-1_20
DOI: https://doi.org/10.1007/978-981-99-7563-1_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7562-4
Online ISBN: 978-981-99-7563-1
eBook Packages: Computer Science, Computer Science (R0)