
1 Introduction

Machine Learning algorithms are data-hungry: more data leads to more accurate models. On the other hand, privacy of data is becoming increasingly important, for social, business and policy-compliance reasons (e.g. GDPR). There have been decades of groundbreaking work in the academic literature on developing cryptographic technology for collaborative computation, but it still faces significant bottlenecks to wide-scale adoption. Although theoretical results demonstrate the possibility of generic secure computation, the resulting protocols are not efficient enough to be adopted, in terms of both computation and communication. For instance, Google cited network cost as a major hindrance to adopting cryptographic secure computation solutions [12].

Secret-Shared Shuffle. In this work, we focus on the computation and communication efficiency of a building block used in many important secure computation protocols, which we call “secret-shared shuffle”. Secret-shared shuffle is a protocol which allows two parties to jointly shuffle data and obtain additive secret shares of the result - without either party learning the permutation corresponding to the shuffle. (In the remainder of this paper, by secret sharing we will always mean additive secret sharing.)

Motivation. To see the importance of secret-shared shuffle, consider the task of securely evaluating some function on the intersection of two sets belonging to two parties - in particular, the intersection itself should also remain secret. As a concrete example, consider a merchant who wants to analyze the efficiency of its online ads by running some ML algorithm on the data which contains information about users who both (a) saw the ad and (b) made a purchase. Such data is split between the ad supplier (who knows which person clicked which ad) and the merchant (who knows which person made a purchase). Thus, the ML algorithm should be run on the set intersection of the two databases - and both the ML algorithm and the set intersection have to be computed using secure multi-party computation protocols (MPC).

To do this securely, ideally we would use a private set intersection protocol which outputs an intersection in some “encrypted” form - e.g. by encrypting or secret sharing elements in the intersection - and then evaluate the ML function securely under MPC. However, currently known efficient protocols for private set intersection do not output an encrypted intersection: instead they output an encrypted indicator vector - i.e. a vector of bits indicating if each element is in the intersection or not [5]. This difference is very important, since in the former case one could run the ML function (under MPC) directly on the encrypted intersection, whereas in the latter case such MPC has to be run on the whole database, and the elements not in the intersection have to be filtered out under the MPC. Needless to say, this incurs unnecessary overhead, especially in cases where the intersection is relatively small compared to the input sets.

In other words, ideally we would want to get rid of non-intersection elements before running the rest of the MPC. A natural way to do this without compromising security is to shuffle the encrypted elements together with the encrypted indicator vector. Then parties can reveal the indicator vector and discard elements which are not in the intersection. Note that it is crucial that neither party learns how exactly the elements were permuted; otherwise this party could learn whether some of its elements are in the intersection or not. Also note that the requirement on the secrecy of the permutation implies that the result of the shuffle has to be in some encrypted or secret-shared form (in order to prevent linking original and shuffled elements), hence naturally leading to the notion of secret-shared shuffle.

Known Techniques and Their Limitations. For convenience, let us look at “a half” of a secret-shared shuffle, which we call Permute+Share: in this protocol \(P_0\) holds a permutation \(\pi \) and \(P_1\) holds the database \(\varvec{x}\), and they would like to learn secret shares of the permuted database (footnote 1). While this problem can be solved by any generic MPC, to the best of our knowledge, there are two specialized solutions for this problem, which differ in how exactly the permuting happens. One approach is to give \(P_1\)’s shares of \(\varvec{x}\) to \(P_0\) in some encrypted form, let \(P_0\) permute them according to \(\pi \) under the encryption, rerandomize them, and return them to \(P_1\). This is a folklore solution that uses rerandomizable additively homomorphic public-key encryption, and because of that it is compute-intensive. We describe this solution in detail in the full version. The other approach is to start with secret-shared \(\varvec{x}\) and jointly compute atomic swaps, until all elements arrive at their target locations. To prevent linking, each atomic swap should also rerandomize the shares. This approach is taken by [15, 22], who let the parties jointly apply a permutation network to the shares, where each atomic swap is implemented using oblivious transfer (OT) in [22] and garbled circuits in [15]. The downside of this approach is its communication complexity, which is proportional to \(\ell \cdot N\log N\), where \(N\) is the number of elements in the database and \(\ell \) is the bitlength of each element. This overhead seems to be inherent in approaches based on joint computation of atomic swaps, since each element has to be fully fed into at least \(\log N\) swaps.

We also note that there exist efficient protocols for secure shuffle in the three-party setting (e.g. see [4] and references therein). Our two-party setting is very different from the three-party setting, which allows for an honest majority and thus for simpler and more efficient constructions.

Our Contribution. We propose a novel approach to designing a protocol for secret-shared shuffle, secure in the semi-honest model. Our protocol is parameterized by a value T, which can be chosen to optimize performance for a given tradeoff between network bandwidth and computation cost. Our protocol runs in 3 rounds (6 messages) with communication only proportional to \(\lambda N\log N+ N\ell \log N/\log T\), where \(\lambda \) is the security parameter, \(N\) is the number of elements in the database and \(\ell \) is the size of each element. In our experiments on databases of size \(2^{20}\)-\(2^{32}\) the optimal value for T is between 16 and 256, so we can think of \(\log T\) as a number between 4 and 8. Note that the size \(\ell \) of the elements could be very large (e.g. each element could be a feature vector in an ML algorithm), in which case the term \(N\ell \log N/\log T\) dominates, and thus our protocol could be a significant improvement over the permutation-network-based approach, whose communication is proportional to \(\ell N\log N\). While the computation cost of our protocol, dominated by \((NT \log N/\log T) (\ell /\lambda )\), is asymptotically worse than that of a PKE-based or permutation-network-based approach, our protocol uses lightweight crypto primitives (XORs and PRGs) and does not require any public-key operations besides a set of base OTs, thus resulting in a concretely efficient protocol. We compute the concrete cost of our protocol and estimate its performance over different networks (bandwidth 1 Gbps, 100 Mbps and 72 Mbps). For large values of \(\ell \), we see a two to three orders of magnitude improvement over the best known public-key based approach and an order of magnitude improvement over the best known symmetric-key approach. The details of our experiments are in Sect. 7.

At the heart of our construction is a new primitive which we call the Share Translation functionality. This functionality outputs two sets of pseudorandom values - one per party - with a special permutation-related dependency between them, and we show that this is enough to implement secret-shared shuffle. Conceptually, this functionality allows us to push the problem of permuting the data down to the problem of permuting pseudorandom values (footnote 2). This can be seen as an analogue of Beaver triples or TinyTables for permutations rather than arithmetic or boolean computations.

Our Share Translation protocol has quadratic running time (in \(N\)), and thus implementing secret-shared shuffle directly from the Share Translation protocol becomes prohibitively expensive, even with lightweight operations like XOR and PRG. This brings us to the second crucial part of our construction: we devise a way to represent any permutation as a combination of several permutations \(\pi _i\), where each \(\pi _i\) itself consists of several disjoint permutations, each acting on a few elements. We find such a decomposition using the special structure of the Benes permutation network. This decomposition allows us to apply our Share Translation protocol to the small individual disjoint permutations rather than to the big final permutation, allowing our protocol for secret-shared shuffle to achieve the claimed running time. We leverage the particular structure of our Share Translation protocol to make sure that this transformation doesn’t increase the number of rounds.

1.1 Applications

Collaborative Filtering. One immediate application of our shuffle protocol is to allow two parties who hold shares of a set of elements to filter out elements that satisfy a certain criterion. This could include removing poorly formed or outlier elements. Or it could be used after a PSI protocol [5, 23, 24] or in database join [21] to remove elements that were not matched. If we are willing to reveal the number of elements meeting this criterion, we can use a shuffle to securely remove these elements so that subsequent operations can be evaluated only on the resulting smaller set, which is particularly valuable if the subsequent computation is expensive (e.g. a machine learning task [20]). To do this, we first shuffle the set, then apply a 2PC to each element to evaluate the criterion, revealing the result bit in the clear, and finally remove those items whose result is 1.

Sorting Under 2PC. Our secret-shared shuffle protocol can also be used to build efficient protocols for other fundamental operations. For example, in order to sort a list of secret-shared elements and output the resulting secret shares, we can use the shuffle-and-reveal approach proposed by [14] together with our secret-shared shuffle. The idea in [14] is that if the data is shuffled first, then sorting algorithms can reveal the result of each comparison operation in the clear without compromising security. Thus their approach is to first shuffle the data, and then run a sorting algorithm where each comparison is done under 2PC, with the result revealed in the clear. This yields more efficient protocols than the standard oblivious sorting protocols based on sorting networks; those protocols either have huge constants [1] or require \(O(N\log ^2 N)\) running time (using the Bitonic sorting network), where N is the number of elements in the database. Note that in many cases we want to sort not just a set of elements, but also some associated data for each element.

Sort, in addition to being a fundamental operation, can be used to find the top k results in a list, to evaluate the median or quantiles, to find outliers, and so on.

Secure Computation for RAM Programs. There has been a line of work [8, 9, 11, 17,18,19, 26, 28] that looks at secure computation for RAM programs (as opposed to circuits). The primary building block in these constructions is oblivious RAM (ORAM), which hides the memory accesses made by the computation. A naive way to initialize ORAM is to perform an ORAM write operation for each input item, but the concrete costs of this are very high. [17, 28] show that this can be made much more efficient using a shuffle: the parties simply permute their entries using a random secret-shared permutation and then store them as the ORAM memory. [28] achieve significant improvements by using garbled circuits to implement a permutation network; as we will see in Sect. 7, our solution far outperforms this approach, so we should get significant performance improvements for this application. Note that in ORAM it is often beneficial to have a somewhat large block size, and our protocol for secret-shared shuffle is especially advantageous in the setting where elements are large.

1.2 Technical Overview

Notation. By bold letters \(\varvec{x}, \varvec{a}, \varvec{b}, \varvec{r}, \varvec{\varDelta }\) we denote vectors of \(N\) elements, and by \(\varvec{x}[j]\) we denote the j-th element of \(\varvec{x}\). By \(\pi (\varvec{x})\), where \(\pi \) is a permutation, we denote the permuted vector \((\varvec{x}[\pi (1)], \ldots , \varvec{x}[\pi (N)])\).
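As a quick sanity check of this notation, the convention \(\pi (\varvec{x}) = (\varvec{x}[\pi (1)], \ldots , \varvec{x}[\pi (N)])\) can be written in zero-indexed Python (a toy illustration; the paper's vectors are 1-indexed, and the helper name is ours):

```python
def apply_perm(pi, x):
    """Return pi(x) = (x[pi(0)], ..., x[pi(N-1)]), zero-indexed."""
    return [x[pi[j]] for j in range(len(x))]

pi = [2, 0, 1]            # a permutation of {0, 1, 2}
x = ["a", "b", "c"]
assert apply_perm(pi, x) == ["c", "a", "b"]
```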

Secret-Shared Shuffle. Recall that the goal of the secret-shared shuffle is to let parties learn secret shares of a shuffled dataset. More concretely, consider parties \(P_0, P_1\), where \(P_1\) owns database \(\varvec{x}\). Our goal is to build a protocol which allows \(P_0\) to learn \(\varvec{r}\) and \(P_1\) to learn \(\varvec{r} \oplus \pi (\varvec{x})\), but nothing more; here \(\varvec{r}\) is a random vector of the same size as the database, and \(\pi \) is a random permutation of appropriate size. Our protocol also works for the case when \(\varvec{x}\) was secret shared between \(P_0\) and \(P_1\) to begin with (instead of being an input of one party).

Secret-shared shuffle can be easily built given its variant, which we call Permute+Share, where one of the parties chooses the permutation. That is, in this protocol \(P_0\) holds \(\pi \) and \(P_1\) holds \(\varvec{x}\), and as before, they would like to learn \(\varvec{r}\) and \(\varvec{r} \oplus \pi (\varvec{x})\), respectively. Indeed, secret-shared shuffle can be obtained by executing Permute+Share twice, where first \(P_0\) and then \(P_1\) chooses the permutation (note that in the second execution the database is itself already secret-shared). Thus, in the rest of the introduction we describe how to build Permute+Share.
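To illustrate this reduction, the following sketch models Permute+Share as an ideal functionality (a trusted dealer handing out fresh random shares, not the actual protocol) and checks that two executions - with each party choosing one permutation - yield shares of a shuffle whose full permutation neither party knows. All names are ours, for illustration only:

```python
import random

def apply_perm(pi, x):
    return [x[pi[j]] for j in range(len(x))]

def permute_share(pi, x0, x1):
    """Ideal Permute+Share on secret-shared input: outputs fresh
    random shares of pi(x0 XOR x1)."""
    x = [p ^ q for p, q in zip(x0, x1)]
    r = [random.getrandbits(32) for _ in x]
    return r, [ri ^ v for ri, v in zip(r, apply_perm(pi, x))]

random.seed(1)
N = 8
x = [random.getrandbits(32) for _ in range(N)]
zero = [0] * N

# First execution: P0 chooses pi0 (x is initially unshared, so P0's
# input share is all zeros).
pi0 = random.sample(range(N), N)
s0, s1 = permute_share(pi0, x, zero)

# Second execution: P1 chooses pi1, run on the shares from step one.
pi1 = random.sample(range(N), N)
t0, t1 = permute_share(pi1, s0, s1)

# The result is a sharing of the composed shuffle; since each party
# knows only one of pi0, pi1, neither knows the full permutation.
assert [p ^ q for p, q in zip(t0, t1)] == apply_perm(pi1, apply_perm(pi0, x))
```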

Our construction proceeds in several steps: first we explain how to build Permute+Share using another protocol called Share Translation. Then we build the latter from an oblivious punctured vector primitive, which can in turn be implemented using a GGM-based PRF and oblivious transfer with low communication. Note that we are going to describe our protocols using the \(\oplus \) (XOR) operation for simplicity; however, in the main body we instead use a more general syntax with addition and subtraction, to allow our protocols to work over different groups.

Building Simplified Permute+Share from Share Translation. We first describe a simplified and inefficient version of Permute+Share; the running time of this protocol is proportional to the square of the size of the database. Later in the introduction we explain how we exploit the structure of Benes permutation network [2] to achieve our final protocol.

As a starting point, consider the following idea: \(P_1\) chooses random masks \(\varvec{a}= (\varvec{a}[1] , \ldots , \varvec{a}[N])\) and sends its masked data \(\varvec{x}\oplus \varvec{a}\) to \(P_0\). Now \(P_0\) and \(P_1\) together hold a secret-shared \(\varvec{x}\), albeit not permuted. Note that \(P_0\) knows the permutation \(\pi \) and could easily rearrange its shares locally to obtain \(\pi (\varvec{x}\oplus \varvec{a})\). However, \(P_1\) doesn’t know \(\pi \) and thus cannot rearrange \(\varvec{a}\) into \(\pi (\varvec{a})\). Further, any protocol which allows \(P_1\) to learn \(\pi (\varvec{a})\) would immediately reveal \(\pi \) to \(P_1\), since \(P_1\) also knows \(\varvec{a}\).

Therefore, instead of choosing a single set of masks, \(P_1\) should choose two different and independent sets of masks, \(\varvec{a}\) and \(\varvec{b}\), where \(\varvec{a}\), as before, is used to hide \(\varvec{x}\) from \(P_0\), and \(\varvec{b}\) will become \(P_1\)’s final share of \(\pi (\varvec{x})\). However, now \(P_0\) has a problem: since \(P_1\)’s share is \(\varvec{b}\), \(P_0\)’s share should be \(\pi (\varvec{x}) \oplus \varvec{b}\); however, \(P_0\) only receives \(\varvec{x}\oplus \varvec{a}\) from \(P_1\), and has no way of “translating” it into \(\pi (\varvec{x}) \oplus \varvec{b}\). Thus we additionally let the parties execute a Share Translation protocol to allow \(P_0\) to obtain a “translation function” \(\varvec{\varDelta }= \pi (\varvec{a}) \oplus \varvec{b}\), as we explain next in more detail:

The Share Translation protocol takes as input a permutation \(\pi \) from \(P_0\) and outputs a vector \(\varvec{\varDelta }\) to \(P_0\) and vectors \(\varvec{a}, \varvec{b}\) to \(P_1\), such that \(\varvec{\varDelta }= \pi (\varvec{a}) \oplus \varvec{b}\), and, roughly speaking, \(\varvec{a}, \varvec{b}\) look random (footnote 3). A simple version of Permute+Share can be obtained from Share Translation as follows:

  1. \(P_0\) and \(P_1\) execute a Share Translation protocol, where \(P_0\) holds input \(\pi \), receives output \(\varvec{\varDelta }\), and \(P_1\) receives output \(\varvec{a}, \varvec{b}\).

  2. \(P_1\) sends \(\varvec{x}\oplus \varvec{a}\) to \(P_0\) and sets its final share to \(\varvec{b}\).

  3. \(P_0\) sets its share to \(\pi (\varvec{x}\oplus \varvec{a}) \oplus \varvec{\varDelta }\). Note that this is equal to \(\pi (\varvec{x}) \oplus \pi (\varvec{a}) \oplus \pi (\varvec{a}) \oplus \varvec{b}= \pi (\varvec{x}) \oplus \varvec{b}\), and therefore the parties indeed obtain secret-shared \(\pi (\varvec{x})\).

In other words, the share translation vector \(\varvec{\varDelta }\) allows \(P_0\) to translate “shares of x under \(\varvec{a}\)” into “shares of permuted x under \(\varvec{b}\)”.
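The three steps above can be checked end-to-end with a small sketch, modeling Share Translation as an ideal dealer (the actual protocol for generating \(\varvec{\varDelta }, \varvec{a}, \varvec{b}\) comes later); names and parameters are illustrative:

```python
import random

N, BITS = 8, 32

def apply_perm(pi, x):
    return [x[pi[j]] for j in range(len(x))]

def share_translation(pi):
    """Ideal Share Translation: P0 inputs pi and gets Delta; P1 gets
    random-looking a, b, with Delta = pi(a) XOR b."""
    a = [random.getrandbits(BITS) for _ in range(N)]
    b = [random.getrandbits(BITS) for _ in range(N)]
    delta = [pa ^ bi for pa, bi in zip(apply_perm(pi, a), b)]
    return delta, a, b

random.seed(7)
pi = random.sample(range(N), N)                    # P0's permutation
x = [random.getrandbits(BITS) for _ in range(N)]   # P1's database

# 1. Run Share Translation.
delta, a, b = share_translation(pi)
# 2. P1 sends x XOR a; its final share is b.
masked = [xi ^ ai for xi, ai in zip(x, a)]
# 3. P0 permutes the masked data and applies Delta.
share0 = [m ^ d for m, d in zip(apply_perm(pi, masked), delta)]

# The masks a cancel, so the shares reconstruct pi(x):
assert [s ^ bi for s, bi in zip(share0, b)] == apply_perm(pi, x)
```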

Note that the Share Translation protocol can be viewed as a variant of the Permute+Share protocol, with the difference that the “data” which is being permuted and shared is pseudorandom and out of the parties’ control (i.e. it is chosen by the protocol): indeed, in the Share Translation protocol \(P_1\) receives the “pseudorandom data” \(\varvec{a}\), and in addition \(P_0\) and \(P_1\) receive \(\varvec{\varDelta }= \pi (\varvec{a}) \oplus \varvec{b}\) and \(\varvec{b}\), respectively, which can be thought of as shares of \(\pi (\varvec{a})\). In other words, we reduced the problem of permuting the fixed data \(\varvec{x}\) to the problem of permuting some pseudorandom, out-of-control data \(\varvec{a}\). In the following paragraphs we explain how we can exploit the pseudorandomness of \(\varvec{a}\) and \(\varvec{b}\) to build a Share Translation protocol with reduced communication complexity.

Building Share Translation from Oblivious Punctured Vector. We start with defining an Oblivious Punctured Vector protocol (OPV), which is essentially an \((n-1)\)-out-of-n random oblivious transfer (footnote 4): this protocol, on input \(j \in [N]\) from \(P_0\), allows the parties to jointly generate a vector \(\varvec{v}\) with random-looking elements such that:

  • \(P_0\) learns all vector elements except for its j-th element \(\varvec{v}[j]\);

  • \(P_1\) learns the whole vector \(\varvec{v}\) but doesn’t learn the index j (footnote 5).

We use OPV to build Share Translation as follows: the parties run \(N\) executions of the OPV protocol to generate \(N\) vectors \(\varvec{v}_1, \ldots , \varvec{v}_N\), where \(P_0\)’s input in execution i is \(\pi (i)\). Consider the \(N\times N\) matrix \(\left\{ \varvec{v}_i[j]\right\} _{i,j \in [N]}\). By the properties of the OPV protocol, \(P_1\) learns the whole matrix, and \(P_0\) learns the matrix except for the elements corresponding to the permutation, i.e. it learns nothing about \(\varvec{v}_1[\pi (1)], \ldots , \varvec{v}_N[\pi (N)]\) (see Fig. 1).

Fig. 1. (left) \(P_0\) receives a “punctured” matrix, which is missing elements at positions \((i, \pi (i))\). Note that the missing elements are not needed to compute \(\varvec{\varDelta }\). (right) \(P_1\) receives the full matrix and uses it to compute masks \(\varvec{a}, \varvec{b}\).

Then \(P_1\) sets the elements of \(\varvec{a}, \varvec{b}\) to be column- and row-wise sums of the matrix elements, i.e. for all \(i \in [N]\) it sets \(\varvec{a}[i] \leftarrow \bigoplus \limits _j \varvec{v}_{j}[i]\), and for all \(j \in [N]\) it sets \(\varvec{b}[j] \leftarrow \bigoplus \limits _i \varvec{v}_{j}[i]\). \(P_0\) computes \(\varvec{\varDelta }[i]\) by taking the sum of column \(\pi (i)\) (except the element \(\varvec{v}_i[\pi (i)]\) which it doesn’t know) and adding the sum of row i (again, except the element \(\varvec{v}_i[\pi (i)]\)), i.e. it sets \(\varvec{\varDelta }[i] \leftarrow \left( \bigoplus \limits _{j \ne i}\varvec{v}_{j}[\pi (i)]\right) \oplus \left( \bigoplus \limits _{j \ne \pi (i)}\varvec{v}_{i}[j]\right) \).

Correctness of this protocol can be immediately verified: indeed, each \(\varvec{\varDelta }[i] = \varvec{a}[\pi (i)] \oplus \varvec{b}[i]\), since the missing value \(\varvec{v}_i[\pi (i)]\) participates in the sum \(\varvec{a}[\pi (i)] \oplus \varvec{b}[i]\) twice and therefore doesn’t influence the result. For security, note that \(P_0\) doesn’t learn anything about \(\varvec{a}, \varvec{b}\) (except for \(\varvec{\varDelta }\)), since it is missing exactly one element from each row and column of the matrix; the missing element acts as a one-time pad and hides each \(\varvec{a}[i], \varvec{b}[j]\) from \(P_0\). \(P_1\) doesn’t learn anything about the permutation \(\pi \) due to index hiding property of the OPV protocol.
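The row/column bookkeeping is easy to get wrong, so here is a small sketch that fills the OPV matrix with random values (standing in for the outputs of the \(N\) OPV executions) and verifies the cancellation argument \(\varvec{\varDelta }[i] = \varvec{a}[\pi (i)] \oplus \varvec{b}[i]\); indexing is zero-based:

```python
import random
from functools import reduce

N, BITS = 8, 32
random.seed(3)

pi = random.sample(range(N), N)

# The N OPV executions define an N x N matrix {v_i[j]}: P1 sees it
# all, P0 sees everything except the entries v_i[pi(i)].
v = [[random.getrandbits(BITS) for _ in range(N)] for _ in range(N)]

def xor_all(vals):
    return reduce(lambda p, q: p ^ q, vals, 0)

# P1: a = column-wise sums, b = row-wise sums.
a = [xor_all(v[j][i] for j in range(N)) for i in range(N)]
b = [xor_all(row) for row in v]

# P0: sum of column pi(i) without row i, XOR sum of row i without
# column pi(i) - exactly the entries it actually knows.
delta = [
    xor_all(v[j][pi[i]] for j in range(N) if j != i)
    ^ xor_all(v[i][j] for j in range(N) if j != pi[i])
    for i in range(N)
]

# The missing entry v_i[pi(i)] appears twice in a[pi(i)] XOR b[i]
# and cancels, so Delta matches.
assert all(delta[i] == a[pi[i]] ^ b[i] for i in range(N))
```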

Note that this protocol has running time proportional to \(N^2\) - we will show how to reduce this below.

Building Oblivious Punctured Vector from OT and PRFs. Oblivious Punctured Vector can be implemented using any \((n-1)\)-out-of-n OT, but in order to make it communication-efficient, we devise a new technique which was inspired by the protocol for distributed point functions by Doerner and Shelat [6]. The same technique appears in the concurrent and independent works (footnote 6) of Schoppmann et al. and Boyle et al. [3, 25] in the context of silent OT extension and vector-OLE.

In the beginning of the protocol \(P_1\) computes \(\varvec{v}\) by choosing a key for the GGM PRF at random, denoted \(\mathsf {seed}_\epsilon \), and setting each \(\varvec{v}[i] \leftarrow PRF(\mathsf {seed}_\epsilon ; i)\), \(i \in [N]\). Recall that in the GGM construction the key is treated as a PRG seed, which implicitly defines a binary tree with leaves containing the PRF evaluations \(F(1), F(2), \ldots , F(N)\). In other words, we set vector \(\varvec{v}\) to contain the values at the leaves of the tree.

Let \(P_0\)’s input in the OPV protocol be j. This means that \(P_0\) should learn the leaves \(F(i), i\ne j,\) as a result of the protocol. This can be done as follows. Let us denote internal seeds in the tree by \(\left\{ \mathsf {seed}_\gamma \right\} \), where \(\gamma \) is a string describing the position of the node in the tree (in particular, at the root \(\gamma = \epsilon \), an empty string). Let’s assume for concreteness that the first bit of j is 1. The parties run a 1-out-of-2 OT protocol, where \(P_0\)’s input is the complement of the first bit of j, i.e. 0, and \(P_1\)’s inputs are \(\mathsf {seed}_{0}\), \(\mathsf {seed}_{1}\). This allows \(P_0\) to recover \(\mathsf {seed}_{0}\) and therefore to locally compute the left half of the tree, i.e. all values \(F(1), \ldots , F(N/ 2)\), and the corresponding intermediate seeds.

Next, assume the second bit of j is 0. Note that the parties could run a 1-out-of-4 OT to let \(P_0\) learn \(\mathsf {seed}_{11}\) and therefore locally compute the right quarter of the tree \(F(3N/ 4), \ldots , F(N)\), then run a 1-out-of-8 OT and so on. However, this approach would eventually require a 1-out-of-\(N\) OT, which defeats the initial purpose of using only \(\log N\) 1-out-of-2 OTs.

Instead, we let \(P_0\) learn \(\mathsf {seed}_{11}\) in a different way: we let \(P_1\) send only two values, via 1-out-of-2 OT: the first value is the sum of the seeds which are left children, i.e. \(\mathsf {seed}_{00} \oplus \mathsf {seed}_{10}\), and the second value is the sum of the seeds which are right children, i.e. \(\mathsf {seed}_{01} \oplus \mathsf {seed}_{11}\). Since \(P_0\) already knows the whole left subtree and in particular \(\mathsf {seed}_{00}\) and \(\mathsf {seed}_{01}\), it can receive \(\mathsf {seed}_{01} \oplus \mathsf {seed}_{11}\) from the OT protocol and add \(\mathsf {seed}_{01}\) to it to obtain \(\mathsf {seed}_{11}\). (We note that this idea of sending the sums of left and right children comes from the work of Doerner and Shelat [6].)

More generally, the parties execute \(\log N\) 1-out-of-2 OTs - one for each level of the tree - where at each level k the first input to the OT is the sum of all odd seeds at that level, and the second input is the sum of all even seeds at that level. It can be seen that each sum contains exactly one term which \(P_0\) doesn’t know yet, and therefore it can receive the appropriate sum (depending on the k-th bit of j) and subtract the other seeds from it to learn the next seed of the subtree. Note that these OTs can be executed in parallel.

Note that the running time of the parties is proportional to the vector size, but their communication size only depends on its logarithm.
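The following sketch walks through the whole GGM-based OPV expansion: the sender expands the full tree and prepares, for each level, the XOR of all left children and the XOR of all right children (the two OT inputs); the receiver, holding punctured index j, recovers every leaf except leaf j. The SHA-256-based PRG and all names are our illustrative choices, not the paper's:

```python
import hashlib

def child(seed, bit):
    """Toy length-doubling PRG split: derive one child of a GGM node."""
    return hashlib.sha256(seed + bytes([bit])).digest()

def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

def sender_tree(root, depth):
    """P1 expands the full GGM tree; returns the list of levels."""
    levels = [[root]]
    for _ in range(depth):
        levels.append([child(s, b) for s in levels[-1] for b in (0, 1)])
    return levels

def ot_messages(levels):
    """Per level: (XOR of all left children, XOR of all right children)."""
    msgs = []
    for lvl in levels[1:]:
        left = right = bytes(32)
        for pos, s in enumerate(lvl):
            if pos % 2 == 0:
                left = xor(left, s)
            else:
                right = xor(right, s)
        msgs.append((left, right))
    return msgs

def receiver_expand(j, depth, msgs):
    """P0 recovers every leaf except leaf j, using one OT per level."""
    known = {}      # position -> seed at the current level
    path = 0        # position of the single unknown (path) node
    for k in range(depth):
        bit = (j >> (depth - 1 - k)) & 1
        nxt = {}
        for pos, s in known.items():
            nxt[2 * pos] = child(s, 0)
            nxt[2 * pos + 1] = child(s, 1)
        sib = 1 - bit
        # One 1-out-of-2 OT delivers the sum of all sib-side children;
        # exactly one term (the path node's sibling) is still unknown,
        # so P0 subtracts everything it knows to recover it.
        total = msgs[k][sib]
        for pos, s in nxt.items():
            if pos % 2 == sib:
                total = xor(total, s)
        nxt[2 * path + sib] = total
        path = 2 * path + bit
        known = nxt
    return known

depth, j = 3, 5
root = hashlib.sha256(b"root seed").digest()
levels = sender_tree(root, depth)
leaves = receiver_expand(j, depth, ot_messages(levels))
assert j not in leaves
assert all(leaves[p] == levels[depth][p] for p in range(2 ** depth) if p != j)
```

As noted above, the receiver's work is linear in the vector size, while the OT messages grow only with its logarithm.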

Applying Share Translation to the Decomposed Permutation. Recall that, while the communication complexity of our protocol is low, its computation complexity is proportional to the square of the database size, and thus it is only efficient for a small database. To deal with this issue, we change the way Permute+Share is built from Share Translation: instead of applying Share Translation to the whole permutation \(\pi \) directly, we first split the permutation \(\pi \) into smaller permutations in a special way, then apply Share Translation to each separate permutation to get multiple shares, and then recombine these shares to obtain shares with respect to \(\pi \).

More concretely, the idea is to split the permutation \(\pi \) into a composition of multiple permutations \(\pi _1 \circ \ldots \circ \pi _d\), such that each \(\pi _i\) is itself a composition of several disjoint permutations, each acting on T elements, for some parameter T. We refer to this as the \((T,d)\)-subpermutation representation of \(\pi \). Such a representation can be found using the special structure of the Benes permutation network. For instance, as shown in Fig. 2, the first two layers of a network on 8 elements can be split into two permutations, acting on \(T = 4\) elements each, where the first permutation acts on odd elements and the second permutation acts on even elements. We present the full description of our decomposition in Sect. 6.2.

Fig. 2. (left) The first two layers of the Benes permutation network for 8 elements. A link indicates that the corresponding elements are potentially swapped, depending on the underlying permutation. (right) A grouping of these layers into two disjoint permutations acting on 4 elements each: one acting on white elements and the other acting on black elements.

With such a decomposition in place, the parties can run parallel executions of Share Translation, each acting on a domain of size T. Note that, since the running time of a single Share Translation is proportional to the domain size squared, it is better to choose a relatively small T. In our experiments, the typical optimal values of T were 16, 128, or 256, depending on the other parameters.

Note that setting \(T = N\) corresponds to our simplified Permute+Share protocol described before, and setting \(T = 2\) results in essentially computing the permutation network, where each swap is implemented in a somewhat complicated way (using the Share Translation protocol). Thus, this scheme can be thought of as a middle ground between the two approaches.

It remains to note that the parties can run all executions of Share Translation in parallel (as opposed to taking multiple rounds, following the layered structure of the permutation network). To achieve this, in all executions except the first one, \(P_1\), instead of sending the initial masked data \(\varvec{x}\oplus \varvec{a}\), should send a correction vector \(\varvec{a}^{new} \oplus \varvec{b}^{old}\), which can be added to \(P_0\)’s shares in order to obtain \(\varvec{x}\oplus \varvec{a}^{new}\). We refer the reader to Sect. 6.2 for more details.
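A sketch of this chaining, with each layer's Share Translation modeled as an ideal dealer. For simplicity each layer here is an arbitrary permutation of the whole domain; the disjoint T-sized block structure only matters for efficiency, not for the correctness of the chaining:

```python
import random

N, BITS, D = 8, 32, 3
random.seed(11)

def apply_perm(pi, x):
    return [x[pi[j]] for j in range(len(x))]

def share_translation(pi):
    """Ideal dealer: Delta = pi(a) XOR b for random-looking a, b."""
    a = [random.getrandbits(BITS) for _ in range(N)]
    b = [random.getrandbits(BITS) for _ in range(N)]
    return [p ^ q for p, q in zip(apply_perm(pi, a), b)], a, b

layers = [random.sample(range(N), N) for _ in range(D)]   # pi_1..pi_d
x = [random.getrandbits(BITS) for _ in range(N)]
trans = [share_translation(pi) for pi in layers]

# Layer 1: P1 sends x XOR a_1, and P0 applies pi_1 and Delta_1.
cur = [xi ^ ai for xi, ai in zip(x, trans[0][1])]
cur = [m ^ d for m, d in zip(apply_perm(layers[0], cur), trans[0][0])]
# Now cur = pi_1(x) XOR b_1.

# Layers 2..d: P1 sends the correction a_new XOR b_old, turning P0's
# share into (current data) XOR a_new, and P0 proceeds as before.
for k in range(1, D):
    delta, a, b = trans[k]
    corr = [ai ^ bi for ai, bi in zip(a, trans[k - 1][2])]
    cur = [c ^ t for c, t in zip(cur, corr)]
    cur = [m ^ d for m, d in zip(apply_perm(layers[k], cur), delta)]

# Final shares: P0 holds cur, P1 holds b_d; together they reconstruct
# the data permuted by all layers in sequence.
b_last = trans[-1][2]
expect = x
for pi in layers:
    expect = apply_perm(pi, expect)
assert [c ^ bi for c, bi in zip(cur, b_last)] == expect
```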

Achieving a Simulation-Based Definition. We note that the protocols we described so far only achieve an indistinguishability-based definition, but not a simulation-based one. The problem is that the output values are only pseudorandom, and the parties in the protocols know their succinct “preimages” (like the GGM PRF root). Thus the simulator, given a random string as the output of the protocol, cannot simulate the internal state of that party, since doing so would amount to compressing a random string.

To achieve a simulation-based definition, we slightly modify the original Permute+Share protocol as follows: we additionally instruct \(P_1\) to sample a random string \(\varvec{w}\) of the size of the database and send it to \(P_0\), together with \(\varvec{x}\oplus \varvec{a}\). Then \(P_0\) should set its share to be \(\pi (\varvec{x}\oplus \varvec{a}) \oplus \varvec{\varDelta }\oplus \varvec{w}\), and \(P_1\) should set its share to be \(\varvec{b}\oplus \varvec{w}\). In other words, \(P_1\) should additionally secret-share its vector \(\varvec{b}\) using the random \(\varvec{w}\). Such a protocol can be simulated by a simulator who executes the Share Translation protocol honestly (obtaining some \(\varvec{a}', \varvec{b}', \varvec{\varDelta }'\)) and then sets the simulated \(\varvec{w}\) to be \(\varvec{z} \oplus \varvec{b}'\) (where \(\varvec{z}\) is the output of the Permute+Share protocol simulated by an external simulator).
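A quick check that the extra masking by \(\varvec{w}\) preserves correctness; Share Translation is again modeled as an ideal dealer, and all names are illustrative:

```python
import random

N, BITS = 8, 32
random.seed(5)

def apply_perm(pi, x):
    return [x[pi[j]] for j in range(len(x))]

pi = random.sample(range(N), N)
x = [random.getrandbits(BITS) for _ in range(N)]

# Ideal Share Translation outputs.
a = [random.getrandbits(BITS) for _ in range(N)]
b = [random.getrandbits(BITS) for _ in range(N)]
delta = [p ^ q for p, q in zip(apply_perm(pi, a), b)]

# P1 additionally samples w and sends it along with x XOR a.
w = [random.getrandbits(BITS) for _ in range(N)]
masked = [xi ^ ai for xi, ai in zip(x, a)]
share0 = [m ^ d ^ wi for m, d, wi in zip(apply_perm(pi, masked), delta, w)]
share1 = [bi ^ wi for bi, wi in zip(b, w)]

# The masks b and w both cancel, so the shares still reconstruct pi(x):
assert [s ^ t for s, t in zip(share0, share1)] == apply_perm(pi, x)
```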

2 Notations

We denote the security parameter by \(\lambda \). The bit length of each element in the input set is \(\ell \), \(\ell = \mathsf {poly}(\lambda )\). We denote an upper bound on the size of the database by \(N\). An ideal functionality is denoted by \(\mathcal {F}\). We denote vectors with bold fonts and individual elements with indices. For example, \(\varvec{v}\) is a vector of \(N\) elements where each individual element is denoted by \(v_i\). \(\leftarrow ^{\$}\) denotes selection uniformly at random from a domain. By \(S_N\) we denote the group of all permutations on \(N\) elements.

We also make use of the following notation:

  • Exec: Let \(\varPi \) be a two-party protocol. By \((\mathsf {output}_0, \mathsf {output}_1) \leftarrow \mathsf {exec}^\varPi (\lambda ; x_0, x_1; r_0, r_1)\) we denote the concatenated outputs of all parties after the execution of the protocol \(\varPi \) with security parameter \(\lambda \) on inputs \(x_0, x_1\) using randomness \(r_0, r_1\).

  • View: Let \(\varPi \) be a two-party protocol. By \(\mathsf {view}_b^\varPi (\lambda ; x_0, x_1; r_0, r_1)\) we denote the view of party b when parties \(P_0\) and \(P_1\) run the protocol \(\varPi \) with security parameter \(\lambda \) on inputs \(x_0, x_1\) using randomness \(r_0, r_1\). The view of each party includes its inputs, random coins, all messages it receives, and its outputs. When the context is clear, we also write \(\mathsf {view}_b\) for short.

Honest-but-Curious Security for a 2PC: Honest-but-curious security for a 2PC protocol \(\varPi \) evaluating function \(\mathcal {F}\) is defined in terms of the following two experiments:

  • \( IDEAL _{\mathsf {sim},b}^{\mathcal {F}}(\lambda ,x_0,x_1)\) evaluates \(\mathcal {F}(x_0,x_1)\) to obtain output \((y_0,y_1)\), then runs the stateful simulator \(\mathsf {sim}(1^\lambda ,b, x_b, y_b)\), which produces a simulated view \(\mathsf {view}_b\) for party \(P_b\). The output of the experiment is \((\mathsf {view}_b, y_{1-b})\).

  • \( REAL _{b}^\varPi (\lambda , x_{0}, x_1)\) runs the protocol with security parameter \(\lambda \) between honest parties \(P_{0}\) with input \(x_0\) and \(P_1\) with input \(x_1\) who obtain outputs \(y_0,y_1\) respectively. It outputs \((\mathsf {view}_b, y_{1-b})\).

Definition 1

Protocol \(\varPi \) realizes \(\mathcal {F}\) in the honest-but-curious setting if there exists a PPT simulator \(\mathsf {sim}\) such that for all inputs \(x_0,x_1\), and corrupt parties \(b\in \{0,1\}\) the two experiments are indistinguishable.

Pseudo Random Generator. Let \(\{\mathsf {G}\}_\lambda \) be a family of polynomial size circuits where each \(\mathsf {G}_\lambda : \{0,1\}^{m(\lambda )} \rightarrow \{0,1\}^{l(\lambda )}\), \(l(\lambda ) > m(\lambda )\). \(\{\mathsf {G}\}_\lambda \) is a PRG if the following distributions are computationally indistinguishable:

$$ \{\mathcal {D}_1\}_\lambda = \{\mathsf {G}(\mathsf {s}) : \mathsf {s}\leftarrow \{0,1\}^{m(\lambda )}\}, \{\mathcal {D}_2\}_\lambda = \{x: x \leftarrow \{0,1\}^{l(\lambda )}\}$$

We will omit the dependence of m and l on \(\lambda \) for simplicity. When \(l=2m\), we call this a length doubling PRG.
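As a concrete stand-in, a length-doubling PRG can be sketched using SHAKE-256 as an extendable-output function (an illustrative choice on our part; the paper only assumes that some secure \(\mathsf {G}\) exists):

```python
import hashlib

def prg_double(seed: bytes) -> bytes:
    """Toy length-doubling PRG G: {0,1}^m -> {0,1}^{2m}.

    SHAKE-256 stands in for the PRG here; any secure length-doubling
    PRG would do for the constructions in this paper.
    """
    return hashlib.shake_256(seed).digest(2 * len(seed))

# a 128-bit seed stretches to 256 pseudorandom bits
left_right = prg_double(b"\x00" * 16)
left, right = left_right[:16], left_right[16:]
```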

Oblivious Transfer (\(\mathsf {OT}\)). \(\mathsf {OT}\) is a secure 2-party protocol that realizes the functionality \(\mathcal {F}_\mathsf {OT}:((\mathsf {str}_0,\mathsf {str}_1),b) = (\bot ,\mathsf {str}_b)\) where \(\mathsf {str}_0, \mathsf {str}_1 \in \{0,1\}^k, b \in \{0,1\}\).

3 Oblivious Punctured Vector (\(\mathsf {OPV}\))

3.1 Definition and Security Properties

An Oblivious Punctured Vector (\(\mathsf {OPV}\)) for domain \(\mathbb {D}\) is an interactive protocol between two parties, \(P_0\) and \(P_1\), where parties’ inputs are \(((1^\lambda ,\mathsf {n}), (1^\lambda ,\mathsf {n},i))\) and their outputs are \((\varvec{v}_0,\varvec{v}_1)\), respectively. Here \(\lambda \) is the security parameter that determines the running time of the protocol, \(\varvec{v}_b, b \in \{0,1\}\) are vectors of length \(\mathsf {n}\), \(i\in [\mathsf {n}]\) and \(\varvec{v}_b \in [\mathbb {D}]^\mathsf {n}\).

This protocol lets the two parties jointly generate vector \(\varvec{v}\) with random-looking elements such that: 1) \(P_0\) learns the whole vector \(\varvec{v}\) but doesn’t learn index i. 2) \(P_1\) learns all vector elements except for its i-th element \(\varvec{v}[i]\). So we define the protocol to be correct if \(\varvec{v}_1 [j] = \varvec{v}_0 [j] \; \forall j\ne i\).
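The input/output behaviour can be illustrated with a trusted dealer producing the two output vectors directly (a Python sketch of what the parties end up holding, not of the interactive protocol itself):

```python
import secrets

def opv_dealer(n: int, i: int, nbytes: int = 16):
    """Trusted-dealer view of the OPV *outputs* only: P0 receives the
    whole random vector v0 (and learns nothing about i); P1, who holds
    the index i, receives the same vector punctured at position i."""
    v0 = [secrets.token_bytes(nbytes) for _ in range(n)]
    v1 = list(v0)
    v1[i] = None  # P1 learns nothing about v0[i]
    return v0, v1

v0, v1 = opv_dealer(8, 3)
# correctness: the vectors agree everywhere except the punctured index
assert all(v0[j] == v1[j] for j in range(8) if j != 3)
```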

To capture the first property, we want to say that an adversarial \(P_0\), who is given two distinct indices \(i, i'\in [\mathsf {n}]\), \(i\ne i'\) and participates in two executions of the protocol, one where party \(P_1\) holds i, and the other, where \(P_1\) holds \(i'\), cannot tell the two executions apart. We call this property Position hiding. To capture the second property, we want to say that an adversarial \(P_1\), who, in addition to its view in the protocol execution, receives the vector \(\varvec{v}_0\), cannot differentiate between the two cases: when \(\varvec{v}_0\) is generated according to \(\mathsf {exec}\), and when \(\varvec{v}_0\) is generated according to \(\mathsf {exec}\) and then \(\varvec{v}_0 [i]\) is replaced with a random string from the domain. We call this security property Value hiding. We define the properties formally below.

Correctness. For any sufficiently large security parameter \(\lambda \in \mathbb {N}\), for any \(\mathsf {n}\in \mathbb {N}, i \in [\mathsf {n}]\), if \((\varvec{v}_0,\varvec{v}_1) \leftarrow \mathsf {exec}^\mathsf {OPV}((1^\lambda ,\mathsf {n}),(1^\lambda ,\mathsf {n},i))\) and \(\varvec{v}_b \in [\mathbb {D}]^\mathsf {n}, b \in \{0,1\}\), then \(\varvec{v}_1 [j] = \varvec{v}_0 [j] \; \forall j\ne i\).

Position Hiding. For any sufficiently large security parameter \(\lambda \in \mathbb {N}\), \(\mathsf {n}\in \mathbb {N}, i,i' \in [\mathsf {n}]\), the following distributions are computationally indistinguishable:

$$\begin{aligned} \mathcal {D}_1 = \{ (\varvec{v}_0,\varvec{v}_1) \leftarrow \mathsf {exec}^\mathsf {OPV}((1^\lambda , \mathsf {n}),(1^\lambda ,\mathsf {n},i)): (1^\lambda ,\mathsf {n}, i,i',\mathsf {view}_0) \}\\ \mathcal {D}_2 = \{ (\varvec{v}_0,\varvec{v}_1) \leftarrow \mathsf {exec}^\mathsf {OPV}((1^\lambda , \mathsf {n}),(1^\lambda ,\mathsf {n},i')): (1^\lambda ,\mathsf {n}, i,i',\mathsf {view}_0) \} \end{aligned}$$

Value Hiding. For any sufficiently large security parameter \(\lambda \in \mathbb {N}\), for any \(\mathsf {n}\in \mathbb {N}, i \in [\mathsf {n}]\), the following distributions are computationally indistinguishable:

$$\begin{aligned} \mathcal {D}_1= \{(\varvec{v}_0,\varvec{v}_1) \leftarrow \mathsf {exec}^\mathsf {OPV}((1^\lambda , \mathsf {n}), (1^\lambda ,\mathsf {n},i)): (1^\lambda ,\mathsf {n},i,\varvec{v}_0,\mathsf {view}_1)\}\\ \mathcal {D}_2= \{(\varvec{v}_0,\varvec{v}_1) \leftarrow \mathsf {exec}^\mathsf {OPV}((1^\lambda , \mathsf {n}), (1^\lambda ,\mathsf {n},i)), \varvec{v}_0[i] := r \; \text {where}\; r \leftarrow ^{\$}\mathbb {D}: \\ (1^\lambda ,\mathsf {n},i,\varvec{v}_0,\mathsf {view}_1)\} \end{aligned}$$

Construction: We defer the formal construction and security proof of Theorem 1 to the full version. For an informal description of the construction, please refer to Sect. 1.2.

Please note that we only count the cryptographic operations while analyzing the computation complexity of our protocols.

Theorem 1

The OPV construction satisfies position and value hiding as defined in Sect. 3.1. The protocol runs \(\mathsf {n}\) (1-out-of-2) \(\mathsf {OT}\)s on messages of length \(\lambda \) bits in parallel. The communication cost is that of the \(\mathsf {OT}\)s, and the computation cost is the cost of these \(\mathsf {OT}\)s plus \(\mathsf {n}\) length-doubling PRG computations for each party, where \(\lambda \) is the security parameter and \(\mathsf {n}\) is the number of elements in the vector.

3.2 OPV Construction for Longer Strings

Let \(\mathsf {OPV}_\mathbb {D}\) denote the interactive protocol between two parties, \(P_0\) and \(P_1\), where parties’ inputs are \(((1^\lambda ,\mathsf {n}), (1^\lambda ,\mathsf {n},i))\) and their outputs are \((\varvec{v}_0,\varvec{v}_1)\), where \(\varvec{v}_b \in [\mathbb {D}]^\mathsf {n}\) and \(\mathbb {D}\) is strings of length \(\lambda \). We construct \(\mathsf {OPV}_{\mathbb {D}'}\) where \(\mathbb {D}'\) is strings of length \(\ell \ge \lambda \) using \(\mathsf {OPV}_\mathbb {D}\) and a PRG \(\mathsf {G}: \{0,1\}^\lambda \rightarrow \{0,1\}^\ell \) as follows.

  • Run \((\varvec{v}_0,\varvec{v}_1) \leftarrow \mathsf {exec}^{\mathsf {OPV}_\mathbb {D}}((1^\lambda , \mathsf {n}),(1^\lambda ,\mathsf {n},i))\)

  • Party \(P_b\), \(b \in \{0,1\}\), does the following: for each \(\varvec{v}_b[j], j \in [1,\mathsf {n}]\), expand it to an \(\ell \)-bit string using \(\mathsf {G}\), i.e., \(\varvec{v'}_b[j] \leftarrow \mathsf {G}(\varvec{v}_b[j])\). \(P_b\)’s output is \(\varvec{v'}_b\).
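A quick sanity check of this construction (a Python sketch; the base-OPV outputs are dealer-simulated and SHAKE-256 stands in for \(\mathsf {G}\), both illustrative choices) confirms that local PRG expansion preserves correctness:

```python
import hashlib
import secrets

def expand(v, ell_bytes: int):
    """Each party locally stretches every entry through G (SHAKE-256 as
    a stand-in PRG); a punctured (unknown) entry stays unknown.
    No extra interaction is needed."""
    return [None if s is None else hashlib.shake_256(s).digest(ell_bytes)
            for s in v]

# base-OPV outputs, dealer-simulated: equal everywhere except index i,
# where P1's entry is missing
n, i = 8, 5
v0 = [secrets.token_bytes(16) for _ in range(n)]
v1 = list(v0)
v1[i] = None

w0, w1 = expand(v0, 64), expand(v1, 64)  # entries now 512 bits long
assert all(w0[j] == w1[j] for j in range(n) if j != i)
```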

Theorem 2

If \(\mathsf {OPV}_\mathbb {D}\) satisfies correctness, position and value hiding as defined in Sect. 3.1, and \(\mathsf {G}\) is a secure PRG, then our construction for \(\mathsf {OPV}_{\mathbb {D}'}\) satisfies correctness, position and value hiding as well. The round complexity and communication cost is the same as the cost of \(\mathsf {OPV}_\mathbb {D}\). The computation cost is the computation cost of \(\mathsf {OPV}_\mathbb {D}\) plus \(\mathsf {n}\) \(\lambda \)-bit-to-\(\ell \)-bit PRG evaluations.

Proof

Correctness: By the correctness of \(\mathsf {OPV}_\mathbb {D}\), \(\varvec{v}_0[j] = \varvec{v}_1[j]\), \(\forall j \ne i\). Therefore, by our construction, \(\varvec{v'}_0[j] = \varvec{v'}_1[j]\), \(\forall j \ne i\).

Position Hiding: For the sake of contradiction, suppose not. Then, there exists a distinguisher D that breaks the position hiding property of \(\mathsf {OPV}_{\mathbb {D}'}\). We use D to build a distinguisher \(\mathcal {A}\) that breaks the position hiding property of \(\mathsf {OPV}_\mathbb {D}\) as follows. \(\mathcal {A}\) receives \((1^\lambda ,\mathsf {n}, i,i',\mathsf {view}^{\mathsf {OPV}_\mathbb {D}}_0)\) as input, where \(\mathsf {view}^{\mathsf {OPV}_\mathbb {D}}_0\) contains \(\varvec{v}_0\). For every \(\varvec{v}_0[j], j \in [1,\mathsf {n}]\), \(\mathcal {A}\) computes \(\varvec{v'}_0[j] = \mathsf {G}(\varvec{v}_0[j])\). Then it constructs \(\mathsf {view}^{\mathsf {OPV}_{\mathbb {D}'}}_0\), which is \(\mathsf {view}^{\mathsf {OPV}_\mathbb {D}}_0\) augmented with \(\varvec{v'}_0\). \(\mathcal {A}\) forwards \((1^\lambda ,\mathsf {n}, i,i',\mathsf {view}^{\mathsf {OPV}_{\mathbb {D}'}}_0)\) to D. Thus, \(\mathcal {A}\) directly inherits the success probability of D.

Value Hiding: Recall that we are trying to prove the following two distributions are computationally indistinguishable.

$$\begin{aligned} \mathcal {D}_1= \{(\varvec{v'}_0,\varvec{v'}_1) \leftarrow \mathsf {exec}^{\mathsf {OPV}_{\mathbb {D}'}}((1^\lambda , \mathsf {n}), (1^\lambda ,\mathsf {n},i)): (1^\lambda ,\mathsf {n},i,\varvec{v'}_0,\mathsf {view}^{\mathsf {OPV}_{\mathbb {D}'}}_1)\}\\ \mathcal {D}_2= \{(\varvec{v'}_0,\varvec{v'}_1) \leftarrow \mathsf {exec}^{\mathsf {OPV}_{\mathbb {D}'}}((1^\lambda , \mathsf {n}), (1^\lambda ,\mathsf {n},i)), \varvec{v'}_0[i] := r \; \text {where}\; r \leftarrow ^{\$}\mathbb {D}': \\ (1^\lambda ,\mathsf {n},i,\varvec{v'}_0,\mathsf {view}^{\mathsf {OPV}_{\mathbb {D}'}}_1)\} \end{aligned}$$

The proof will proceed through a series of hybrid steps. We define a series of distributions as follows.

  • \(H_0\): \(\mathcal {D}_1= \{(\varvec{v'}_0,\varvec{v'}_1) \leftarrow \mathsf {exec}^{\mathsf {OPV}_{\mathbb {D}'}}((1^\lambda , \mathsf {n}), (1^\lambda ,\mathsf {n},i)): (1^\lambda ,\mathsf {n},i,\varvec{v'}_0,\mathsf {view}^{\mathsf {OPV}_{\mathbb {D}'}}_1)\}\)

  • \(H_1\): Identical to the previous distribution except the following: generate \((\varvec{v}_0,\varvec{v}_1) \leftarrow \mathsf {exec}^{\mathsf {OPV}_{\mathbb {D}}}((1^\lambda , \mathsf {n}), (1^\lambda ,\mathsf {n},i))\), then set \(\varvec{v}_0[i] := r \; \text {where}\; r \leftarrow ^{\$}\mathbb {D}\) and set \(\varvec{v'}_0[i] \leftarrow \mathsf {G}(\varvec{v}_0[i])\). By the value-hiding property of \(\mathsf {OPV}_\mathbb {D}\), \(H_0\) and \(H_1\) are computationally indistinguishable.

  • \(H_2\): Identical to the previous distribution except the following: instead of computing \(\varvec{v'}_0[i] \leftarrow \mathsf {G}(\varvec{v}_0[i])\), set \(\varvec{v'}_0[i] := r' \; \text {where}\; r' \leftarrow ^{\$}\mathbb {D}'\). Since \(\varvec{v}_0[i]\) is now a uniform seed used nowhere else, by the security of the PRG, \(H_1\) and \(H_2\) are computationally indistinguishable. Note that distribution \(H_2\) is identical to \(\mathcal {D}_2\). So this concludes the proof of value hiding.   \(\square \)

4 Share Translation Protocol

4.1 Definition

Share Translation (ST) protocol with parameters \((N, \ell )\) is an interactive protocol between two parties, \(P_0\) and \(P_1\), where parties’ inputs are \((\pi , \bot )\) and their outputs are \((\varvec{\varDelta }, (\varvec{a}, \varvec{b}))\), respectively. Here \(\pi \) is a permutation on N elements, and \(\varvec{\varDelta }, \varvec{a}, \varvec{b}\) are all vectors of N elements in group \(\mathbb {G}\), where each element can be represented with \(\ell \) bits. The protocol should satisfy the following correctness and security guarantees:

Correctness: For each sufficiently large security parameter \(\lambda \), for each \(\pi \in S_N\), and for each \(r_0, r_1\) of appropriate length, let \((\varvec{\varDelta }, (\varvec{a}, \varvec{b})) \leftarrow \mathsf {exec}^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1)\). Then it should hold that \(\varvec{\varDelta }= \varvec{b}- \pi (\varvec{a})\).

This definition can be modified in a straightforward way for statistical or computational correctness.

Permutation Hiding: For all sufficiently large \(\lambda \) it should hold that for all \(\pi , \pi ' \in S_N\),

$$\mathsf {view}_1^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1) \approx \mathsf {view}_1^\mathsf {ST}(\lambda ; \pi ', \bot ; r_0, r_1),$$

where indistinguishability holds over uniformly chosen \(r_0, r_1\).

Share Hiding: For all sufficiently large \(\lambda \) it should hold that for any \(\pi \in S_N\),

$$(\varvec{a}, \varvec{b}, \mathsf {view}_0^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1)) \approx (\varvec{a}', \varvec{b}', \mathsf {view}_0^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1)),$$

where \((\varvec{\varDelta }, \varvec{a}, \varvec{b}) = \mathsf {exec}^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1)\), \(\varvec{a}' \leftarrow ^{\$}\mathbb {G}^N\), \(\varvec{b}' = \varvec{\varDelta }+ \pi (\varvec{a}')\), and indistinguishability holds over uniformly chosen \(r_0, r_1\).
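Both guarantees can be checked numerically. The sketch below (Python; taking \(\mathbb {G} = \mathbb {Z}_{2^{64}}\) is an illustrative choice on our part) verifies the correctness relation under the indexing convention \(\pi (\varvec{a})[i] = \varvec{a}[\pi (i)]\) used later in Sect. 6, and shows that the resampling used in share hiding yields another pair consistent with the same \(\varvec{\varDelta }\):

```python
import secrets

MOD = 2**64                         # illustration: G = Z_{2^64}
N = 4
pi = [3, 1, 0, 2]                   # 0-indexed permutation

def apply_pi(p, x):
    # convention from Sect. 6: pi(x)[i] = x[pi(i)]
    return [x[p[i]] for i in range(len(p))]

# any correlation output by the protocol satisfies Delta = b - pi(a)
a = [secrets.randbelow(MOD) for _ in range(N)]
b = [secrets.randbelow(MOD) for _ in range(N)]
Delta = [(bi - ai) % MOD for bi, ai in zip(b, apply_pi(pi, a))]

# share hiding: resample a' uniformly and set b' = Delta + pi(a');
# the fresh pair is consistent with the very same Delta
a2 = [secrets.randbelow(MOD) for _ in range(N)]
b2 = [(d + y) % MOD for d, y in zip(Delta, apply_pi(pi, a2))]
assert [(u - v) % MOD for u, v in zip(b2, apply_pi(pi, a2))] == Delta
```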

4.2 Construction

We build Share Translation protocol out of an Oblivious Punctured Vector (OPV) protocol for domain \(\mathbb {D}= \mathbb {G}\). Let \(\pi \) be \(P_0\)’s input in Share Translation protocol. The protocol proceeds as follows:

  1.

    \(P_0\) and \(P_1\) run \(N\) executions of the OPV protocol in parallel, where \(P_0\) uses \(\pi (i)\) as its input in execution i, for \(i \in [N]\). Denote \(\varvec{v}'_i, \varvec{v}_i\) to be the outputs of the OPV protocol in execution i, for parties \(P_0\) and \(P_1\), respectively, and denote \(\varvec{v}'_i[j], \varvec{v}_i[j]\) to be j-th elements of these vectors.

  2.

    For each \(i \in [N]\) \(P_0\) sets \(\varvec{\varDelta }[i] \leftarrow \sum \limits _{j \ne \pi (i)}\varvec{v}'_i[j] - \sum \limits _{j \ne i}\varvec{v}'_j[\pi (i)]\). It sets its output to be \(\varvec{\varDelta }= (\varvec{\varDelta }[1], \ldots , \varvec{\varDelta }[N])\).

  3.

    For each \(i \in [N]\), \(P_1\) sets \(\varvec{b}[i] \leftarrow \sum \limits _j \varvec{v}_i[j]\), \(\varvec{a}[i] \leftarrow \sum \limits _j \varvec{v}_j[i]\). It sets \((\varvec{a}, \varvec{b})\) as its output, where \(\varvec{a}= (\varvec{a}[1], \ldots , \varvec{a}[N])\), \(\varvec{b}= (\varvec{b}[1], \ldots , \varvec{b}[N])\).
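The correctness claim below can be verified mechanically. In the following sketch (Python; a trusted dealer supplies the \(N \times N\) matrix of OPV outputs and \(\mathbb {G} = \mathbb {Z}_{2^{64}}\), both illustrative choices), \(P_0\)'s sums never touch the punctured entries \(\varvec{v}_i[\pi (i)]\):

```python
import secrets

MOD = 2**64                 # illustration: the group G is Z_{2^64}
N = 4
pi = [2, 0, 3, 1]           # P0's permutation (0-indexed)

# Dealer-simulated OPV outputs. In execution i, P1 holds the full
# random vector v[i]; P0 holds v[i] punctured at position pi[i]
# (modelled here by P0 simply never reading v[i][pi[i]]).
v = [[secrets.randbelow(MOD) for _ in range(N)] for _ in range(N)]

# Step 2: P0 computes Delta[i] from the entries it knows
delta = [(sum(v[i][j] for j in range(N) if j != pi[i])
          - sum(v[j][pi[i]] for j in range(N) if j != i)) % MOD
         for i in range(N)]

# Step 3: P1 computes b[i] (row sums) and a[i] (column sums)
b = [sum(v[i]) % MOD for i in range(N)]
a = [sum(v[j][i] for j in range(N)) % MOD for i in range(N)]

# Correctness (Theorem 3): Delta[i] = b[i] - a[pi(i)]
assert all(delta[i] == (b[i] - a[pi[i]]) % MOD for i in range(N))
```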

Theorem 3

The construction described above satisfies correctness, permutation hiding and share hiding, assuming the underlying OPV protocol satisfies correctness, value hiding and position hiding. The round complexity, communication and computation cost of this protocol are equal to those of \(N\) instances of OPV run in parallel.

Correctness. For any \(i \in [N]\) we have

$$\varvec{\varDelta }[i] = \sum \limits _{j \ne \pi (i)}\varvec{v}'_i[j] - \sum \limits _{j \ne i}\varvec{v}'_j[\pi (i)] {\mathop {=}\limits ^{(1)}} \sum \limits _{j \ne \pi (i)}\varvec{v}_i[j] - \sum \limits _{j \ne i}\varvec{v}_j[\pi (i)] {\mathop {=}\limits ^{(2)}} \sum \limits _{j \in [N]}\varvec{v}_i[j] - \sum \limits _{j \in [N]}\varvec{v}_j[\pi (i)] = \varvec{b}[i] - \varvec{a}[\pi (i)].$$

Here (1) follows from correctness of the OPV protocol, and (2) holds since we add and subtract the same value \(\varvec{v}_i[\pi (i)]\). Note that a computationally (resp., statistically, perfectly) correct OPV protocol results in a computationally (resp., statistically, perfectly) correct ST protocol.

Permutation Hiding. Recall that we need to show that for all \(\pi , \pi ' \in S_N\),

$$\mathsf {view}_1^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1) \approx \mathsf {view}_1^\mathsf {ST}(\lambda ; \pi ', \bot ; r_0, r_1).$$

We show this indistinguishability in a sequence of hybrids \(H_0, H_1, \ldots , H_N\), where:

  • \(H_0 = \mathsf {view}_1^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1)\), for uniformly chosen \(r_0, r_1\),

  • \(H_N= \mathsf {view}_1^\mathsf {ST}(\lambda ; \pi ', \bot ; r_0, r_1)\), for uniformly chosen \(r_0, r_1\),

  • For \(1 \le i < N\), \(H_i = \mathsf {view}_1^{(i)}(\lambda ; (\pi , \pi '), \bot ; r_0, r_1)\), where \(\mathsf {view}_1^{(i)}(\lambda ; (\pi , \pi '), \bot ; r_0, r_1)\) is a view of \(P_1\) in the modified Share Translation protocol where party \(P_0\) uses \(\pi '(j)\) as its input in OPV executions \(1 \le j \le i\) and \(\pi (j)\) as its input in OPV executions \(i < j \le N\). \(r_0, r_1\) are uniformly chosen.

We argue that for each \(1 \le i \le N\), \(H_i \approx H_{i-1}\) due to the position-hiding property of the OPV protocol, and therefore \(H_0 \approx H_N\).

Indeed, note that the only difference between \(H_i\) and \(H_{i-1}\) is that in the i-th execution of OPV party \(P_0\) uses input \(\pi '(i)\) instead of \(\pi (i)\). Therefore if some PPT adversary distinguishes between \(H_i\) and \(H_{i-1}\), then we break position hiding of OPV as follows. Given the challenge in the OPV position hiding game \((\pi (i), \pi '(i), \mathsf {view}_1^\mathsf {OPV}(\lambda ; x, \bot ; r_0^\mathsf {OPV}, r_1^\mathsf {OPV}))\), where \(r^\mathsf {OPV}_0, r^\mathsf {OPV}_1\) are uniformly chosen randomness of \(P_0\) and \(P_1\) in the OPV protocol, and \(\mathsf {view}_1^\mathsf {OPV}\) is a view of \(P_1\) in the OPV protocol (which uses randomness \(r^\mathsf {OPV}_0, r^\mathsf {OPV}_1\) and \(P_0\)’s input x which is either \(\pi (i)\) or \(\pi '(i)\)), we execute the remaining \(N-1\) OPV protocols honestly using uniform randomness for each party and setting \(P_0\)’s input to \(\pi '(j)\) (for executions \(j < i\)) and \(\pi (j)\) (for executions \(j > i\)). Let \(\varvec{v}_j\), \(j = 1, \ldots , N\), be the output of \(P_1\) in the j-th execution of OPV.

We give the adversary \(P_1\)’s view in all \(N\) OPV executions (including \(\mathsf {view}_1^\mathsf {OPV}(\lambda ; x, \bot ; r_0^\mathsf {OPV}, r_1^\mathsf {OPV})\) of i-th execution which we received as a challenge). Depending on whether challenge input x was \(\pi (i)\) or \(\pi '(i)\), the distribution the adversary sees is either \(H_{i-1}\) or \(H_{i}\). Therefore, if the adversary distinguishes between the two distributions, we can break position hiding of OPV protocol with the same success probability.

Share Hiding. Recall that we need to show that for any \(\pi \in S_N\),

$$(\varvec{a}, \varvec{b}, \mathsf {view}_0^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1)) \approx (\varvec{a}', \varvec{b}', \mathsf {view}_0^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1)),$$

where \(\varvec{a}, \varvec{b}\) are true shares produced by the protocol, and \(\varvec{a}', \varvec{b}'\) are uniformly random, subject to \(\varvec{\varDelta }= \varvec{b}- \pi (\varvec{a})\).

We show this indistinguishability in a sequence of hybrids \(H_0, H_1, \ldots , H_N\), where:

  • \(H_0 = (\varvec{a}, \varvec{b}, \mathsf {view}_0^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1))\), for uniformly chosen \(r_0, r_1\),

  • \(H_N= (\varvec{a}', \varvec{b}', \mathsf {view}_0^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1))\), for uniformly chosen \(r_0, r_1, \varvec{a}'\), and \(\varvec{b}' = \varvec{\varDelta }+ \pi (\varvec{a}')\), where \((\varvec{\varDelta }, \varvec{a}, \varvec{b}) = \mathsf {exec}^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1)\),

  • \(H_i = (\varvec{a}^{(i)}, \varvec{b}^{(i)}, \mathsf {view}_0^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1))\), where \((\varvec{\varDelta }, \varvec{a}, \varvec{b}) = \) \(\mathsf {exec}^\mathsf {ST}(\lambda ; \pi , \bot ; r_0, r_1)\) is the output of the Share Translation protocol for random \(r_0, r_1\), \(\varvec{a}^{(i)} = (\varvec{a}_1^{(i)}, \ldots , \varvec{a}_N^{(i)})\) is such that \(\varvec{a}_j^{(i)}\) is uniformly chosen for \(1 \le j \le i\), \(\varvec{a}_j^{(i)} = \varvec{a}_j\) for \(i < j \le N\), and \(\varvec{b}^{(i)} = \varvec{\varDelta }+ \pi (\varvec{a}^{(i)})\).

We argue that for each \(1 \le i \le N\), \(H_i \approx H_{i-1}\), by reducing it to value hiding of the OPV protocol. Indeed, note that the only difference between \(H_i\) and \(H_{i-1}\) is that \(\varvec{a}_i^{(i)}\) is generated uniformly at random, rather than set to the true output of the protocol. Therefore if some PPT adversary distinguishes between \(H_i\) and \(H_{i-1}\), then we break security of OPV as follows. Assume we are given the challenge \((\varvec{v}_i, \mathsf {view}_0^\mathsf {OPV}(\lambda ; \pi (i), \bot ; r_0^\mathsf {OPV}, r_1^\mathsf {OPV}))\), where \(r^\mathsf {OPV}_0, r^\mathsf {OPV}_1\) are uniformly chosen randomness of \(P_0\) and \(P_1\) in the OPV protocol, and \(\mathsf {view}_0^\mathsf {OPV}\) is a view of \(P_0\) in the OPV protocol (which uses randomness \(r^\mathsf {OPV}_0, r^\mathsf {OPV}_1\) and \(P_0\)’s input \(\pi (i)\)), and challenge \(\varvec{v}_i\) is either the true output of \(P_1\), or the output of \(P_1\) except that \(\varvec{v}_i[\pi (i)]\) is set to a uniform value. We execute the remaining \(N-1\) OPV protocols honestly using uniform randomness for each party and setting \(P_0\)’s input to \(\pi (j)\), for \(j \ne i\). Let us denote the outputs of each OPV execution \(j \ne i\) as \((\varvec{v}_j, \varvec{v}'_j)\).

Then we compute \(\varvec{a}^{(i)}\), \(\varvec{b}^{(i)}\) as follows:

  • \(\varvec{b}^{(i)}[k] \leftarrow \sum \limits _{j} \varvec{v}_k[j]\), for each \(k \in [N]\),

  • \(\varvec{a}^{(i)}[k] \leftarrow \sum \limits _{j} \varvec{v}_j[k]\), for each \(k \in [N]\),

Then we give the adversary \(\varvec{a}^{(i)}, \varvec{b}^{(i)}\), and the views of party \(P_0\) in all \(N\) OPV executions (including the challenge view \(\mathsf {view}_0^\mathsf {OPV}(\lambda ; \pi (i), \bot ; r_0^\mathsf {OPV}, r_1^\mathsf {OPV})\) of the i-th execution). Depending on whether the challenge \(\varvec{v}_i[\pi (i)]\) was uniform or not, the distribution the adversary sees is either \(H_{i-1}\) or \(H_{i}\).

Thus, we showed that \(H_0\) and \(H_N\) are indistinguishable, as required.

5 \((T, d)-\)Subpermutation Representation Based on Benes Permutation Network

In this section we describe how to obtain \((T, d)-\)subpermutation representation, which is used in our final construction of Share Translation and secret-shared shuffle in Sect. 6. That is, we show how to represent any permutation \(\pi \in S_N\), where \(N= 2^n\) for some integer n, as a composition of permutations \(\pi _1 \circ \ldots \circ \pi _d\), such that each \(\pi _i\) is itself a composition of several disjoint permutations, each acting on T elements, for some parameter T. In our construction \(d = 2\lceil \frac{\log N}{\log T} \rceil -1\).

Our decomposition is based on the special structure of the Benes permutation network. This network has \(2\log N-1\) layers, each containing N/2 2-element permutations (that is, each is either an identity permutation or a swap). Specifically, if inputs are numbered with indices \(1, \ldots , N\), where each index is expressed in binary as \(\sigma _1 \ldots \sigma _n\), then the \(j-\)th layer and the \((2\log N-j)\)-th layer contain 2-element permutations, each acting on elements number \(\sigma _1\ldots \sigma _{j-1}0\sigma _{j+1}\ldots \sigma _n\) and \(\sigma _1\ldots \sigma _{j-1}1\sigma _{j+1}\ldots \sigma _n\), for all \(\sigma _1\ldots \sigma _{j-1}\sigma _{j+1}\ldots \sigma _n \in \left\{ 0, 1\right\} ^{n-1}\).

Now we describe our decomposition of \(\pi \) into \(\pi _1 \circ \ldots \circ \pi _d\). For any parameter \(T = 2^t\), \(t \in \mathbb {N}\), set \(d=2\lceil \frac{n}{t} \rceil -1\), and consider the Benes network for \(\pi \). We set \(\pi _1\) to consist of the first t layers \(1, \ldots , t\) of this network, \(\pi _2\) to consist of the next t layers \(t + 1, \ldots , 2t\), and so on, except for the middle permutation \(\pi _{\left\lfloor {\frac{d}{2}}\right\rfloor + 1}\), which consists of the \(2t - 1\) layers in the middle. That is, we set each \(\pi _i\), for \(i = 1, \ldots , \left\lfloor {\frac{d}{2}}\right\rfloor \), to consist of the t consecutive layers number \(i \cdot t - (t - 1), \ldots , i \cdot t - 1, i \cdot t\), and \(\pi _i\) for \(i = \left\lfloor {\frac{d}{2}}\right\rfloor + 2, \ldots , d\) are defined symmetrically. From the description of Benes layers above, it follows that these t consecutive layers do not permute all N elements together, but instead only permute elements within each group of the form \(\sigma _1\ldots \sigma _{(i-1)t}\,x\,\sigma _{i\cdot t+1}\ldots \sigma _n\), where x ranges over all t-bit strings, and the remaining \(n - t\) bits \(\sigma _1, \ldots , \sigma _{(i-1)t}, \sigma _{i\cdot t+1}, \ldots , \sigma _n\) are fixed. Therefore it follows that each \(\pi _i\), \(i \ne \left\lfloor {\frac{d}{2}}\right\rfloor + 1\), consists of \(2^{n - t} = N / T\) disjoint permutations, each acting on \(2^t = T\) elements. Similarly, the middle permutation \(\pi _{\left\lfloor {\frac{d}{2}}\right\rfloor + 1}\), consisting of the \(2t-1\) layers in the middle of the network, only permutes elements within each group of the form \(\sigma _1\ldots \sigma _{n-t}x\), and thus can also be represented as a combination of N/T disjoint permutations each acting on T elements.

Finally, note that the total number of permutations is \(\lceil \frac{(2n-1) - (2t-1)}{t} \rceil + 1 = 2\lceil \frac{n}{t} \rceil -1\). Therefore, \(\pi = \pi _1 \circ \ldots \circ \pi _d\) is indeed a \((T, d)-\)subpermutation representation of \(\pi \), for \(d = 2\lceil \frac{n}{t} \rceil -1\).
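The parameter arithmetic and the grouping induced by the first t layers can be sketched as follows (Python; the assumption that \(\sigma _1\) is the most significant bit of the index is ours, for illustration):

```python
from math import ceil, log2

def subperm_params(N: int, T: int):
    """Return (d, N/T): d = 2*ceil(n/t) - 1 blocks, each a union of
    N/T disjoint T-element permutations (N = 2^n, T = 2^t)."""
    n, t = int(log2(N)), int(log2(T))
    return 2 * ceil(n / t) - 1, N // T

assert subperm_params(1024, 32) == (3, 32)   # n = 10, t = 5

# For the first block pi_1 (layers 1..t), the top t bits of an index
# vary within a group while the low n-t bits stay fixed, so elements
# are grouped by index mod N/T (assuming sigma_1 is the MSB).
N, T = 8, 4
groups = {}
for idx in range(N):
    groups.setdefault(idx % (N // T), []).append(idx)
assert list(groups.values()) == [[0, 2, 4, 6], [1, 3, 5, 7]]
assert all(len(g) == T for g in groups.values())
```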

6 Permute and Share and Secret-Shared Shuffle

Recall that for a permutation \(\pi \) and a vector \(\varvec{x}\) we use \(\pi (\varvec{x})\) to denote the permuted vector \((x_{\pi (1)}, \ldots ,x_{\pi (N)})\).

We will use the Share Translation scheme presented in the previous section to construct first a secure computation protocol for permuting and secret sharing elements, where one party chooses the permutation and the other the elements, and then a construction for a full secret-shared shuffle.

6.1 Definitions

We consider the following functionality, which we call Permute+Share, in which one party provides as input a permutation \(\pi \), and the other party provides as input a set of elements \(\varvec{x}\) in group \(\mathbb {G}\), and the output is secret shares of the permuted elements:

$$\mathcal {F}_{ \mathsf{Permute+Share} [N,\ell ]} (\pi ,\varvec{x}) = (\varvec{r},\pi (\varvec{x})-\varvec{r})\text{, } \text{ where } \varvec{r} \leftarrow ^{\$}\mathbb {G}^N.$$

We can also consider the equivalent functionality when the permutation or the initial database is secret shared as input. (Here we consider a secret sharing of permutation \(\pi \) which consists of two permutations \(\pi _0,\pi _1\) such that \(\pi = \pi _0\circ \pi _1\).)

Finally, we define the secret shared shuffle functionality:

$$\mathcal {F}_{ \mathsf{Secret Shared Shuffle} [N,\ell ]} (\varvec{x}_0, \varvec{x}_1) = (\varvec{r},\pi (\varvec{x}_0+\varvec{x}_1)-\varvec{r}),$$

where \(\varvec{r} \leftarrow ^{\$}\mathbb {G}^N\) and \(\pi \) is a random permutation over \(N\) elements.
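The two ideal functionalities can be written down directly (a Python sketch over the illustrative group \(\mathbb {Z}_{2^{64}}\)):

```python
import random
import secrets

MOD = 2**64                 # illustration: G = Z_{2^64}

def apply_pi(pi, x):
    # the paper's convention: pi(x) = (x[pi(1)], ..., x[pi(N)])
    return [x[pi[i]] for i in range(len(pi))]

def f_permute_and_share(pi, x):
    """Ideal F_Permute+Share: output additive shares of pi(x)."""
    r = [secrets.randbelow(MOD) for _ in x]
    y = [(p - s) % MOD for p, s in zip(apply_pi(pi, x), r)]
    return r, y             # P0 gets r, P1 gets pi(x) - r

def f_secret_shared_shuffle(x0, x1):
    """Ideal F_SecretSharedShuffle: reshare pi(x0 + x1), random pi."""
    x = [(u + v) % MOD for u, v in zip(x0, x1)]
    pi = list(range(len(x)))
    random.shuffle(pi)
    return f_permute_and_share(pi, x)

r, y = f_permute_and_share([1, 0, 2], [10, 20, 30])
assert [(u + v) % MOD for u, v in zip(r, y)] == [20, 10, 30]
```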

6.2 \(\mathsf{Permute+Share}\) from Share Translation

Let \(\mathsf {ShareTrans}_T\) be a protocol satisfying the definition in Sect. 4 for permutations on T elements in group \(\mathbb {G}\), where each element can be represented in \(\ell \) bits. Let T, d be parameters such that any permutation in \(S_N\) has a \((T, d)-\)subpermutation representation (e.g. \(d = 2\lceil \frac{\log N}{\log T} \rceil -1\) for any \(T = 2^t\), as described in Sect. 5). We construct our permute and share protocol \(\mathsf{Permute+Share}\) using the \((T, d)-\)subpermutation representation as follows.

  1.

    \(P_0\) computes the (Td)-subpermutation representation \(\pi _1,\ldots , \pi _d\) of its input \(\pi \).

  2.

    For each layer i, the parties run N/T instances of \(\mathsf {ShareTrans}_T\), with \(P_0\) providing as input the N/T permutations making up \(\pi _i\). (Note that all of these instances and layers can be run in parallel.) For each i, \(P_1\) obtains \(\varvec{a}^{(i,1)}, \ldots ,\varvec{a}^{(i,N/T)}\) and \(\varvec{b}^{(i,1)}, \ldots ,\varvec{b}^{(i,N/T)}\). Call the combined vectors \(\varvec{a}^{(i)}\) and \(\varvec{b}^{(i)}\). Similarly, \(P_0\) obtains \(\varvec{\varDelta }^{(i,1)}, \ldots ,\varvec{\varDelta }^{(i,N/T)}\), which we will call \(\varvec{\varDelta }^{(i)}\).

  3.

    For each \(i\in \{1,\ldots , d-1\}\), \(P_1\) computes \(\varvec{\delta }^{(i)} = \varvec{a}^{(i+1)}-\varvec{b}^{(i)}\) and sends it to \(P_0\). \(P_1\) also sends \(\varvec{m} = \varvec{x}+\varvec{a}^{(1)}\), and samples and sends a random \(\varvec{w}\). \(P_1\) outputs \(\varvec{b}=\varvec{w}-\varvec{b}^{(d)}\).

  4.

    \(P_0\) computes \(\varvec{\varDelta }=\varvec{\varDelta }^{(d)}+\pi _d(\varvec{\delta }^{(d-1)}+\varvec{\varDelta }^{(d-1)}+\pi _{d-1}(\varvec{\delta }^{(d-2)}+\varvec{\varDelta }^{(d-2)}+\ldots +\pi _2(\varvec{\delta }^{(1)}+\varvec{\varDelta }^{(1)})\ldots ))\) and outputs \(\pi (\varvec{m})+\varvec{\varDelta }-\varvec{w}\).
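The four steps above can be exercised end to end on toy parameters. In the sketch below (Python; a trusted dealer hands out the per-layer \(\mathsf {ShareTrans}\) correlations instead of running the actual sub-protocols, and \(\mathbb {G} = \mathbb {Z}_{2^{64}}\), both illustrative choices), the two output shares reconstruct \(\pi (\varvec{x})\):

```python
import secrets

MOD = 2**64           # illustration: the group G is Z_{2^64}
N, d = 4, 3

def P(pi, x):
    # paper's convention, 0-indexed: pi(x)[i] = x[pi(i)]
    return [x[pi[i]] for i in range(len(pi))]

def vadd(u, v): return [(s + t) % MOD for s, t in zip(u, v)]
def vsub(u, v): return [(s - t) % MOD for s, t in zip(u, v)]
def rvec(n): return [secrets.randbelow(MOD) for _ in range(n)]

# layer permutations of a (T, d)-subpermutation representation;
# the full pi applies pi_1 first, then pi_2, ..., then pi_d
pis = [[1, 0, 3, 2], [2, 3, 0, 1], [0, 2, 1, 3]]
def full_pi(x):
    for p in pis:
        x = P(p, x)
    return x

x = rvec(N)           # P1's input vector

# Step 2, dealer-simulated: per layer i, Delta^(i) = b^(i) - pi_i(a^(i))
a = [rvec(N) for _ in range(d)]
b = [rvec(N) for _ in range(d)]
Delta = [vsub(b[i], P(pis[i], a[i])) for i in range(d)]

# Step 3: P1's messages and output share
deltas = [vsub(a[i + 1], b[i]) for i in range(d - 1)]  # delta^(i)
m = vadd(x, a[0])                                      # masked input
w = rvec(N)                                            # fresh mask
out1 = vsub(w, b[d - 1])                               # P1 outputs w - b^(d)

# Step 4: P0 folds the layers from the inside out
acc = vadd(deltas[0], Delta[0])                        # delta^(1) + Delta^(1)
for i in range(1, d - 1):
    acc = vadd(vadd(deltas[i], Delta[i]), P(pis[i], acc))
DeltaTot = vadd(Delta[d - 1], P(pis[d - 1], acc))
out0 = vsub(vadd(full_pi(m), DeltaTot), w)             # pi(m) + Delta - w

# the two output shares reconstruct pi(x)
assert vadd(out0, out1) == full_pi(x)
```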

Theorem 4

Let \(N\) and \(\ell \) be the number of elements in the database and the size of each element, respectively, and let T, d be arbitrary parameters such that any permutation in \(S_N\) has a \((T, d)-\)subpermutation representation. Then the construction described above is a Permute+Share protocol secure against static semi-honest corruptions with the following efficiency:

  • The communication cost is \((d+1) N\ell \) bits together with the cost of dN/T Share Translation protocols on T elements each, run in parallel,

  • The computation cost is equal to the cost of dN/T Share Translation protocols on T elements each, run in parallel.

Correctness. By correctness of \(\mathsf {ShareTrans}_T\), for all i \(\varvec{\varDelta }^{(i)}=\varvec{b}^{(i)}-\pi _i(\varvec{a}^{(i)})\). This means that for all i, \(\varvec{\delta }^{(i)}+\varvec{\varDelta }^{(i)} = \varvec{a}^{(i+1)}-\varvec{b}^{(i)}+ \varvec{b}^{(i)}-\pi _i(\varvec{a}^{(i)}) = \varvec{a}^{(i+1)}-\pi _i(\varvec{a}^{(i)})\).

Thus, the final \(\varvec{\varDelta }\) produced by \(P_0\) is

$$\begin{aligned}&\varvec{\varDelta }^{(d)}+\pi _d(\varvec{\delta }^{(d-1)}+\varvec{\varDelta }^{(d-1)}+\pi _{d-1}(\varvec{\delta }^{(d-2)}+\varvec{\varDelta }^{(d-2)}+\ldots +\pi _2(\varvec{\delta }^{(1)}+\varvec{\varDelta }^{(1)})\ldots ))\\ =&\varvec{\varDelta }^{(d)}+\pi _d(\varvec{a}^{(d)}-\pi _{d-1}(\varvec{a}^{(d-1)})+\pi _{d-1}(\varvec{a}^{(d-1)}-\pi _{d-2}(\varvec{a}^{(d-2)})+\ldots +\pi _2(\varvec{a}^{(2)}-\pi _1(\varvec{a}^{(1)}))\ldots ))\\ =&\varvec{\varDelta }^{(d)}+\pi _d(\varvec{a}^{(d)} - \pi _{d-1}(\ldots \pi _2(\pi _1(\varvec{a}^{(1)}))\ldots ))\\ =&\varvec{b}^{(d)} -\pi _d(\varvec{a}^{(d)})+\pi _d(\varvec{a}^{(d)} - \pi _{d-1}(\ldots \pi _2(\pi _1(\varvec{a}^{(1)}))\ldots ))\\ =&\varvec{b}^{(d)} -\pi _d(\pi _{d-1}(\ldots \pi _2(\pi _1(\varvec{a}^{(1)}))\ldots ))\\ =&\varvec{b}^{(d)} -\pi (\varvec{a}^{(1)}) \end{aligned}$$

The output for \(P_0,P_1\) is:

$$\begin{aligned}&\pi (\varvec{m})+\varvec{\varDelta }-\varvec{w},&\varvec{w}-\varvec{b}^{(d)}\\ =&\pi (\varvec{x}+\varvec{a}^{(1)})+\varvec{\varDelta }-\varvec{w},&\varvec{w}-(\varvec{\varDelta }+\pi (\varvec{a}^{(1)}))\\ =&\pi (\varvec{x})+\pi (\varvec{a}^{(1)})+\varvec{\varDelta }-\varvec{w},&-\varvec{\varDelta }-\pi (\varvec{a}^{(1)})+\varvec{w}\end{aligned}$$

If we let \(\varvec{r}= \pi (\varvec{x})+\pi (\varvec{a}^{(1)})+\varvec{\varDelta }-\varvec{w}\), we see that this has the correct distribution.

Security. Our simulator behaves as follows: If \(b=0\) (i.e. \(P_0\) is corrupt): \(\mathsf {sim}(1^\lambda ,0,\pi , \varvec{y}_0)\) will first generate the subpermutations for \(\pi \) as described above, and then internally run all of the \(\mathsf {ShareTrans}_T\) protocols to obtain a simulated view for \(P_0\) and \(\varvec{a}^{(1)},\ldots ,\varvec{a}^{(d)}, \varvec{b}^{(1)},\ldots ,\varvec{b}^{(d)}\). Let \(\varvec{\varDelta }^{(1)},\ldots , \varvec{\varDelta }^{(d)}\) be the corresponding values computed by \(P_0\) in these protocols. Choose random \(\varvec{m}\) and random \(\varvec{\delta }^{(1)},\ldots , \varvec{\delta }^{(d-1)}\). It then computes \(\varvec{\varDelta }\) as in step 4 of the protocol and sets \(\varvec{w}= -\varvec{y}_0+\pi (\varvec{m})+\varvec{\varDelta }\). It outputs the views from the \(\mathsf {ShareTrans}_T\) protocols and the messages \(\varvec{m},\varvec{w}, \varvec{\delta }^{(1)},\ldots ,\varvec{\delta }^{(d-1)}\).

If \(b=1\) (i.e. \(P_1\) is corrupt): \(\mathsf {sim}(1^\lambda ,1,\varvec{x}, \varvec{y}_1)\) will pick random \(\pi '\), compute the subpermutations, internally run the \(\mathsf {ShareTrans}_T\) protocols with these permutations to obtain the views for \(P_1\), and compute \(\varvec{b}^{(d)}\) from these runs as in the real protocol. It will set the random tape \(\varvec{w}= \varvec{y}_1+\varvec{b}^{(d)}\). It outputs the view from the \(\mathsf {ShareTrans}_T\) protocols and the random tape \(\varvec{w}\).

We show that this simulator produces an ideal experiment that is indistinguishable from the real experiment. We start with the case where \(b=0\) and show this through a series of games:

  • Real Game: Runs the real experiment. The output is \(P_0\)’s view (its input, the \(\mathsf {view}_0\)s from the Share Translation protocols and the messages \(\varvec{m}\), \(\varvec{w}\), and \(\varvec{\delta }^{(1)},\ldots , \varvec{\delta }^{(d-1)}\) it receives), and the honest \(P_1\)’s input \(\varvec{x}\) and output \(\varvec{w}-\varvec{b}^{(d)}\).

  • Game 1: As in the previous game except in step 2, compute \(\varvec{\varDelta }^{(i)}\) as \(\varvec{b}^{(i)}-\pi _i(\varvec{a}^{(i)})\) instead of through the \(\mathsf {ShareTrans}_T\) protocols. This is identical by correctness of Share Translation.

  • Game 2: As in the previous game except after step 2 for each i we sample random \(\varvec{a}'^{(i)}\) and compute \({\varvec{b}'}^{(i)} = \pi _i({\varvec{a}'}^{(i)}) + \varvec{\varDelta }^{(i)}\), and then use these values in place of \(\varvec{a}^{(i)},\varvec{b}^{(i)}\) in steps 3 and 4.

    We can show that this is indistinguishable via a series of hybrids, where in hybrid \(H_i\), we use \({\varvec{a}'}^{(j)},{\varvec{b}'}^{(j)}\) for the output of the first i \(\mathsf {ShareTrans}_T\) protocols and \({\varvec{a}}^{(j)},{\varvec{b}}^{(j)}\) for the rest. Then \(H_i, H_{i+1}\) are indistinguishable by the share hiding property of \(\mathsf {ShareTrans}_T\).

  • Game 3: As above, but choose random \(\varvec{m}, \varvec{\delta }^{(1)},\ldots , \varvec{\delta }^{(d-1)}\). Set \({\varvec{a}'}^{(1)}= \varvec{m}-\varvec{x}\). For \(i=1\ldots d\), compute \({\varvec{b}'}^{(i)} = \pi _i({\varvec{a}'}^{(i)}) + \varvec{\varDelta }^{(i)}\) as above, and for \(i<d\) set \({\varvec{a}'}^{(i+1)} = \varvec{\delta }^{(i)}+{\varvec{b}'}^{(i)}\). Note that this is distributed identically to Game 2.

  • Game Simulated: The only difference between the simulated game and Game 3 is that in Game 3, \(\varvec{w}\) is chosen at random, and \(P_1\)’s output is computed as \(\varvec{w}-{\varvec{b}'^{(d)}}\), while in Game Simulated, \(P_1\)’s output is random \(\varvec{r}\) and \(\varvec{w}\) is set to \(-\varvec{y}_0 +\pi (\varvec{m})+\varvec{\varDelta }=-(\pi (\varvec{x})-\varvec{r})+\pi (\varvec{m})+\varvec{\varDelta }=\pi ({\varvec{a}'}^{(1)})+\varvec{r}+\varvec{\varDelta }={\varvec{b}'}^{(d)}+\varvec{r}\) by construction of \(\varvec{\varDelta }\). Thus, the two games are identical.

We argue the case when \(b=1\) as follows:

  • Real Game: Runs the real experiment. The output is \(P_1\)’s view (its input \(\varvec{x}\), the \(\mathsf {view}_1\)s from the Share Translation protocols and the random string \(\varvec{w}\) it chooses) and the honest \(P_0\)’s input \(\pi \) and output \(\pi (\varvec{m})+\varvec{\varDelta }-\varvec{w}\), where \(\varvec{\varDelta }\) is as computed in step 4 of the protocol.

  • Game 1: As in the previous game, but \(P_0\)’s output is \(\pi (\varvec{x})+\varvec{b}^{(d)}-\varvec{w}\). Note that \(\pi (\varvec{x})+\varvec{b}^{(d)}-\varvec{w}=\pi (\varvec{x}+\varvec{a}^{(1)})+\varvec{b}^{(d)}-\pi (\varvec{a}^{(1)})-\varvec{w}=\pi (\varvec{m})+\varvec{\varDelta }-\varvec{w}\), where \(\varvec{a}^{(1)}, \varvec{b}^{(d)}\) are the values \(P_1\) obtains from the first and last layer \(\mathsf {ShareTrans}_T\) protocols.

  • Game 2: As in the previous game except run the \(\mathsf {ShareTrans}_T\) protocols with \(\pi _1',\ldots ,\pi _d'\) derived from a random permutation \(\pi '\).

    We can show that this is indistinguishable via a series of hybrids, where in hybrid \(H_i\), we use the subpermutations derived from \(\pi '\) for the first i protocols, and the subpermutations derived from \(\pi \) for the rest. Then \(H_i, H_{i+1}\) are indistinguishable by the permutation hiding property of \(\mathsf {ShareTrans}_T\).

  • Game Simulated: As in the previous game except choose random \(\varvec{r}\) and set \(\varvec{w}=\pi (\varvec{x}) -\varvec{r}+\varvec{b}^{(d)}\). This is identically distributed to Game 1 and identical to the ideal experiment.

6.3 Secret Shared Shuffle  from Permute+Share

The Secret Shared Shuffle protocol proceeds as follows:

  0. \(P_0\) and \(P_1\) each choose a random permutation \(\pi _0, \pi _1\leftarrow S_N\).

  1. \(P_0\) and \(P_1\) run the Permute+Share protocol to apply \(\pi _0\) to \(\varvec{x}_1\), resulting in shares \(\varvec{x}_0^{(1)}\) for \(P_0\) and \(\varvec{x}_1^{(1)}\) for \(P_1\).

  2. \(P_0\) computes \(\varvec{x}_0^{(2)}=\pi _0(\varvec{x}_0) + \varvec{x}_0^{(1)}\).

  3. \(P_1\) and \(P_0\) run the \(\mathsf{Permute+Share}\) protocol to apply \(\pi _1\) to \(\varvec{x}_0^{(2)}\), resulting in shares \(\varvec{x}_1^{(3)}\) for \(P_1\) and \(\varvec{x}_0^{(3)}\) for \(P_0\).

  4. \(P_1\) computes \(\varvec{x}_1^{(4)}=\pi _1(\varvec{x}_1^{(1)})+\varvec{x}_1^{(3)}\).

  5. \(P_0\) outputs \(\varvec{x}_0^{(3)}\) and \(P_1\) outputs \(\varvec{x}_1^{(4)}\).
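To make the data flow concrete, the steps above can be sketched in Python, with the \(\mathsf{Permute+Share}\) functionality modeled as an ideal trusted helper and additive shares taken over \(\mathbb {Z}_{2^{32}}\) (an illustrative choice; the group is abstract in the protocol). All function names here are ours:

```python
import random

MOD = 2 ** 32  # additive secret sharing over Z_{2^32} (illustrative choice)

def apply_perm(pi, v):
    # pi is a list of indices; pi(v)[i] = v[pi[i]]
    return [v[p] for p in pi]

def permute_and_share(pi, v, rng):
    # Ideal Permute+Share functionality: the party holding pi receives a
    # random share r, the party holding v receives pi(v) - r.
    r = [rng.randrange(MOD) for _ in v]
    other = [(x - s) % MOD for x, s in zip(apply_perm(pi, v), r)]
    return r, other

def secret_shared_shuffle(x0, x1, rng):
    n = len(x0)
    pi0 = list(range(n)); rng.shuffle(pi0)  # step 0: P0 picks pi_0
    pi1 = list(range(n)); rng.shuffle(pi1)  #         P1 picks pi_1
    # Step 1: Permute+Share applies pi_0 to x_1.
    x0_1, x1_1 = permute_and_share(pi0, x1, rng)
    # Step 2: P0 folds its own permuted input into its share.
    x0_2 = [(a + b) % MOD for a, b in zip(apply_perm(pi0, x0), x0_1)]
    # Step 3: Permute+Share applies pi_1 to x_0^(2), with the roles swapped.
    x1_3, x0_3 = permute_and_share(pi1, x0_2, rng)
    # Step 4: P1 folds its permuted first-round share into its new share.
    x1_4 = [(a + b) % MOD for a, b in zip(apply_perm(pi1, x1_1), x1_3)]
    # Step 5: outputs; pi = pi_1 o pi_0 is returned only for checking.
    pi = [pi0[p] for p in pi1]
    return x0_3, x1_4, pi
```

Reconstructing \(\pi (\varvec{x}_0+\varvec{x}_1)\) from the two outputs mirrors the correctness derivation that follows.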

Theorem 5

The construction above is a Secret Shared Shuffle protocol secure against static semi-honest corruptions. Its communication and computation cost is that of 2 sequential invocations of Permute+Share (see footnote 9).

Correctness. The output for \(P_0,P_1\) is:

$$\begin{aligned}&\varvec{x}_0^{(3)},&\varvec{x}_1^{(4)}\\ =&\varvec{x}_0^{(3)},&\pi _1(\varvec{x}_1^{(1)})+\varvec{x}_1^{(3)}\\ =&\pi _1(\varvec{x}_0^{(2)})-\varvec{r}^{(3)} ,&\pi _1(\varvec{x}_1^{(1)})+\varvec{r}^{(3)}\\ =&\pi _1(\pi _0(\varvec{x}_0) + \varvec{x}_0^{(1)})-\varvec{r}^{(3)},&\pi _1(\varvec{x}_1^{(1)})+\varvec{r}^{(3)}\\ =&\pi _1(\pi _0(\varvec{x}_0)+\varvec{r}^{(1)})-\varvec{r}^{(3)},&\pi _1(\pi _0(\varvec{x}_1)-\varvec{r}^{(1)})+\varvec{r}^{(3)}\\ =&\pi _1(\pi _0(\varvec{x}_0))+\pi _1(\varvec{r}^{(1)})-\varvec{r}^{(3)},&\pi _1(\pi _0(\varvec{x}_1))-(\pi _1(\varvec{r}^{(1)})-\varvec{r}^{(3)})\\ \end{aligned}$$

where \(\varvec{r}^{(1)}\) and \(\varvec{r}^{(3)}\) are the values generated by the first and second invocations of Permute+Share. If we let \(\varvec{r}=\pi _1(\pi _0(\varvec{x}_0))+\pi _1(\varvec{r}^{(1)})-\varvec{r}^{(3)}\) and \(\pi = \pi _1\circ \pi _0\), we see that this has the correct distribution.

Security. Our simulator behaves as follows:

If \(b=0\) (i.e. \(P_0\) is corrupt): \(\mathsf {sim}(1^\lambda ,0,\varvec{x}_0, \varvec{y}_0)\) will choose random \(\pi _0, \varvec{x}_0^{(1)}\), set \(\varvec{x}_0^{(2)}=\pi _0(\varvec{x}_0) + \varvec{x}_0^{(1)}\), simulate the view from the first Permute+Share  with \(\mathsf {sim}^{\mathsf{Permute+Share}}(1^\lambda ,0,\pi _0, \varvec{x}_0^{(1)})\), and simulate the view from the second Permute+Share with \(\mathsf {sim}^{\mathsf{Permute+Share}}(1^\lambda ,1, \varvec{x}_0^{(2)},\varvec{y}_0)\).

If \(b=1\) (i.e. \(P_1\) is corrupt): \(\mathsf {sim}(1^\lambda ,1,\varvec{x}_1, \varvec{y}_1)\) will choose random \(\pi _1, \varvec{x}_1^{(1)}\), set \(\varvec{x}_1^{(3)}=\varvec{y}_1-\pi _1(\varvec{x}_1^{(1)})\), simulate the view from the first Permute+Share  with \(\mathsf {sim}^{\mathsf{Permute+Share}}(1^\lambda ,1,\varvec{x}_1, \varvec{x}_1^{(1)})\), and simulate the view from the second Permute+Share  with \(\mathsf {sim}^{\mathsf{Permute+Share}}(1^\lambda ,0, \pi _1,\varvec{x}_1^{(3)})\).

We show that this simulator produces an ideal experiment that is indistinguishable from the real experiment. We start with the case where \(b=0\) and show this through a series of games:

  • Real Game: Runs the real experiment. The output is \(P_0\)’s view (its input \(\varvec{x}_0\), \(\mathsf {view}_0^{(1)}, \mathsf {view}_0^{(2)}\) from the two Permute+Share protocols including the outputs \(\varvec{x}_0^{(1)}, \varvec{x}_0^{(3)}\)), and the honest \(P_1\)’s input \(\varvec{x}_1\) and output \(\varvec{x}_1^{(4)}=\pi _1(\varvec{x}_1^{(1)})+\varvec{x}_1^{(3)}\).

  • Game 1: In step 1, first compute \(\mathcal {F}_{\mathsf{Permute+Share}}(\pi _0, \varvec{x}_1)\), i.e. choose random \(\varvec{r}^{(1)}\), and set \(\varvec{x}_0^{(1)} = \varvec{r}^{(1)}\) and \(\varvec{x}_1^{(1)}=\pi _0(\varvec{x}_1)-\varvec{r}^{(1)}\). Then run the Permute+Share simulator to generate the view \({\mathsf {view}_0^{(1)}}'\) for the first Permute+Share. The output is \(P_0\)’s view (its input \(\varvec{x}_0\), \({\mathsf {view}_0^{(1)}}', \mathsf {view}_0^{(2)}\) from the two Permute+Share protocols including its outputs from those protocols \(\varvec{x}_0^{(1)}=\varvec{r}^{(1)}\) and \(\varvec{x}_0^{(3)}\)), and the honest \(P_1\)’s input \(\varvec{x}_1\) and output \(\varvec{x}_1^{(4)}=\pi _1(\varvec{x}_1^{(1)})+\varvec{x}_1^{(3)}=\pi _1(\pi _0(\varvec{x}_1)-\varvec{r}^{(1)})+\varvec{x}_1^{(3)}\). This is indistinguishable by security of the Permute+Share protocol.

  • Game 2: In step 3, first compute \(\mathcal {F}_{\mathsf{Permute+Share}}(\pi _1, \varvec{x}_0^{(2)})\), i.e. choose random \(\varvec{r}^{(3)}\) and set \(\varvec{x}_1^{(3)}=\varvec{r}^{(3)}\) and \(\varvec{x}_0^{(3)}=\pi _1(\varvec{x}_0^{(2)})-\varvec{r}^{(3)}\). Then run the Permute+Share simulator to generate the view \({\mathsf {view}_0^{(2)}}'\) for the second Permute+Share. The output is \(P_0\)’s view (its input \(\varvec{x}_0\), \({\mathsf {view}_0^{(1)}}', {\mathsf {view}_0^{(2)}}'\) from the two Permute+Share protocols including its outputs from those protocols \(\varvec{x}_0^{(1)}=\varvec{r}^{(1)}\) and \(\varvec{x}_0^{(3)}=\pi _1(\varvec{x}_0^{(2)})-\varvec{r}^{(3)}\)), and the honest \(P_1\)’s input \(\varvec{x}_1\) and output \(\varvec{x}_1^{(4)}=\pi _1(\pi _0(\varvec{x}_1)-\varvec{r}^{(1)})+\varvec{x}_1^{(3)} =\pi _1(\pi _0(\varvec{x}_1)-\varvec{r}^{(1)})+\varvec{r}^{(3)}\). This is again indistinguishable by security of the Permute+Share protocol.

  • Game 3: Choose random \(\pi , \varvec{r}, \varvec{x}_0^{(1)}\). Set \(\pi _1 = \pi \circ \pi _0^{-1}\), \(\varvec{r}^{(1)} = \varvec{x}_0^{(1)}\) and \(\varvec{r}^{(3)} = \pi _1(\pi _0(\varvec{x}_0))+\pi _1(\varvec{r}^{(1)})-\varvec{r}\). Other than that, proceed as in Game 2. The output is \(P_0\)’s view (its input \(\varvec{x}_0\), \({\mathsf {view}_0^{(1)}}', {\mathsf {view}_0^{(2)}}'\) from the two Permute+Share protocols including its outputs from those protocols \(\varvec{x}_0^{(1)}=\varvec{r}^{(1)}\) and \(\varvec{x}_0^{(3)}\)), and the honest \(P_1\)’s input \(\varvec{x}_1\) and output \(\varvec{x}_1^{(4)}\). This is identically distributed to Game 2. \(P_1\)’s output in this game is

    $$\begin{aligned} \varvec{x}_1^{(4)}&= \pi _1(\varvec{x}_1^{(1)})+\varvec{x}_1^{(3)}\\&= \pi _1(\varvec{x}_1^{(1)})+\pi _1(\varvec{x}_0^{(2)})-\varvec{x}_0^{(3)}\\&= \pi _1(\varvec{x}_1^{(1)})+\pi _1(\pi _0(\varvec{x}_0) + \varvec{x}_0^{(1)})-\varvec{x}_0^{(3)}\\&= \pi _1(\pi _0(\varvec{x}_1)-\varvec{x}_0^{(1)})+\pi _1(\pi _0(\varvec{x}_0) + \varvec{x}_0^{(1)})-\varvec{x}_0^{(3)}\\&= \pi _1(\pi _0(\varvec{x}_1+\varvec{x}_0))-\varvec{x}_0^{(3)}\\&= \pi (\varvec{x}_1+\varvec{x}_0)-\varvec{x}_0^{(3)} \end{aligned}$$

    Thus, this is identical to the ideal experiment.

Next, we turn to the case where \(b=1\).

  • Real Game: Runs the real experiment.

  • Game 1: In step 1, first compute \(\mathcal {F}_{\mathsf{Permute+Share}}(\pi _0, \varvec{x}_1)\), i.e. choose random \(\varvec{x}_0^{(1)}\), and then compute \(\varvec{x}_1^{(1)}=\pi _0(\varvec{x}_1)-\varvec{x}_0^{(1)}\). Then run the Permute+Share simulator to generate the view for the first Permute+Share. This is indistinguishable by security of the Permute+Share protocol.

  • Game 2: In step 3, first compute \(\mathcal {F}_{\mathsf{Permute+Share}}(\pi _1, \varvec{x}_0^{(2)})\), i.e. choose random \(\varvec{x}_1^{(3)}\), and then compute \(\varvec{x}_0^{(3)}=\pi _1(\varvec{x}_0^{(2)})-\varvec{x}_1^{(3)}\). Then run the Permute+Share simulator to generate the view for the second Permute+Share. This is again indistinguishable by security of the Permute+Share protocol.

  • Game 3: Choose random \(\varvec{x}_0^{(3)}\). Set \(\varvec{x}_1^{(3)}=\pi _1(\varvec{x}_0^{(2)})-\varvec{x}_0^{(3)}\). Other than that, proceed as in Game 2. This is identically distributed to Game 2.

  • Game 4: Choose random \(\pi \), set \(\pi _0 = \pi _1^{-1}\circ \pi \) and set \(\varvec{x}_1^{(3)} = \pi (\varvec{x}_0+\varvec{x}_1)-\pi _1(\varvec{x}_1^{(1)}) -\varvec{x}_0^{(3)}\). Note that this means \(\varvec{x}_1^{(4)} = \pi (\varvec{x}_0+\varvec{x}_1) -\varvec{x}_0^{(3)}\) so this is distributed identically to the ideal experiment. Note also that this is distributed identically to Game 3, because:

    $$\begin{aligned}&\pi _1(\varvec{x}_0^{(2)})-\varvec{x}_0^{(3)} \\&= \pi _1(\pi _0(\varvec{x}_0)+\varvec{x}_0^{(1)})-\varvec{x}_0^{(3)} \\&= \pi _1(\pi _0(\varvec{x}_0)+\pi _0(\varvec{x}_1)-\varvec{x}_1^{(1)})-\varvec{x}_0^{(3)} \\&= \pi _1(\pi _0(\varvec{x}_0+ \varvec{x}_1))-\pi _1(\varvec{x}_1^{(1)})-\varvec{x}_0^{(3)} \\&= \pi (\varvec{x}_0+ \varvec{x}_1)-\pi _1(\varvec{x}_1^{(1)})-\varvec{x}_0^{(3)} \\ \end{aligned}$$

7 Experimental Evaluation

In this section, we compare our \(\mathsf{Permute+Share}\) solution with a public-key based solution and with the best previous permutation-network based solution [22]. We consider \(\mathsf{Permute+Share}\) where party \(P_0\) starts with a permutation \(\pi \) and party \(P_1\) starts with an input vector \(\varvec{x}\) of N strings in \(\{0,1\}^\ell \).

We take a microbenchmarking approach to estimating the cost of these protocols: we first empirically estimate the cost of the individual operations (AES computations or RSA group operations), and then estimate the cost of the full protocol by plugging the time of the individual operations into a formula for the protocol's execution time. For example, for our protocol and the protocol of [22], both of which are dominated by AES operations, we estimate the cost as follows:

  1. We compute the computation cost in terms of the number of AES calls.

  2. We empirically estimate the cost of computing fixed-key AES (per 128-bit block).

  3. We compute the communication cost in bits.

  4. We compute the time to communicate the calculated number of bits over various networks (bandwidth 72 Mbps, 100 Mbps and 1 Gbps).

  5. The total time reported is the number of AES calls \(\times \) the cost of a single AES \(+\) the size of communication / bandwidth.

In the following we will describe our cost estimates in more detail and then present a detailed comparison. First we discuss some specifics on how we implement AES and how we analyze the cost of OT, then we present formulas for the number of basic operations required for each solution, then we describe how we estimate the cost of these operations, and finally we present the detailed efficiency comparison.

7.1 More Detail on Cost of OT and AES

Fixed-key Block Ciphers. The symmetric key based protocols (ours and the one described in [22]) rely on two fundamental building blocks, namely, Oblivious Transfer extension (OTe) [16] and the GGM PRG [10]. Typically, published OTe protocols are based on a hash function that is modeled as a random oracle. However, in most of the recent implementations, the hash function is instantiated, somewhat haphazardly, using fixed-key block ciphers (AES). In a recent work [13], the authors provided a principled way of implementing [16] using fixed-key AES and formally proved that it is secure. The authors also propose that the length-doubling PRG used in GGM [10] can be implemented using fixed-key AES for better efficiency, though they do not prove it. Here, we first prove that it is safe to use this optimized PRG construction [13], and then use it in our experiments. We will also use the fixed-key AES-based length extension technique for stretching short messages into longer ones (both for OTe and for OPV message length extension) described in Section 6.1 of [13].

The optimized PRG construction is based on a correlation-robust hash (\(\mathsf {CRH}\)) function [13, 16]. Roughly, H is said to be correlation-robust if the keyed function \(f_R(x) = H(x \oplus R)\) is pseudorandom as long as R is sufficiently random. Given a \(\mathsf {CRH}\) H, the length-doubling PRG is constructed as follows: \(\mathsf {G}(x) = H(1 \oplus x) \circ H(2 \oplus x)\). We give more details in the full version.

In our experiments, we will use the following concrete instantiation of \(\mathsf {CRH}\) [13]: \(H(x) = \pi (x) \oplus x\) where \(\pi (.)\) is a fixed key block cipher, e.g. AES.
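As an illustration (our sketch, not the paper's implementation), the wiring of \(H\) and \(\mathsf {G}\) can be written down directly. Since the point is the construction rather than the cipher, we stand in for fixed-key AES with a small SHA-256-based Feistel permutation; a real implementation would use AES here:

```python
import hashlib

BLOCK = 16  # 128-bit blocks, matching lambda = 128

# Stand-in for the fixed-key block cipher pi: a 4-round Feistel network with
# a SHA-256-based round function, keyed by a hard-coded constant. This is a
# genuine permutation on 16-byte blocks, but it is only a placeholder for AES
# so that the sketch runs with the standard library alone.
_FIXED_KEY = b"fixed-key-for-pi"

def _round(i, half):
    return hashlib.sha256(_FIXED_KEY + bytes([i]) + half).digest()[:BLOCK // 2]

def _pi(x):
    left, right = x[:BLOCK // 2], x[BLOCK // 2:]
    for i in range(4):
        left, right = right, bytes(a ^ b for a, b in zip(left, _round(i, right)))
    return left + right

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def H(x):
    # Correlation-robust hash instantiation from [13]: H(x) = pi(x) XOR x.
    return _xor(_pi(x), x)

def G(x):
    # Length-doubling PRG: G(x) = H(1 XOR x) || H(2 XOR x).
    one = (1).to_bytes(BLOCK, "big")
    two = (2).to_bytes(BLOCK, "big")
    return H(_xor(one, x)) + H(_xor(two, x))
```

Note that \(H\) costs one block-cipher call plus an XOR, so \(\mathsf {G}\) doubles its input at the price of two fixed-key AES calls.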

OT Extension Costs. The computation in OT extension consists of \(O(m\ell )\) bitwise operations (ANDs and XORs), running \(\lambda \) public key OTs, and \(O(m+ m\ell /\lambda )\) AES operations as discussed above. This means that for sufficiently large m, like those we consider, the cost is dominated by the AES operations, as can be verified empirically using any standard OT library. For example, we benchmark the Naor-Pinkas base OT (dubbed NPOT) using [27]; the average time to run 128 base OTs is 13 ms. As a result, we can focus our analysis of computational costs on the AES operations.

In our experiments, we estimate the cost of OT extension as follows. Computation is reported as a number of fixed-key AES calls for the sender and receiver, and communication is reported in bits. For random OTs on strings of length \(\ell > \lambda =128\) bits, we use the IKNP OT-extension protocol with the fixed-key AES optimization [13]. The cost of m random OTs on messages of length \(\ell \) bits is shown in Table 1, where the terms \(2m\ell /\lambda \) for the sender and \(m\ell /\lambda \) for the receiver are for extending the random messages from \(\lambda \) to \(\ell \) bits. We denote this functionality as \(\mathsf {ROT}_\ell ^m\). For \(\ell =\lambda \), no message length extension is required (for either \(\mathsf {ROT}\) or \(\mathsf {SOT}\)). Fixed-message OTs, or standard OTs (\(\mathsf {SOT}\)), are obtained from \(\mathsf {ROT}\) by using the \(\mathsf {ROT}\) messages as one-time pads for the actual messages. So \(\mathsf {SOT}_\ell ^m\) adds an additional \(2m\ell \) bits of communication over \(\mathsf {ROT}_\ell ^m\), i.e., the communication cost of \(\mathsf {SOT}_\ell ^m\) is \(m(\lambda +2\ell )\) bits. There is no additional computation overhead (except some additional XORs, which we ignore).

Table 1. The computation and communication cost for variants of OT extension.
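The stated counts translate into small helper formulas (ours); the AES cost of the base OT-extension matrix itself is omitted, and, following the text, no length extension is counted when \(\ell =\lambda \):

```python
LAMBDA = 128  # security parameter lambda, in bits

def rot_comm_bits(m, lam=LAMBDA):
    # m random OTs (ROT): m * lambda bits; the SOT pads below are exactly
    # what is saved by stopping at random-message OTs.
    return m * lam

def sot_comm_bits(m, ell, lam=LAMBDA):
    # Standard OTs from ROT: the ROT messages one-time-pad the real messages,
    # adding 2*m*ell bits, i.e. m*(lambda + 2*ell) bits in total.
    return rot_comm_bits(m, lam) + 2 * m * ell

def extension_aes_calls(m, ell, lam=LAMBDA, sender=True):
    # Fixed-key AES calls for stretching lambda-bit ROT outputs to ell bits:
    # 2*m*ell/lambda for the sender, m*ell/lambda for the receiver; none
    # needed when ell == lambda.
    if ell == lam:
        return 0
    return (2 if sender else 1) * m * ell // lam
```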

7.2 Analyzing the Cost of Each Solution

As discussed above, in the following estimates we only count computation time of AES, not base OTs or XORs, since the latter are fairly small.

Let \(N\) be the number of elements in the database, \(\ell \) be the length of each element, \(\lambda \) be the security parameter, and T be the size of subpermutations. Let \( d= 2\lceil \log N/\log T \rceil -1\).

Our Protocol: The compute cost of our \(\mathsf{Permute+Share}\) protocol is the compute cost of \(d N/T\) \(\mathsf {ShareTrans}_T\)’s, where \(d = 2\lceil \log N/ \log T \rceil -1\). The communication includes the cost of \(d N/T\) \(\mathsf {ShareTrans}_T\)’s + \((d+1)N\ell \) bits.

Each \(\mathsf {ShareTrans}_T\) protocol requires \(\mathsf {SOT}_\lambda ^{T \log T}\) and \(T^2(2+ \ell /\lambda )\) local fixed-key AES calls (for each party), which includes the PRG calls in the GGM tree and the message length extension for the underlying \(\mathsf {OPV}\) protocol. There is no additional communication over the cost of \(\mathsf {SOT}_\lambda ^{T \log T}\).

  • Computation Cost. The number of AES calls (for each of sender and receiver) is the following:

    $$3dN \log T + dNT(2+\ell /\lambda ) $$
  • Communication Cost. Communication in number of bits is the following:

    $$3dN\lambda \log T + (d+1)N\ell $$

Protocol from [22]: This \(\mathsf{Permute+Share}\) requires \(\mathsf {SOT}_{2\ell }^{N\log N-N/2}\) and has an additional \(2N\ell \) bits of communication overhead.

So, the total computation and communication costs are the following:

  • Computation Cost: Number of AES calls for receiver (receiver is slightly more efficient than sender) in the protocol of [22] is the following:

    $$3( N \log N - N/2) + 2 (\ell /\lambda ) (N \log N - N/2)$$
  • Communication Cost: Communication in number of bits in the protocol of [22] is the following:

    $$(N \log N - N/2) (\lambda + 4 \ell ) + 2\,N \ell $$
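The two sets of cost formulas can be compared directly; the helper below (our code, with our function names) evaluates them for concrete parameters:

```python
from math import ceil, log2

def our_costs(N, ell, T, lam=128):
    # Our Permute+Share: d = 2*ceil(log N / log T) - 1 layers of ShareTrans_T.
    d = 2 * ceil(log2(N) / log2(T)) - 1
    aes_calls = 3 * d * N * log2(T) + d * N * T * (2 + ell / lam)
    comm_bits = 3 * d * N * lam * log2(T) + (d + 1) * N * ell
    return aes_calls, comm_bits

def their_costs(N, ell, lam=128):
    # Protocol from [22]: a permutation network with N log N - N/2 switches.
    k = N * log2(N) - N / 2
    aes_calls = 3 * k + 2 * (ell / lam) * k
    comm_bits = k * (lam + 4 * ell) + 2 * N * ell
    return aes_calls, comm_bits
```

For example, at \(N=2^{20}\), \(\ell =640\), \(\lambda =128\) and \(T=128\), our protocol communicates \(17280\,N\) bits versus \(53696\,N\) bits for [22], which is what drives the running-time totals in Sect. 7.4.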

Paillier Based Solution. In the full version we describe a solution based on additively homomorphic encryption in which \(P_1\) encrypts his data and sends it to \(P_0\), who permutes the ciphertexts, randomizes them, and adds a random share to each before returning them to \(P_1\); \(P_1\) outputs the decryptions and \(P_0\) outputs the random shares he added.

In this protocol, since every element of \(\varvec{x}\) has to be encrypted and the encryption message space is defined to be \(\mathbb {Z}_n\), each element has to be broken into blocks that fit into \(\mathbb {Z}_n\) (below we use n for the modulus bit-length). This means that \(P_1\) computes \(N \cdot \lceil \ell /n\rceil \) encryptions and \(P_0\) computes \(N \cdot \lceil \ell /n\rceil \) ciphertext randomizations and ciphertext-plaintext multiplications. The communication for this protocol is \(N \cdot \lceil \ell /n \rceil \cdot 2n\) bits. To get a very rough estimate of the cost of a Paillier encryption and of a randomization+multiplication, we measure the cost of an RSA signing operation with modulus n. Note that this is a significant underestimate since Paillier operations actually happen in \(\mathbb {Z}_{n^2}\), and since the RSA signer knows the factorization of n, while \(P_0\) does not.
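In code form (ours; here n denotes the modulus bit-length, as the formulas above implicitly do):

```python
from math import ceil

def paillier_costs(N, ell, n_bits=4096):
    # Each ell-bit element is split into ceil(ell / n) plaintext blocks that
    # fit into Z_n; every block is encrypted once by the data holder and
    # randomized (plus plaintext-multiplied) once by the permuting party.
    blocks = N * ceil(ell / n_bits)
    encryptions = blocks
    randomizations = blocks
    comm_bits = blocks * 2 * n_bits  # a Paillier ciphertext lives in Z_{n^2}
    return encryptions, randomizations, comm_bits
```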

7.3 Microbenchmarking

To estimate the per-block cost of AES, we use the permute_block function in prp of [27] to benchmark the cost of fixed-key AES-ECB-128 per 128-bit block (we use security parameter \(\lambda =128\) for our experiments). To get this cost, we run fixed-key AES for different numbers of blocks (4096, 8192, 12288) to obtain the amortized cost of a single AES call. We repeat each experiment 100 times and report the average amortized cost per 128-bit block (no significant variance was noticeable).

For estimating the cost of a single encryption and a single ciphertext randomization for the Paillier based protocol, we use the RSA signing cost for modulus of size 4096. We get this cost using the OpenSSL benchmark [7] by running the command openssl speed.

The costs we get are the following: AES-ECB 128: 3.5 ns, RSA 4096 signing 0.17 s. All the benchmarks are run on a Macbook Pro 2017 with a 3.1 GHz Intel core i-7 processor and 16 GB of 2133 MHz LPDDR3 RAM.

7.4 Performance Comparison

We estimate the performance of the different constructions described above. For this estimation, we experiment with three different database sizes, \(N = 2^{20}, 2^{24}\) and \(2^{32}\) elements, and three different network bandwidths, 72 Mbps, 100 Mbps and 1 Gbps. We vary the length of each element in the database from 640 bits to 64000 bits. This range of values is roughly inspired by Machine Learning training applications, which have 100s to 1000s of features (with each feature represented by a 64-bit integer).

In the graphs in Figs. 3, 4 and 5 we report the estimated running time of our protocol and the protocol from [22]. We do not report the running time of the PKE based protocol in the graphs, since it is 2–3 orders of magnitude slower than our protocol. Instead we summarize its performance in Table 2.

Table 2. Comparative performance of our protocol vs PKE based protocol
Table 3. Comparative performance of our protocol vs the protocol from [22] for \(N=T=128\)

In addition, we summarize how we compare with the protocol from [22] for relatively small \(N\) but long elements (640–64000 bits) in Table 3. We get a performance gain of 3–12x depending on the speed of the network.

Fig. 3. Total running time of Permute+Share for a 72 Mbps network

7.5 Choosing Optimal Subpermutation Size T

We choose the best value of T empirically: we fix the desired \(N, \ell \), enumerate over all possible T from 2 to \(N\), and compute the running time for each value of T (as before, d is set to be \(2\lceil \log N/ \log T \rceil -1\)) as \((3dN \log T + dNT(2+\ell /\lambda ))\cdot \mathsf {TimePerAES} + (3dN\lambda \log T + (d+1)N\ell )\cdot \mathsf {TimePerBitSent}\). We give more details in the full version.
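A sketch of this search (our code; the per-operation times are the illustrative values from Sect. 7.3 and a 100 Mbps link):

```python
from math import ceil, log2

TIME_PER_AES = 3.5e-9           # seconds per fixed-key AES block (Sect. 7.3)
TIME_PER_BIT_SENT = 1 / 100e6   # seconds per bit on a 100 Mbps link

def running_time(N, ell, T, lam=128):
    # The formula above: AES-call count times per-AES cost, plus bits sent
    # times per-bit cost, with d = 2*ceil(log N / log T) - 1.
    d = 2 * ceil(log2(N) / log2(T)) - 1
    aes_calls = 3 * d * N * log2(T) + d * N * T * (2 + ell / lam)
    comm_bits = 3 * d * N * lam * log2(T) + (d + 1) * N * ell
    return aes_calls * TIME_PER_AES + comm_bits * TIME_PER_BIT_SENT

def best_T(N, ell):
    # Enumerate every candidate subpermutation size T in [2, N].
    return min(range(2, N + 1), key=lambda T: running_time(N, ell, T))
```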

Fig. 4. Total running time of Permute+Share for a 100 Mbps network

Fig. 5. Total running time of Permute+Share for a 1 Gbps network