1 Introduction

Secure multi-party computation (MPC) allows mutually distrusting parties to compute securely over their private data. Secure computation of most functionalities requires expensive public-key primitives, such as oblivious transfer, even in the semi-honest setting. Most of these existing secure computation protocols can be adjusted to offload a significant fraction of their complex operations to an offline preprocessing phase. Subsequently, during an online phase, the parties can run extremely fast secure computation protocols. In fact, several specialized protocols optimize MPC for this online-offline paradigm [4, 6, 7, 15, 18, 28, 29, 31, 37].

For instance, in the two-party setting, we envision this offline phase as a secure implementation of a trusted dealer who generates private albeit correlated shares \((r_A,r_B)\) for Alice and Bob, respectively, sampled from an appropriate joint distribution \((R_A,R_B)\), referred to as a correlation. This versatile framework allows the implementation of this trusted dealer using computational hardness assumptions, secure hardware, trusted hardware, or physical processes. Furthermore, this offline phase is independent of the final functionality to be computed, as well as the parties’ private inputs.

A particularly useful correlation is the random oblivious transfer correlation, represented by \(\mathsf {ROT} \). One sample of this correlation generates three random bits \(x_0,x_1,b\) and provides private shares \(r_A=(x_0,x_1)\) to Alice, and \(r_B=(b,x_b)\) to Bob. Note that Alice does not know the choice bit b, and Bob does not know the other bit \(x_{1-b}\). Let \(\mathcal {F}\) be the class of functionalities that admit 2-message secure computation protocols in the \(\mathsf {ROT} \)-hybrid [10, 26]. Note that \(\mathcal {F}\) includes the powerful class of functions that have a decomposable randomized encoding [3, 5, 25]. Alice and Bob can compute the required \(\mathsf {ROT} \)s in the offline phase. Then, they can compute any functionality from this class using 2 messages, a protocol exhibiting optimal message complexity and (essentially) optimal efficiency in the usage of cryptographic resources.

However, the private share of the honest party is susceptible to leakage attacks by an adversary, both during the generation of the shares and while the shares are stored. We emphasize that the leakage need not necessarily reveal individual bits of the honest party’s share. The leakage can be on the entire share and encode crucial global information that can potentially jeopardize the security of the secure computation protocol. This concern naturally leads to the following fundamental question.

“Can we preserve the security and efficiency of the secure computation during the online phase despite the adversarial leakage on the honest party’s shares?”

Using the class \(\mathcal {F}\) of functionalities (defined above) as a yardstick, let us determine the primary hurdle towards a positive resolution of this question. In the sequel, \(\mathcal {F}_{m/2} \subset \mathcal {F}\) is the set of all two-party functionalities that have a 2-message protocol in the \(\mathsf {ROT} ^{m/2}\)-hybrid, i.e., parties start with m/2 independent samples from the \(\mathsf {ROT} \) correlation. In the leaky correlation setting, where an adversary has already leaked global information from the private share of the honest party, our objective is to design an (asymptotically) optimal secure computation protocol for the functionalities in \(\mathcal {F}_{m/2}\). That is, starting with leaky correlations (of size n), we want to compute any \(F \in \mathcal {F}_{m/2}\) such that \(m = \varTheta (n)\) via a 2-message protocol despite \(t=\varTheta (n)\) bits of leakage. We note that this task is equivalent to the task of constructing a secure computation protocol for the particular functionality \(\mathsf {ROT} ^{m/2}\), which also belongs to \(\mathcal {F}_{m/2}\). This observation follows from the parallel composition of the secure protocol implementing the \(\mathsf {ROT} ^{m/2}\) functionality from leaky correlations with the 2-message protocol for F in the \(\mathsf {ROT} ^{m/2}\)-hybrid. To summarize, our overall objective of designing optimal secure computation protocols from leaky correlations reduces to the following equivalent goal.

“Construct a 2-message protocol to compute \(\mathsf {ROT} ^{m/2}\) securely, where \(m=\varTheta (n)\), from the leaky correlation in spite of \(t=\varTheta (n)\) bits of leakage.”

Note that in the \(\mathsf {ROT} ^{n/2}\)-hybrid, both parties have private shares of size n bits. The above problem is identical to the problem of constructing correlation extractors, introduced in the seminal work of Ishai, Kushilevitz, Ostrovsky, and Sahai [26].

Correlation Extractors. Ishai et al. [26] introduced the notion of correlation extractors as an interactive protocol that takes a leaky correlation as input and outputs a new correlation that is secure. Prior correlation extractors either used four messages [26] or had a sub-linear production [9, 22], i.e., \(m=o(n)\). We construct the first 2-message correlation extractor that has a linear production and leakage resilience, that is, \(m=\varTheta (n)\) and \(t=\varTheta (n)\). Note that even computationally secure protocols can use the output of the correlation extractor in the online phase. Section 1.1 formally defines correlation extractors, and we present our main contributions in Sect. 1.2.

1.1 Correlation Extractors and Security Model

We consider the standard model of Ishai et al. [26], which is also used by the subsequent works, for 2-party semi-honest secure computation in the preprocessing model. In the preprocessing step, a trusted dealer draws a sample of shares \((r_A,r_B)\) from the joint distribution of correlated private randomness \((R_A,R_B)\). The dealer provides the secret share \(r_A\) to Alice and \(r_B\) to Bob. Moreover, the adversarial party can perform an arbitrary t bits of leakage on the secret share of the honest party at the end of the preprocessing step. We represent this leaky correlation hybrid as \((R_A,R_B)^{[t]}\).

Definition 1

(Correlation Extractor). Let \((R_A,R_B)\) be a correlated private randomness such that the secret share of each party is n-bits. An \((n,m,t,\varepsilon )\)-correlation extractor for \((R_A,R_B)\) is a two-party interactive protocol in the \((R_A,R_B)^{[t]}\)-hybrid that securely implements the \(\mathsf {ROT} ^{m/2}\) functionality against information-theoretic semi-honest adversaries with \(\varepsilon \) simulation error.

Note that the size of the secret shares output by the correlation extractor is m-bits. We emphasize that no leakage occurs during the correlation extractor execution. The t-bit leakage cumulatively accounts for all the leakage before the beginning of the online phase. We note that, throughout this work, we shall always normalize the total length of the input shares of each party to n-bits.

1.2 Our Contribution

Recall that \(\mathcal {F}_{m/2} \subset \mathcal {F}\) is the set of all two-party functionalities that have a 2-message protocol in the \(\mathsf {ROT} ^{m/2}\)-hybrid. We prove the following results.

Theorem 1

(Asymptotically Optimal Secure Computation from Leaky Correlations). There exists a correlation \((R_A,R_B)\) that produces n-bit secret shares such that for all \(F\in \mathcal {F}_{m/2}\) there exists a 2-message secure computation protocol for F in the leaky \((R_A,R_B)^{[t]}\)-hybrid, where \(m=\varTheta (n)\) and \(t=\varTheta (n)\), with exponentially low simulation error.

The crucial ingredient of Theorem 1 is our new 2-message \((n,m,t,\varepsilon )\)-correlation extractor for \(\mathsf {ROT} \). We compose the 2-message secure computation protocol for functionalities in \(\mathcal {F}_{m/2}\) in the \(\mathsf {ROT} ^{m/2}\)-hybrid with our correlation extractor. Our work presents the first 2-message correlation extractor that has a linear production and a linear leakage resilience (along with exponentially low insecurity).

Theorem 2

(Asymptotically Optimal Correlation Extractor for \(\mathsf {ROT} \)). There exists a 2-message \((n,m,t,\varepsilon )\)-correlation extractor for \(\mathsf {ROT} \) such that \(m=\varTheta (n)\), \(t=\varTheta (n)\), and \(\varepsilon =\exp (-\varTheta (n))\).

The technical heart of the correlation extractor of Theorem 2 is another correlation extractor (see Theorem 3) for a generalization of the \(\mathsf {ROT} \) correlation. For any finite field \({\mathbb F}\), the random oblivious linear-function evaluation correlation over \({\mathbb F}\) [36, 42], represented by \(\mathsf {ROLE} \left( {\mathbb F} \right) \), samples random \(a,b,x\in {\mathbb F} \) and defines \(r_A=(a,b)\) and \(r_B=(x,z)\), where \(z=ax+b\). Note that, for \({\mathbb F} ={\mathbb G} {\mathbb F} \left[ 2\right] \), we have \((x_0+x_1)b+x_0 = x_b\); therefore, the \(\mathsf {ROLE} \left( {\mathbb G} {\mathbb F} \left[ 2\right] \right) \) correlation is identical to the \(\mathsf {ROT} \) correlation. One sample of the \(\mathsf {ROLE} \left( {\mathbb F} \right) \) correlation gives each party a secret share of size \(2\lg \left| {{\mathbb F}}\right| \) bits. In particular, for a suitable constant-size field \({\mathbb F} \), each party receives \(n/\left( 2\lg \left| {{\mathbb F}}\right| \right) \) independent samples from the \(\mathsf {ROLE} \left( {\mathbb F} \right) \) correlation, so that the secret share size of each party is n bits.
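To make the \({\mathbb G} {\mathbb F} \left[ 2\right] \) identity concrete, the following minimal Python sketch (an illustration, not part of the protocol) maps one random OT sample to a random OLE sample over \({\mathbb G} {\mathbb F} \left[ 2\right] \), where field addition is XOR and multiplication is AND:

```python
import itertools

def rot_to_role_gf2(x0, x1, b):
    """Map one random OT sample (x0, x1, b) to an OLE evaluation over GF[2].

    Over GF[2], setting a = x0 + x1 and intercept x0 gives
    z = a*x + x0 = (x0 + x1)*b + x0 = x_b, matching the identity in the text."""
    a = x0 ^ x1          # OLE slope
    intercept = x0       # OLE intercept
    x = b                # receiver's evaluation point = choice bit
    z = (a & x) ^ intercept
    return z

# The OLE output z equals the chosen bit x_b for every (x0, x1, b).
for x0, x1, b in itertools.product([0, 1], repeat=3):
    assert rot_to_role_gf2(x0, x1, b) == (x1 if b else x0)
```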

Theorem 3

(Asymptotically Optimal Correlation Extractor for \(\mathsf {ROLE} \left( {\mathbb F} \right) \)). There exists a 2-message \((n,m,t,\varepsilon )\)-correlation extractor for \(\mathsf {ROLE} \left( {\mathbb F} \right) \) such that \(m=\varTheta (n)\), \(t=\varTheta (n)\), and \(\varepsilon =\exp (-\varTheta (n))\).

In Fig. 4, we present our correlation extractor that outputs fresh \(\mathsf {ROLE} \left( {\mathbb F} \right) \) samples from the same leaky correlation. Finally, our construction obtains multiple \(\mathsf {ROT} \) samples from each output \(\mathsf {ROLE} \left( {\mathbb F} \right) \) sample using the OT embedding technique of [9]. Figure 1 positions our contribution vis-à-vis the previous state-of-the-art. In particular, Fig. 1 highlights the fact that our result simultaneously achieves the best qualitative parameters. Our results are also quantitatively better than those of previous works; we discuss the concrete performance numbers we obtain for Theorem 3 and Theorem 2 below. For a more detailed numerical comparison with prior works [9, 22, 26], refer to Sect. 5.

Fig. 1.

A qualitative summary of our correlation extractor constructions and a comparison to prior relevant works. Here \({\mathbb K} \) is a finite field and \({\mathbb F} \) is a finite field of constant size. The \(\mathsf {IP} \big ({{\mathbb K} ^s}\big ) \) is a correlation that samples random \(r_A=(u_1,\cdots ,u_s)\in {\mathbb K} ^s\) and \(r_B=(v_1,\cdots ,v_s)\in {\mathbb K} ^s\) such that \(u_1v_1+\cdots +u_sv_s=0\). All correlations are normalized so that each party gets an n-bit secret share. The parameter g is the gap to maximal leakage resilience, such that \(g > 0\).

Performance of Correlation Extractors for \(\mathsf {ROLE} \left( {\mathbb F} \right) \) (Theorem 3). Our correlation extractor for \(\mathsf {ROLE} \left( {\mathbb F} \right) \) relies on the existence of suitable Algebraic Geometry (AG) codes over a finite field \({\mathbb F}\), such that \(\left| {{\mathbb F}}\right| \) is an even power of a prime and \(\left| {{\mathbb F}}\right| \geqslant 49\). We shall use a field \({\mathbb F}\) with characteristic 2.

As the size of the field \({\mathbb F} \) increases, the “quality” of the Algebraic Geometry codes gets better. However, the efficiency of the BMN OT embedding protocol [9], used to obtain the \(\mathsf {ROT} \) output in our construction, decreases with increasing \(\left| {{\mathbb F}}\right| \). For example, with \({\mathbb F} ={\mathbb G} {\mathbb F} \left[ 2^{14}\right] \), we achieve the highest production rate \(m/n=16.32\%\) if the fractional leakage rate is \(t/n=1\%\). For leakage rate \(t/n=10\%\), we achieve production rate \(m/n =10\%\). Figure 7 (Sect. 5) and Fig. 9 (Sect. 6) summarize these tradeoffs for various choices of the finite field \({\mathbb F}\).

Performance of Correlation Extractors for \(\mathsf {ROT} \) (Theorem 2). We know extremely efficient algorithms that use multiplications over \({\mathbb G} {\mathbb F} \left[ 2\right] \) to emulate multiplications over any \({\mathbb G} {\mathbb F} \left[ 2^s\right] \) [12, 14]. For example, we can use 15 multiplications over \({\mathbb G} {\mathbb F} \left[ 2\right] \) to emulate one multiplication over \({\mathbb G} {\mathbb F} \left[ 2^6\right] \). Therefore, we can use 15 samples of \(\mathsf {ROLE} \left( {\mathbb G} {\mathbb F} \left[ 2\right] \right) \) to perform one \(\mathsf {ROLE} \left( {\mathbb G} {\mathbb F} \left[ 2^6\right] \right) \) with perfect semi-honest security. Note that, by applying this protocol, the share sizes reduce by a factor of 6/15. In general, using this technique, we can convert the leaky \(\mathsf {ROT} \) (equivalently, \(\mathsf {ROLE} \left( {\mathbb G} {\mathbb F} \left[ 2\right] \right) \)) correlation into a leaky \(\mathsf {ROLE} \left( {\mathbb F} \right) \) correlation, where \({\mathbb F} \) is a finite field of characteristic 2, by incurring a slight multiplicative loss in the share size. Now, we can apply the correlation extractor for \(\mathsf {ROLE} \left( {\mathbb F} \right) \) discussed above. By optimizing the choice of the field \({\mathbb F}\) (in our case, \({\mathbb F} ={\mathbb G} {\mathbb F} \left[ 2^{10}\right] \)), we can construct a 2-message correlation extractor for \(\mathsf {ROT} \) with fractional leakage rate \(t/n=1\%\) and production rate \(m/n=4.20\%\) (see Fig. 8, Sect. 5). This is several orders of magnitude better than the production and resilience of the IKOS correlation extractor, and it uses fewer messages.

High Leakage Resilience Setting. Ishai et al. [27] showed that \(t< n/4\) is necessary to extract even one new sample of \(\mathsf {ROT} \) from the leaky correlation. Our construction, when instantiated with a suitably large constant-size field \({\mathbb F}\), demonstrates that if \(t\leqslant (1/4 - g)n\), then we can extract \(\varTheta (n)\) new samples of the \(\mathsf {ROT} \) correlation. The prior construction of [22] achieves only sub-linear production, using sub-sampling techniques.

Theorem 4

(Near Optimal Resilience with Linear Production). For every \(g\in (0,1/4]\), there exists a finite field \({\mathbb F} \) with characteristic 2 and a 2-message \((n,m,t,\varepsilon )\)-correlation extractor for \(\mathsf {ROT} \), where \(t=(1/4-g)n\), \(m=\varTheta (n)\), and \(\varepsilon =\exp (-\varTheta (n))\).

The production \(m=\varTheta (n)\) depends on the constant g, the gap to optimal fractional resilience. We prove Theorem 4 in the full version of our work [8]. Section 5 shows that we can achieve linear production even for \(t=0.22 n\) bits of leakage using \({\mathbb F} ={\mathbb G} {\mathbb F} \left[ 2^{10}\right] \).

Correlation Extractors for Arbitrary Correlations. Similar to the construction of IKOS, we can construct a correlation extractor that takes any correlation as input and outputs samples of any correlation; this construction, however, is no longer round optimal. Nevertheless, our construction achieves overall better production and leakage resilience than IKOS because our correlation extractor for \(\mathsf {ROLE} \left( {\mathbb F} \right) \) has higher production and resilience. Figure 2 outlines a comparison of these two correlation extractor constructions for the general case.

Fig. 2.

General correlation extractors that extract arbitrary correlations from arbitrary correlations. Above is the expanded IKOS [26] correlation extractor, and below is ours. Our main contribution is shown in the highlighted part. For brevity, it is implicit that there are multiple samples of the correlations. The correlations are over suitable constant-size fields. The superscript “(t]” represents that the correlation is secure against adversarial leakage on only one a priori fixed party.

1.3 Other Prior Relevant Works

Figure 1 already provides a summary of the current state-of-the-art in correlation extractors. In this section, we summarize works related to combiners: extractors where the adversary is restricted to leaking individual bits of the honest party’s secret share. The study of OT combiners was initiated by Harnik et al. [24]. Since then, there has been work on several variants and extensions of OT combiners [23, 28, 33, 34, 39]. Recently, Ishai et al. [27] constructed OT combiners with nearly optimal leakage resilience. Among these works, the most relevant to our paper are the ones by Meier, Przydatek, and Wullschleger [34] and Przydatek and Wullschleger [39]. They use Reed-Solomon codes to construct two-message error-tolerant combiners that produce fresh \(\mathsf {ROLE} \left( {\mathbb K} \right) \) over large fields \({\mathbb K} \) from \(\mathsf {ROLE} \left( {\mathbb K} \right) \) over the same field. Using the multiplication friendly secret sharing schemes based on Algebraic Geometry codes introduced by Chen and Cramer [13], a similar construction works with \(\mathsf {ROLE} \left( {\mathbb F} \right) \) over fields of appropriate constant size. We emphasize that this construction is insecure if an adversary can perform even 1 bit of global leakage on the whole secret share of the other party. In our construction, we crucially rely on a family of linear codes, instead of one particular choice of linear code, to circumvent this bottleneck. Section 1.4 provides the principal technical ideas underlying our correlation extractor construction.

In the malicious setting, the feasibility result on malicious-secure combiners for \(\mathsf {ROT} \) is reported in [28]. Recently, Cascudo et al. [11] constructed a malicious-secure combiner with high resilience, but \(m=1\). The case of malicious-secure correlation extractors remains entirely unexplored.

1.4 Technical Overview

At the heart of our correlation extractor constructions is a 2-message \(\mathsf {ROLE} \left( {\mathbb F} \right) \)-to-\(\mathsf {ROLE} \left( {\mathbb F} \right) \) extractor, where we start with a leaky \(\mathsf {ROLE} \left( {\mathbb F} \right) \) correlation and produce fresh secure samples of \(\mathsf {ROLE} \left( {\mathbb F} \right) \). The field \({\mathbb F} \) is a constant-size field with characteristic 2, say \({\mathbb F} ={\mathbb G} {\mathbb F} \left[ 2^6\right] \), and each party gets n-bit shares. Below, we discuss some of the technical ideas underlying this construction.

This correlation extractor relies on the existence of a family of linear codes over \({\mathbb F} \) with suitable properties that we define below. For this discussion, let us assume that \(s\in {\mathbb N} \) is the block-length of the codes. Let \({\mathcal J} \) be an index set, and denote the family of linear codes with block-length s as \({\mathcal C} = \left\{ C_j :j\in {\mathcal J} \right\} \). This family of codes \({\mathcal C}\) needs to have the following properties.

  1. Multiplication Friendly Good Codes. Each code \(C_j\subseteq {\mathbb F} ^s\) in the family \({\mathcal C} \) is a good code, i.e., its rate and distance are \(\varTheta (s)\). Further, the Schur-product of the code, i.e., \(C_j* C_j\), is a linear code with distance \(\varTheta (s)\). Such codes can be used to perform the multiplication of two secrets by multiplying their respective secret shares in secure computation protocols; hence the name.

  2. Small Bias Family. Intuitively, a small bias family defines a pseudorandom distribution for linear tests. Let \(S=(S_1,\cdots ,S_s)\in {\mathbb F} ^s\) and define its corresponding linear test as \(L_S(c) := S_1c_1 + \cdots + S_sc_s\). Consider the distribution D of \(L_S(c)\) for a random \(j \in {\mathcal J} \) and a randomly sampled codeword \(c\in C_j\). If \({\mathcal C} \) is a family of \(\rho \)-biased distributions, then the distribution D has statistical distance at most \(\rho \) from the output of \(L_S(u)\) for a random element \(u\in {\mathbb F} ^s\). For brevity, we say that the family \({\mathcal C}\) “\(\rho \)-fools the linear test \(L_S\).” The concept of small bias distributions was introduced in [1, 35] and has found diverse applications, for example, [2, 17, 20, 35]. An interesting property of any linear code \(C\subseteq {\mathbb F} ^s\) is the following. A random codeword \(c\in C\) 0-fools every linear test \(L_S\) such that S is not a codeword in the dual of C. However, if S is a codeword in the dual of the code C, then the linear test \(L_S\) is clearly not fooled.

    So, a randomly chosen codeword from one fixed linear code cannot fool all linear tests. However, when we consider an appropriate family of linear codes, then a randomly chosen codeword from a randomly chosen code in this family can fool every linear test.
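This dichotomy is easy to verify exhaustively for a toy code. The sketch below (assuming, purely for illustration, an arbitrary [4, 2] binary code that does not appear in the construction) checks that the bias of a random codeword against the linear test \(L_S\) is 1 when S lies in the dual code and 0 otherwise:

```python
import itertools

# Toy [4,2] binary linear code generated by the rows of G.
eta = 4
G = [(1, 0, 1, 1), (0, 1, 0, 1)]
C = {tuple(sum(r * g for r, g in zip(m, col)) % 2 for col in zip(*G))
     for m in itertools.product([0, 1], repeat=2)}

def bias(S):
    # |E_{c in C} (-1)^{<S, c>}|: the bias of a random codeword against L_S.
    return abs(sum((-1) ** (sum(s & c for s, c in zip(S, cw)) % 2)
                   for cw in C)) / len(C)

# Dual code: all S orthogonal to every codeword of C.
dual = {S for S in itertools.product([0, 1], repeat=eta)
        if all(sum(s & c for s, c in zip(S, cw)) % 2 == 0 for cw in C)}

# A fixed linear code 0-fools L_S exactly when S is outside the dual.
for S in itertools.product([0, 1], repeat=eta):
    assert bias(S) == (1.0 if S in dual else 0.0)
```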

We construct such a family of codes over small finite fields \({\mathbb F}\), which is of potential independent interest. Our starting point is an explicit Algebraic Geometry code \(C\subseteq {\mathbb F} ^s\) that is multiplication friendly [19, 21]. Given one such code C, we randomly “twist then permute” the code to define the family \({\mathcal C} \). We emphasize that the production of our correlation extractor relies on the bias being small. So, it is crucial to construct a family with extremely small bias. Next, we describe our “twist then permute” operation.

Twist then Permute. Suppose \(C \subseteq {\mathbb F} ^s\) is a linear code. Pick any \(\lambda =(\lambda _1,\cdots ,\lambda _s)\in ({\mathbb F} ^*)^s\), i.e., for all \(i \in [s]\), \(\lambda _i \ne 0\). A \(\lambda \)-twist of the code C is defined as the following linear code
$$C_{\lambda } := \left\{ \left( \lambda _1c_1,\cdots ,\lambda _sc_s\right) :\left( c_1,\cdots ,c_s\right) \in C \right\} .$$

Let \(\pi :\{1,\cdots ,s\}\rightarrow \{1,\cdots ,s\}\) be a permutation. The \(\pi \)-permutation of the \(\lambda \)-twist of C is defined as the following linear code
$$C_{\pi ,\lambda } := \left\{ \left( c'_{\pi (1)},\cdots ,c'_{\pi (s)}\right) :\left( c'_1,\cdots ,c'_s\right) \in C_{\lambda } \right\} .$$

Define \({\mathcal J} \) as the set of all \((\pi ,\lambda )\) such that \(\lambda \in ({\mathbb F} ^*)^s\) and \(\pi \) is a permutation of the set \(\{1,\cdots ,s\}\). Note that if C is a multiplication friendly good code, then each code \(C_{\pi ,\lambda }\) continues to be a multiplication friendly good code. A key observation towards demonstrating that \({\mathcal C} \) is a family of small bias distributions is that the following two distributions are identical (see Claim 2).

  1. Fix \(S\in {\mathbb F} ^s\). The output distribution of the linear test \(L_S\) on a random codeword \(c\in C_j\), for a random index \(j\in {\mathcal J} \).

  2. Let \(T\in {\mathbb F} ^s\) be a random element of the same weight as S. The output distribution of the linear test \(L_T\) on a random codeword \(c\in C\).

Based on this observation, we can calculate the bias of our family of codes. Note that there are a total of \(\binom{s}{w} (q-1)^w\) elements in \({\mathbb F} ^s\) that have weight w, where \(q = \left| {{\mathbb F}}\right| \). Let \(A_w\) denote the number of codewords in the dual of C that have weight w. Our family of codes \({\mathcal C}\) fools the linear test \(L_S\) with \(\rho =A_w \cdot \binom{s}{w}^{-1}(q-1)^{-w}\), where w is the weight of \(S\in {\mathbb F} ^s\).
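This bias formula can be verified by brute force on a toy instance. The sketch below (using, as an illustrative assumption, the length-3 repetition code over GF(7) in place of an Algebraic Geometry code) builds the entire "twist then permute" family, measures the bias of a weight-3 linear test averaged over the family, and checks that it equals A_w / (C(s, w) (q-1)^w):

```python
import itertools, cmath
from math import comb

q, s = 7, 3

def psi(x):
    # Additive character of GF(7): psi(x) = e^{2*pi*i*x/7}.
    return cmath.exp(2j * cmath.pi * (x % q) / q)

# Toy stand-in for a multiplication friendly code: the repetition code.
C = [(c,) * s for c in range(q)]

# Weight enumerator A_w of the dual code, computed by brute force.
dual = [v for v in itertools.product(range(q), repeat=s)
        if all(sum(vi * ci for vi, ci in zip(v, c)) % q == 0 for c in C)]
A = [sum(1 for v in dual if sum(vi != 0 for vi in v) == w)
     for w in range(s + 1)]

S = (1, 1, 1)                 # linear test vector of weight w = 3
w = sum(si != 0 for si in S)

# Average the character sum over every twist lambda, permutation pi,
# and codeword c of the "twist then permute" family.
terms = []
for lam in itertools.product(range(1, q), repeat=s):   # nonzero twists
    for pi in itertools.permutations(range(s)):        # coordinate shuffles
        for c in C:
            y = [(lam[pi[i]] * c[pi[i]]) % q for i in range(s)]
            terms.append(psi(sum(si * yi for si, yi in zip(S, y))))
family_bias = abs(sum(terms) / len(terms))

# Predicted bias: rho = A_w / (C(s,w) * (q-1)^w).
rho = A[w] / (comb(s, w) * (q - 1) ** w)
assert abs(family_bias - rho) < 1e-9
```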

We obtain precise asymptotic bounds on the weight enumerator \(A_w\) of the dual of the code C to estimate the bias \(\rho \), for \(w \in \{0,1,\cdots ,s\}\). This precise bound translates into higher production m, higher resilience t, and exponentially low simulation error \(\varepsilon \) for our correlation extractor. We remark that, for our construction, if C has small dual-distance, then the bias cannot be small.

Remark. The performance of the code C exceeds the elementary Gilbert-Varshamov bound. These Algebraic Geometry codes are among the few codes in mathematics and computer science where explicit constructions have significantly better quality than elementary randomized constructions. So, elementary randomization techniques are unlikely to produce any (qualitatively) better parameters for this approach, given that the estimations of the weight enumerator in this work are asymptotically optimal. Therefore, finding better randomized techniques to construct a family of multiplication friendly good codes that is also a family of small-bias distributions is the research direction with the potential to reduce the bias. This reduction in the bias can further improve the production and leakage resilience of our correlation extractors.

2 Preliminaries

We denote random variables by capital letters, for example X, and the values they take by small letters, for example \(X = x\). For a positive integer n, we write [n] and \([-n]\) to denote the sets \(\{1, \cdots , n\}\) and \(\{-n, \cdots , -1\}\), respectively. Let \({\mathcal S} _n\) be the set of all permutations \(\pi : [n] \rightarrow [n]\). We consider the field \({\mathbb F} = {\mathbb G} {\mathbb F} \left[ q\right] \), where \(q = p^a\), for a positive integer a and prime p. For any \(c = (c_1, \cdots , c_{\eta }) \in {\mathbb F} ^{\eta }\), define the function \(\mathsf {wt} (c)\) as the cardinality of the set \(\{i : c_i \ne 0\}\). For any two \(x,y \in {\mathbb F} ^{\eta }\), we let \(x*y\) represent the point-wise product of x and y. That is, \(x*y = \left( x_1y_1, x_2y_2, \cdots , x_{\eta } y_{\eta }\right) \in {\mathbb F} ^{\eta }\). For a set Y, \(U_Y\) denotes the uniform distribution over the set Y, and \(y \sim U_Y\) denotes sampling y according to \(U_Y\). For any vector \(x \in {\mathbb F} ^\eta \) and a permutation \(\pi \in {\mathcal S} _{\eta }\), we define \(\pi (x) := \left( x_{\pi (1)}, \cdots , x_{\pi (\eta )}\right) \).

2.1 Correlation Extractors

We denote the functionality of 2-choose-1 bit Oblivious Transfer as \( \mathsf {OT} \) and Oblivious Linear-function Evaluation over a field \({\mathbb F} \) as \(\mathsf {OLE} ({\mathbb F})\). Also, we denote the Random Oblivious Transfer Correlation as \(\mathsf {ROT} \) and the Random Oblivious Linear-function Evaluation Correlation over the field \({\mathbb F} \) as \(\mathsf {ROLE} \left( {\mathbb F} \right) \). When \({\mathbb F} = {\mathbb G} {\mathbb F} \left[ 2\right] \), we denote \(\mathsf {ROLE} \left( {\mathbb F} \right) \) simply by \(\mathsf {ROLE} \).

Let \(\eta \) be such that \(2\eta \lg |{\mathbb F} | = n\). In this work, we consider the setting where Alice and Bob start with \(\eta \) samples of the \(\mathsf {ROLE} \left( {\mathbb F} \right) \) correlation and the adversary performs t-bits of leakage. We give a secure protocol for extracting multiple secure \(\mathsf {OT} \)s in this hybrid. Below, we formally define such a correlation extractor using \(\mathsf {ROLE} \left( {\mathbb F} \right) \) as the initial correlation.

Leakage Model. We define our leakage model for correlations as follows:

  1. \(\eta \)-ROLE correlation generation phase. Alice gets \(r_A = \{(a_i,b_i)\}_{i \in [\eta ]} \in {\mathbb F} ^{2\eta }\) and Bob gets \(r_B = \{(x_i, z_i)\}_{i \in [\eta ]}\in {\mathbb F} ^{2\eta }\) such that, for all \(i \in [\eta ]\), each of \(a_i,b_i,x_i\) is uniformly random and \(z_i = a_ix_i+b_i\). Note that the size of the secret share of each party is n bits.

  2. Corruption and leakage phase. A semi-honest adversary either corrupts the sender, sends a leakage function \(L: {\mathbb F} ^{\eta } \rightarrow \{0,1\}^t\), and gets back \(L(x_{[\eta ]})\); or it corrupts the receiver, sends a leakage function \(L: {\mathbb F} ^{\eta } \rightarrow \{0,1\}^t\), and gets back \(L(a_{[\eta ]})\). Note that, w.l.o.g., any leakage on the sender (resp., receiver) can be seen as a leakage on \(a_{[\eta ]}\) (resp., \(x_{[\eta ]}\)). We again emphasize that this leakage need not be on individual bits of the shares; it can be on the entire share, and thus can encode crucial global information.

We denote by \((R_A, R_B)\) the above correlated randomness and by \((R_A, R_B)^{[t]}\) its t-leaky version. Recall the definition of an \((n,m,t,\varepsilon )\)-correlation extractor (see Definition 1, Sect. 1.1). Below, we give the correctness and security requirements.

The correctness condition says that the receiver’s output is correct in all m/2 instances of \(\mathsf {OT} \). The privacy requirement says the following: Let \((s_0^{^{{\left( i\right) }}}, s_1^{^{{\left( i\right) }}})\) and \((c^{^{{\left( i\right) }}}, z^{^{{\left( i\right) }}})\) be the output shares of Alice and Bob, respectively, in the \(i^{th}\) instance. Then, a corrupt sender (resp., receiver) cannot distinguish \(\{c^{^{{\left( i\right) }}}\}_{i \in [m/2]}\) (resp., \(\left\{ s^{^{{\left( i\right) }}}_{1-c^{^{{\left( i\right) }}}}\right\} _{i \in [m/2]}\)) from a uniformly random string with advantage more than \(\varepsilon \). The leakage rate is defined as t/n, and the production rate is defined as m/n.

2.2 Fourier Analysis over Fields

We give some basic Fourier definitions and properties over finite fields, following the conventions of [40]. To begin the discussion of Fourier analysis, let \(\eta \) be any positive integer and let \({\mathbb F} \) be any finite field. We first define the inner product of two complex-valued functions.

Definition 2

(Inner Product). Let \(f,g :{\mathbb F} ^\eta \rightarrow {\mathbb C} \). We define the inner product of f and g as
$$\langle f, g \rangle := \frac{1}{\left| {{\mathbb F}}\right| ^{\eta }} \sum _{x \in {\mathbb F} ^\eta } f(x)\overline{g(x)},$$

where \(\overline{g(x)}\) is the complex conjugate of g(x).

Next, we define general character functions for both \({\mathbb F} \) and \({\mathbb F} ^\eta \).

Definition 3

(General Character Functions). Let \(\psi :{\mathbb F} \rightarrow {\mathbb C} ^*\) be a group homomorphism from the additive group \({\mathbb F} \) to the multiplicative group \({\mathbb C} ^*\). Then we say that \(\psi \) is a character function of \({\mathbb F} \).

Let \( \chi :{\mathbb F} ^\eta \times {\mathbb F} ^\eta \rightarrow {\mathbb C} ^* \) be a bilinear, non-degenerate, and symmetric map defined as \( \chi (x,y) = \psi (x \cdot y) = \psi (\sum _{i} x_i y_i) \). Then, for any \(S \in {\mathbb F} ^\eta \), the function \(\chi _S(x) := \chi (S,x)\) is a character function of \({\mathbb F} ^\eta \).
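For \({\mathbb F} = {\mathbb G} {\mathbb F} \left[ 2\right] \), the canonical choice is \(\psi (x) = (-1)^x\), which lifts to \(\chi _S(x) = (-1)^{\sum _i S_ix_i}\). The short sketch below (illustration only, with \(\eta = 3\)) checks the homomorphism property and the orthogonality of distinct characters:

```python
import itertools

eta = 3

def chi(S, x):
    # Character of GF(2)^eta: chi_S(x) = (-1)^{<S, x>}.
    return (-1) ** (sum(si & xi for si, xi in zip(S, x)) % 2)

space = list(itertools.product([0, 1], repeat=eta))

# psi is a homomorphism from (GF(2)^eta, +) to (C^*, *):
# chi_S(x + y) = chi_S(x) * chi_S(y), where + is coordinate-wise XOR.
for S, x, y in itertools.product(space, repeat=3):
    xy = tuple(a ^ b for a, b in zip(x, y))
    assert chi(S, xy) == chi(S, x) * chi(S, y)

# Distinct characters are orthogonal under the normalized inner product.
zero = (0,) * eta
for S in space:
    ip = sum(chi(S, x) * chi(zero, x) for x in space) / 2 ** eta
    assert ip == (1 if S == zero else 0)
```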

Given \(\chi \), we have the Fourier Transformation.

Definition 4

(Fourier Transformation). Let \(f :{\mathbb F} ^\eta \rightarrow {\mathbb C} \) and, for any \(S \in {\mathbb F} ^\eta \), let \(\chi _S\) be the corresponding character function. We define the map \(\widehat{f} :{\mathbb F} ^\eta \rightarrow {\mathbb C} \) as \(\widehat{f}(S) := \langle f, \chi _S \rangle \). We say that \(\widehat{f}(S)\) is a Fourier Coefficient of f at S, and the linear map \(f \mapsto \widehat{f}\) is the Fourier Transformation of f.

Note that this transformation is an invertible linear map. The Fourier inversion formula is given by the following lemma.

Lemma 1

(Fourier Inversion). For any function \(f :{\mathbb F} ^\eta \rightarrow {\mathbb C} \), we can write \(f(x) =\sum _{S \in {\mathbb F} ^\eta } \widehat{f}(S) \chi _S(x)\).
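Over \({\mathbb G} {\mathbb F} \left[ 2\right] ^\eta \), this transformation specializes to the Walsh-Hadamard transform. The sketch below (illustration only, \(\eta = 3\)) computes the coefficients \(\widehat{f}(S)\) with the normalized inner product and verifies the inversion formula numerically for a random real-valued function:

```python
import itertools, random

eta = 3
pts = list(itertools.product([0, 1], repeat=eta))

def chi(S, x):
    # Character of GF(2)^eta: chi_S(x) = (-1)^{<S, x>}.
    return (-1) ** (sum(si & xi for si, xi in zip(S, x)) % 2)

random.seed(1)
f = {x: random.random() for x in pts}   # arbitrary f: GF(2)^eta -> R

# Fourier coefficient: hat{f}(S) = <f, chi_S>, normalized by |F|^eta.
fhat = {S: sum(f[x] * chi(S, x) for x in pts) / 2 ** eta for S in pts}

# Fourier inversion: f(x) = sum_S hat{f}(S) chi_S(x).
for x in pts:
    assert abs(f[x] - sum(fhat[S] * chi(S, x) for S in pts)) < 1e-9
```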

2.3 Distributions and Min-Entropy

For a probability distribution X over a sample space U, the entropy of a point \(x \in U\) is defined as \( H_X(x) = -\lg \Pr [X = x] \). The min-entropy of X, represented by \( \mathbf {H_\infty } (X) \), is defined to be \( \min _{x \in \mathsf {Supp} (X)} H_X(x) \). The binary entropy function is defined as \( \mathbf {h_2}(x) = -x \lg x - (1-x) \lg (1-x) \) for every \( x \in (0,1) \).

Given a joint distribution (XY) over sample space \( U \times V \), the marginal distribution Y is a distribution over sample space V such that, for any \( y \in V \), the probability assigned to y is \( \sum _{x \in U} \Pr [X = x, Y = y] \). The conditional distribution (X|y) represents the distribution over sample space U such that the probability of \( x \in U \) is \( \Pr [X= x\vert Y = y] \). The average min-entropy [16], represented by \( \mathbf {\widetilde{H}_\infty } (X|Y) \), is defined to be \( -\lg {\mathbb E} _{y \sim Y} [2^{-\mathbf {H_\infty } (X|y)}] \).

Imported Lemma 1

([16]). If \( \mathbf {H_\infty } (X) \ge k \) and L is an arbitrary \( \ell \)-bit leakage on X, then \( \mathbf {\widetilde{H}_\infty } (X|L) \geqslant k- \ell \).
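A quick numeric instance of this chain rule (illustration only): for X uniform over \({\{0,1\}} ^4\), so \(\mathbf {H_\infty } (X) = 4\), and L the leakage revealing the first two bits of X, the average min-entropy works out to exactly \(4 - 2 = 2\), meeting the bound with equality:

```python
import itertools
from math import log2

# X uniform over {0,1}^4; the leakage L reveals the first two bits.
X = list(itertools.product([0, 1], repeat=4))
leak = lambda x: x[:2]

# Average min-entropy: -lg E_l [ max_x Pr[X = x | L = l] ].
avg = 0.0
for l in {leak(x) for x in X}:
    pl = sum(1 for x in X if leak(x) == l) / len(X)       # Pr[L = l]
    max_cond = max(1 / (pl * len(X)) for x in X if leak(x) == l)
    avg += pl * max_cond
Havg = -log2(avg)

assert abs(Havg - 2) < 1e-9   # H_inf(X) - ell = 4 - 2
```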

Lemma 2

(Fourier Coefficients of a Min-Entropy Distribution). Let \( X :{\mathbb F} ^\eta \rightarrow {\mathbb R} \) be a min-entropy source such that \( \mathbf {H_\infty } (X) \geqslant k \). Then \( \sum _{S} |\widehat{X}(S)|^2 \leqslant \left| {{\mathbb F}}\right| ^{-\eta }\cdot 2^{-k}\).

2.4 Family of Small-Bias Distributions

Definition 5

(Bias of a Distribution). Let X be a distribution over \({\mathbb F} ^\eta \). Then, the bias of X with respect to \(S \in {\mathbb F} ^\eta \) is defined as \(\mathsf {Bias} _S(X) := \left| \, {\mathbb E} _{x \sim X}\left[ \chi _S(x)\right] \, \right| \).

Dodis and Smith [17] defined small-bias distribution families for distributions over \({\{0,1\}} ^\eta \). We generalize it naturally for distributions over \({\mathbb F} ^\eta \).

Definition 6

(Small-bias distribution family). A family of distributions \( {\mathcal F} = \{ F_1, F_2, \cdots , F_k\}\) over the sample space \( {\mathbb F} ^\eta \) is called a \(\rho ^2\)-biased family if, for every non-zero vector \( S \in {\mathbb F} ^\eta \), the following holds:
$${\mathbb E} _{j \sim U_{[k]}}\left[ \mathsf {Bias} _S(F_j)^2 \right] \leqslant \rho ^2.$$

The following extraction lemma was proven in previous works for distributions over \({\{0,1\}} ^\eta \).

Imported Lemma 2

([2, 17, 20, 35]). Let \({\mathcal F} =\{F_1,\cdots ,F_\mu \}\) be a \(\rho ^2\)-biased family of distributions over the sample space \({\{0,1\}} ^\eta \). Let \((M, L)\) be a joint distribution such that the marginal distribution M is over \({\{0,1\}} ^\eta \) and \(\mathbf {\widetilde{H}_\infty } (M|L)\ge m\). Let J be the uniform distribution over \([\mu ]\). Then,

$$\mathrm {SD}\left( {\left( F_J\oplus M,L,J\right) },{\left( U_{{\{0,1\}} ^\eta },L,J\right) }\right) \le \frac{\rho }{2}\left( \frac{2^\eta }{2^m}\right) ^{1/2}.$$

A natural generalization of the above lemma to distributions over \({\mathbb F} ^\eta \) gives the following.

Theorem 5

(Min-entropy extraction via masking with small-bias distributions). Let \( {\mathcal F} = \{F_1,\cdots ,F_\mu \} \) be a \(\rho ^2\)-biased family of distributions over the sample space \({\mathbb F} ^{\eta }\) for a field \({\mathbb F} \) of size q. Let (M, L) be a joint distribution such that the marginal distribution M is over \({\mathbb F} ^{\eta }\) and \(\mathbf {\widetilde{H}_\infty } (M|L) \geqslant m\). Let J be a uniform distribution over \([\mu ]\). Then,

$$\begin{aligned} \mathrm {SD}\left( { (F_J\oplus M, L, J) },{ (U_{{\mathbb F} ^{\eta }}, L, J) }\right) \leqslant \frac{\rho }{2}\left( \frac{|{\mathbb F} |^{\eta }}{2^m}\right) ^{1/2}. \end{aligned}$$

We provide the proof of this result in the full version of our work [8].
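The following toy computation illustrates Theorem 5 over \({\mathbb G} {\mathbb F} (2)\) with \(\eta = 2\): the family of the three one-dimensional codes in \({\{0,1\}} ^2\) is a \(\frac{1}{3}\)-biased family, and masking a min-entropy source with a random member of the family brings it within the claimed distance of uniform. All names and parameters here are illustrative; this is not the family used by our construction.

```python
import itertools
import math

eta = 2
vecs = list(itertools.product((0, 1), repeat=eta))

def bias(dist, S):
    # Bias_S(X) = |E_{x ~ X}[(-1)^{<S, x>}]| over GF(2)^eta
    return abs(sum(p * (-1) ** sum(s * x for s, x in zip(S, v))
                   for v, p in dist.items()))

# Family: uniform distributions over the three 1-dimensional codes in GF(2)^2
family = [{(0, 0): 0.5, c: 0.5} for c in [(0, 1), (1, 0), (1, 1)]]

# rho^2 = max over nonzero S of E_j[ Bias_S(F_j)^2 ]
rho2 = max(sum(bias(F, S) ** 2 for F in family) / len(family)
           for S in vecs if any(S))

# Source M: uniform over {00, 01}, so m = 1 (no leakage in this toy example)
M = {(0, 0): 0.5, (0, 1): 0.5}
m = 1

def sd_from_uniform(dist):
    return 0.5 * sum(abs(dist.get(v, 0.0) - 1 / len(vecs)) for v in vecs)

# SD((F_J + M, J), (U, J)) = E_j[ SD(F_j + M, U) ]
sds = []
for F in family:
    conv = {}  # distribution of F_j XOR M
    for v, p in F.items():
        for w, pw in M.items():
            u = tuple(a ^ b for a, b in zip(v, w))
            conv[u] = conv.get(u, 0.0) + p * pw
    sds.append(sd_from_uniform(conv))
actual = sum(sds) / len(sds)
bound = (math.sqrt(rho2) / 2) * math.sqrt(2 ** eta / 2 ** m)
print(actual <= bound)  # True: the observed distance respects the bound
```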

2.5 Distribution over Linear Codes

Let \(C= [\eta ,\kappa ,d,d^\perp , d^{{\left( 2\right) }} ]_{\mathbb F} \) be a linear code over \({\mathbb F} \) with generator matrix \(G \in {\mathbb F} ^{\kappa \times \eta }\). We also use C to denote the uniform distribution over codewords generated by G. For any \(\pi \in {\mathcal S} _\eta \), define \(G_\pi = \pi (G)\) as the generator matrix obtained by permuting the columns of G under \(\pi \).

The dual code of C, represented by \( C^\perp \), is the set of all codewords that are orthogonal to every codeword in C. That is, for any \(c^\perp \in C^\perp \), it holds that \(\langle c, c^\perp \rangle = 0\) for all \(c \in C\). Let \( H \in {\mathbb F} ^{(\eta - \kappa ) \times \eta }\) be a generator matrix of \( C^\perp \). The distance of \(C^\perp \) is \(d^\perp \).

The Schur product code of C, represented by \(C^{{\left( 2\right) }} \), is the span of all codewords obtained as a Schur product of codewords in C. That is, \(C^{{\left( 2\right) }} = \mathsf {span} \left( \{c * c' \;:\; c, c' \in C\}\right) \), where \(c*c'\) denotes the coordinate-wise product of c and \(c'\). The distance of \(C^{{\left( 2\right) }} \) is \(d^{{\left( 2\right) }} \).
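As a small illustration of how the Schur product interacts with distance, the following sketch (a toy example over \({\mathbb G} {\mathbb F} (2)\), not a code used in our construction) enumerates a [3, 2] code, computes its Schur product code by closing the set of coordinate-wise products under addition, and reports both distances:

```python
import itertools

q = 2  # work over GF(2) for simplicity

def codewords(G):
    k = len(G)
    words = set()
    for coeffs in itertools.product(range(q), repeat=k):
        words.add(tuple(sum(a * g for a, g in zip(coeffs, col)) % q
                        for col in zip(*G)))
    return words

def schur_span(C):
    # span of all coordinate-wise products c * c'
    prods = [tuple(a * b % q for a, b in zip(c, cp)) for c in C for cp in C]
    n = len(prods[0])
    span = {(0,) * n}
    changed = True
    while changed:
        changed = False
        for p in prods:
            for s in list(span):
                new = tuple((a + b) % q for a, b in zip(p, s))
                if new not in span:
                    span.add(new)
                    changed = True
    return span

def distance(C):
    return min(sum(1 for x in c if x) for c in C if any(c))

G = [[1, 0, 1], [0, 1, 1]]  # a [3, 2] code over GF(2) with d = 2
C = codewords(G)
C2 = schur_span(C)
print(distance(C), distance(C2), len(C2))  # 2 1 8: the distance drops
```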

3 Family of Small-Bias Distributions with Erasure Recovery

In this section, we give our construction of the family of small-bias distributions \(\{C_j\}_{j \in {\mathcal J}}\) such that each \(C_j\) is a linear code and \(C_j * C_j\) supports erasure recovery. Recall that \(C_j*C_j\) is the linear span of all \(c*c'\) such that \(c,c' \in C_j\). We formally define the requirements for our family of distributions in Property 1.

Property 1

A family of linear code distributions \({\mathcal C} = \{C_j :j \in {\mathcal J} \}\) over \({\mathbb F} ^{\eta ^*}\) satisfies this property with parameters \(\delta \) and \(\gamma \) if the following conditions hold.

  1. \(2^{-\delta }\)-bias family of distributions. For any \(0^{\eta ^*} \ne S \in {\mathbb F} ^{\eta ^*}\), \({\mathbb E} ~[\mathsf {Bias} _S(C_j)^2] \leqslant 2^{-\delta }\), where the expectation is taken over a uniformly random \(j \in {\mathcal J} \).

  2. \(\gamma \)-erasure recovery in Schur product. For all \(j \in {\mathcal J} \), the Schur product code of \(C_j\), that is \(C_j * C_j = C_j^{{\left( 2\right) }} \), supports the erasure recovery of the first \(\gamma \) coordinates. Moreover, the first \(\gamma \) coordinates of \(C_j\) and \(C_j^{{\left( 2\right) }} \) are linearly independent of each other.

3.1 Our Construction

Figure 3 presents our construction of a family of linear codes which satisfies Property 1 and Theorem 6 gives the parameters for our construction.

Fig. 3.

Our construction of a family of small-bias linear code distributions.

At a high level, the linear code C is a suitable algebraic geometry code over a constant-size field \({\mathbb F} \) of block length \(\eta ^* = \gamma + \eta \). The parameters of the code C are chosen such that the family \(\{C_{\pi ,\lambda }\}\) obtained from C under our “twist-then-permute” operation is a \(2^{-\delta }\)-biased family of distributions, and \(C * C\) supports erasure recovery of any \(\gamma \) coordinates. The precise calculation of the parameters of the code C can be found in the full version of our work [8]. Our family of linear codes satisfies the following theorem.

Theorem 6

The family of linear code distributions \(\{C_{\pi , \lambda }:\pi \in {\mathcal S} _{\eta ^*},\; \lambda \in ({\mathbb F} ^*)^{\eta ^*}\}\) over \({\mathbb F} ^{\eta ^*}\) given in Fig. 3 satisfies Property 1 for any \(\gamma < d^{{\left( 2\right) }} \), where \(d^{{\left( 2\right) }} \) is the distance of the Schur product code of C, and \(\delta = \left( d^\perp + \frac{\eta ^*}{\sqrt{q}-1} - 1\right) \left( \lg (q-1) - \mathbf {h_2}\left( \frac{1}{q+1}\right) \right) - \frac{\eta ^*}{\sqrt{q}-1}\lg q\), where \(\mathbf {h_2}\) denotes the binary entropy function.

Proof

We first prove erasure recovery followed by the small-bias property.

\(\gamma \)-erasure Recovery in Schur Product code. First, we note that permuting or re-ordering the columns of a generator matrix does not change its distance, the distance of its Schur product, or its capability of erasure recovery (as long as we know the mapping of new columns vis-à-vis old columns). Let \( {\mathcal I} _\gamma = \{i_1,\cdots , i_\gamma \}\) be the indices of the erased coordinates of a codeword in \(C_{\pi ,\lambda }^{{\left( 2\right) }} \). Hence, to show erasure recovery of the coordinates \({\mathcal I} _{\gamma }\) of a codeword of \(C^{{\left( 2\right) }} _{\pi , \lambda }\), it suffices to show erasure recovery of the \(\gamma \) erased coordinates \({\mathcal J} _\gamma = \{j_1,\cdots , j_\gamma \}\) of a codeword of \(C^{{\left( 2\right) }} _\lambda \), where \(C_\lambda \) is the uniform codespace generated by \(G_\lambda \), and \(\pi (j_k) = i_k\) for all \(k \in [\gamma ]\).

Note that since \(\gamma < d^{{\left( 2\right) }} \), the code \(C^{{\left( 2\right) }} \) supports erasure recovery of any \(\gamma \) coordinates. Thus it suffices to show that this implies that \(C_\lambda ^{{\left( 2\right) }} \) also supports the erasure recovery of any \(\gamma \) coordinates. Note that since \(\lambda \in ({\mathbb F} ^*)^{\eta ^*}\), multiplication of the columns of G according to \(\lambda \) does not change its distance or distance of the Schur product. Then we do the following to perform erasure recovery of \(\gamma \) coordinates in \(C_\lambda ^{{\left( 2\right) }} \). Let \(c^{{\left( 2\right) }} \in C_\lambda ^{{\left( 2\right) }} \) be a codeword with erased coordinates \({\mathcal J} _\gamma = \{j_1,\cdots , j_\gamma \}\), and let \({\mathcal J} _\eta = \{j'_1,\cdots , j'_\eta \}\) be the coordinates of \(c^{{\left( 2\right) }} \) that have not been erased. For every \(j \in {\mathcal J} _\eta \), compute \(c_j = (\lambda _j^{\text {-}1})^2c^{{\left( 2\right) }} _j\). Then the vector \((c_j)_{j \in {\mathcal J} _\eta }\) is a codeword of \(C^{{\left( 2\right) }} \) with coordinates \(c_i\) erased for \(i \in {\mathcal J} _\gamma \). Since \(C^{{\left( 2\right) }} \) has \(\gamma \) erasure recovery, we can recover the \(c_i\) for \(i \in {\mathcal J} _\gamma \). Once recovered, for every \(i \in {\mathcal J} _\gamma \), compute \(c^{{\left( 2\right) }} _i = \lambda _i^2c_i\). This produces the \(\gamma \) erased coordinates of \(c^{{\left( 2\right) }} \) in \(C_{\lambda }^{{\left( 2\right) }} \). Finally, one can map the \(c^{{\left( 2\right) }} _i\) for \(i \in {\mathcal J} _\gamma \) to the coordinates \({\mathcal I} _\gamma \) using \(\pi \), recovering the erasures in \(C_{\pi ,\lambda }^{{\left( 2\right) }} \).
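The rescale, recover, rescale steps above can be illustrated with a toy example: the [3, 1] repetition code over \({\mathbb G} {\mathbb F} (5)\) with the identity permutation. All parameters here are illustrative, not those of our construction.

```python
# Toy example over GF(5): C is the [3, 1] repetition code, so C^(2) is again
# the repetition code and supports recovery of one erased coordinate.
q = 5
lam = (2, 3, 4)                    # the "twist" lambda; all entries nonzero

def inv(x):
    return pow(x, q - 2, q)        # inverse in GF(q), Fermat's little theorem

# A codeword of C_lambda^(2) has coordinates lam_i^2 * a for a message a
a = 3
c2 = [lam[i] ** 2 * a % q for i in range(3)]
erased = 0                         # pretend coordinate 0 was erased

# Step 1: undo the twist on the surviving coordinates -> codeword of C^(2)
plain = {i: inv(lam[i]) ** 2 * c2[i] % q for i in range(3) if i != erased}
# Step 2: erasure recovery in C^(2) (repetition code: copy any coordinate)
recovered_plain = next(iter(plain.values()))
# Step 3: re-apply the twist on the erased coordinate
recovered = lam[erased] ** 2 * recovered_plain % q
print(recovered == c2[erased])     # True
```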

\(2^{-\delta }\)-bias Family of Distributions. Let \(C, C_{\lambda }, C_{\pi ,\lambda }\) be the uniform distribution over the linear codes generated by \(G, G_\lambda , G_{\pi ,\lambda }\), respectively. Recall that \(d^\perp \) is the dual distance for C. Note that \(C_\lambda , C_{\pi ,\lambda }\) have dual-distance \(d^\perp \) as well. Let \(\eta ^* = \eta + \gamma .\) Since \(\mathsf {Bias} _S(C_{\pi ,\lambda }) = |{\mathbb F} |^{\eta ^*}|\widehat{C_{\pi ,\lambda }}(S)|\) for every \(S \in {\mathbb F} ^{\eta ^*}\), it suffices to show that

$$\begin{aligned} \underset{\pi , \lambda }{{\mathbb E}}\left[ \widehat{C_{\pi ,\lambda }}(S)^2\right] \leqslant \frac{1}{|{\mathbb F} |^{2{\eta ^*}} \cdot 2^\delta }. \end{aligned}$$

To begin, first recall the definition of \(C_{\pi , \lambda }\): it is the uniform distribution over the code \(\{\pi (c * \lambda ) \;:\; c \in C\}\), obtained by scaling the coordinates of C by \(\lambda \) (the “twist”) and then permuting them by \(\pi \).

Next, given any \(S \in {\mathbb F} ^{\eta ^*}\), define \(\mathcal {S} (S) = \{\pi (S) * \lambda \;:\; \pi \in {\mathcal S} _{\eta ^*},\; \lambda \in ({\mathbb F} ^*)^{\eta ^*}\}\). Note that \(\mathcal {S} (S)\) is equivalently characterized as

$$\begin{aligned} \mathcal {S} (S) = \{T = (T_1, \cdots , T_{\eta ^*}) \in {\mathbb F} ^{\eta ^*} \;|\; \mathsf {wt} (T) = \mathsf {wt} (S)\}. \end{aligned}$$

It is easy to see that \(|\mathcal {S} (S)| = \left( {\begin{array}{c}\eta ^*\\ w_0\end{array}}\right) (q-1)^{{\eta ^*}-w_0}\), where \(w_0 = \eta ^* - \mathsf {wt} (S)\); i.e., \(w_0\) is the number of zeros in S. We prove the following claim.

Claim 1

For any \(S \in {\mathbb F} ^{\eta ^*}\), we have \(\widehat{C_{\pi ,\lambda }}(S) = \widehat{C}(\pi ^{\text {-}1} (S)*\lambda )\).

Proof

Notice that by definition for any \(x \in C_{\pi ,\lambda }\), we have \(C_{\pi ,\lambda }(x) = C(c)\) since \(x = \pi (\lambda _1c_1,\cdots ,\lambda _{\eta ^*} c_{\eta ^*})\) for \(c \in C\). This is equivalently stated as \(C_{\pi ,\lambda }(\pi (c * \lambda )) = C(c)\). For \(x = \pi (\lambda _1y_1,\cdots ,\lambda _{\eta ^*} y_{\eta ^*}) \in {\mathbb F} ^{\eta ^*}\) and any \(S \in {\mathbb F} ^{\eta ^*}\), we have

$$\begin{aligned} S\cdot x = \sum \limits _{i=1}^{\eta ^*} S_i x_i = \sum \limits _{i=1}^{\eta ^*} S_i(\lambda _{\pi (i)} y_{\pi (i)}) = \sum \limits _{i=1}^{\eta ^*} (S_{\pi ^{\text {-}1} (i)})\lambda _i y_i = (\pi ^{\text {-}1} (S) * \lambda )\cdot y. \end{aligned}$$

where \(S\cdot x\) is the vector dot product. By definition of \(\chi _S(x)\), this implies \(\chi _S(x) = \chi _y(\pi ^{\text {-}1} (S)* \lambda )\). Using these two facts and working directly from the definition of Fourier Transform, we have

$$\begin{aligned} \widehat{C_{\pi ,\lambda }}(S)&= \frac{1}{|{\mathbb F} |^{\eta ^*}}\sum \limits _{x\in {\mathbb F} ^{\eta ^*}} C_{\pi ,\lambda }(x) \overline{\chi _S(x)}\\&=\frac{1}{|{\mathbb F} |^{\eta ^*}}\sum \limits _{c\in {\mathbb F} ^{\eta ^*}} C_{\pi ,\lambda }(\pi (\lambda _1c_1,\cdots ,\lambda _{\eta ^*} c_{\eta ^*})) \overline{\chi _S(\pi (\lambda _1c_1,\cdots ,\lambda _{\eta ^*} c_{\eta ^*}))}\\&= \frac{1}{|{\mathbb F} |^{\eta ^*}}\sum \limits _{c\in {\mathbb F} ^{\eta ^*}} C(c) \overline{\chi _c(\pi ^{\text {-}1} (S) * \lambda )} = \widehat{C}(\pi ^{\text {-}1} (S) * \lambda ). \end{aligned}$$

This proves Claim 1.    \(\square \)

It is easy to see that \(\mathsf {wt} (\pi ^{\text {-}1}(S) * \lambda ) = \mathsf {wt} (S)\), so \((\pi ^{\text {-}1}(S) * \lambda ) = T \in \mathcal {S} (S)\). From this fact and Claim 1, we prove the following claim.

Claim 2

For any \(S \in {\mathbb F} ^{\eta ^*}\), we have \({\mathbb E} _{\pi ,\lambda }\left[ \widehat{C_{\pi ,\lambda }}(S)^2\right] = \dfrac{1}{|\mathcal {S} (S)|}\sum _{T \in \mathcal {S} (S)} \widehat{C}(T)^2\).

Proof

Suppose we have a codeword \(x \in C_{\pi ,\lambda }\) such that \(\pi (\lambda _1c_1,\cdots , \lambda _{\eta ^*} c_{\eta ^*}) = x\), for some codeword \(c\in C\). Let \(\{i_1, \cdots , i_{w_0}\}\) be the set of indices of 0 in c; that is, \(c_j = 0\) for all \(j \in \{i_1, \cdots , i_{w_0}\}\). Then for any permutation \(\pi \), the set \(\{\pi (i_1),\cdots , \pi (i_{w_0})\}\) is the set of zero indices in x. Note also that for any index \(j \not \in \{\pi (i_1),\cdots , \pi (i_{w_0})\}\), we have \(x_j \ne 0\). If this were not the case, then we would have \(x_j = c_{\pi ^{\text {-}1}(j)} \lambda _{\pi ^{\text {-}1}(j)} = 0\). Since \(j \not \in \{\pi (i_1),\cdots , \pi (i_{w_0})\}\), this implies \(\pi ^{\text {-}1} (j) \not \in \{i_1,\cdots , i_{w_0}\}\), which further implies that \(c_{\pi ^{\text {-}1} (j)} \ne 0\). This is a contradiction since \(\lambda \in ({\mathbb F} ^*)^{\eta ^*}\). Thus any such permutation \(\pi \) must map the zeros of c to the zeros of x, and there are \(w_0!(\eta ^*-w_0)!\) such permutations. Notice now that for any \(c_k = 0\), \(\lambda _k\) can take any value in \({\mathbb F} ^*\), so we have \((q-1)^{w_0}\) such choices. Furthermore, if \(c_k \ne 0\) and \(\lambda _kc_k = x_{\pi ^{\text {-}1}(k)} \ne 0\), then there is exactly one value \(\lambda _k \in {\mathbb F} ^*\) which satisfies this equation. Putting it all together, we have

$$\begin{aligned} \underset{\pi , \lambda }{{\mathbb E}}\left[ \widehat{C_{\pi ,\lambda }}(S)^2\right]&= \frac{1}{\eta ^*!\,(q-1)^{\eta ^*}}\sum _{\pi , \lambda } \widehat{C}(\pi ^{\text {-}1} (S)*\lambda )^2\\&= \frac{w_0!\,(\eta ^*-w_0)!\,(q-1)^{w_0}}{\eta ^*!\,(q-1)^{\eta ^*}}\sum _{T \in \mathcal {S} (S)} \widehat{C}(T)^2 = \frac{1}{|\mathcal {S} (S)|}\sum _{T \in \mathcal {S} (S)} \widehat{C}(T)^2, \end{aligned}$$

where the first equality follows from Claim 1 and the second from the counting argument above. This proves Claim 2.    \(\square \)

With Claim 2, we are now interested in finding \(\delta \) such that for \(0^{\eta ^*} \ne S \in {\mathbb F} ^{\eta ^*}\)

$$\begin{aligned} \frac{1}{|\mathcal {S} (S)|}\sum _{T \in \mathcal {S} (S)} \widehat{C}(T)^2 \leqslant \frac{1}{|{\mathbb F} |^{2{\eta ^*}}\cdot 2^{\delta }}. \end{aligned}$$

We note that since C is a linear code, C has non-zero Fourier coefficients only at codewords in \(C^\perp \).

Claim 3

For all \(S \in {\mathbb F} ^{\eta ^*}\), \(\widehat{C}(S) = {\left\{ \begin{array}{ll} \dfrac{1}{|{\mathbb F} |^{\eta ^*}} &{} S \in C^\perp \\ 0 &{} \text {otherwise.} \end{array}\right. }\)

Let \(A_w = |C^\perp \cap \mathcal {S} (S) |\), where \(w = {\eta ^*} -w_0= \mathsf {wt} (S)\). Intuitively, \(A_w\) is the number of codewords in \(C^\perp \) with weight w. Then from Claim 3, we have

$$\begin{aligned} \frac{1}{|\mathcal {S} (S)|}\sum _{T \in \mathcal {S} (S)} \widehat{C}(T)^2 = \frac{A_w}{|\mathcal {S} (S)|\cdot |{\mathbb F} |^{2{\eta ^*}}}. \end{aligned}$$

Now, our goal is to upper bound \(A_w\). Towards this goal, we define the weight enumerator of the code \(C^\perp \) as the following polynomial:

$$\begin{aligned} W_{C^\perp }(x) = \sum _{c\in C^\perp } x^{{\eta ^*} - \mathsf {wt} \left( {c}\right) }. \end{aligned}$$

This polynomial can equivalently be written in the following manner:

$$\begin{aligned} W_{C^\perp }(x) = \sum _{w\in \{0,\cdots ,{\eta ^*}\}} A_w x^{{\eta ^*} - w}. \end{aligned}$$
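For a toy instance of these coefficients, the following sketch (illustrative only) enumerates \(A_w\) for the dual of the [3, 1] repetition code over \({\mathbb G} {\mathbb F} (2)\), i.e., the even-weight code, and checks that the two expressions for \(W_{C^\perp }\) agree at a sample point:

```python
import itertools
from collections import Counter

# Dual of the [3, 1] repetition code over GF(2): the even-weight code
n = 3
dual = [c for c in itertools.product((0, 1), repeat=n) if sum(c) % 2 == 0]

# A_w = number of dual codewords of weight w
A = Counter(sum(c) for c in dual)
print(dict(A))  # {0: 1, 2: 3}

# W_{C^perp}(x) = sum_w A_w * x^{n - w}; both expressions agree, e.g. at x = 2
x = 2
via_words = sum(x ** (n - sum(c)) for c in dual)
via_Aw = sum(aw * x ** (n - w) for w, aw in A.items())
print(via_words, via_Aw)  # 14 14
```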

Define \(a={\eta ^*} - d^\perp \).

Imported Theorem 1

(Exercise 1.1.15 from [41]). We have the relation

$$\begin{aligned} W_{C^\perp }(x) = x^{\eta ^*} + \sum _{i=0}^a B_i(x-1)^i, \text {where} \end{aligned}$$
$$\begin{aligned} B_i = \sum _{j=\eta ^*-a}^{{\eta ^*}-i}{ \left( {\begin{matrix} {\eta ^*-j}\\ {i} \end{matrix}}\right) } A_j\ge 0 \qquad A_i=\sum _{j={\eta ^*}-i}^a(-1)^{{\eta ^*}+i+j}{ \left( {\begin{matrix} {j}\\ {\eta ^*-i} \end{matrix}}\right) } B_j. \end{aligned}$$

For weight \(w\in \left\{ d^\perp ,\cdots ,{\eta ^*}\right\} \), we use the following expression to estimate \(A_w\).

$$A_w = { \left( {\begin{matrix} {{\eta ^*}-w}\\ {{\eta ^*}-w} \end{matrix}}\right) } B_{{\eta ^*}-w} - { \left( {\begin{matrix} {{\eta ^*}-w+1}\\ {{\eta ^*}-w} \end{matrix}}\right) } B_{{\eta ^*}-w+1} + \cdots \pm { \left( {\begin{matrix} {{\eta ^*} - d^\perp }\\ {{\eta ^*}-w} \end{matrix}}\right) } B_{{\eta ^*} - d^\perp }$$

Since we are interested in the asymptotic behavior (and not the exact value) of \(A_w\), we note that \(\lg A_w \sim \lg \varGamma (w) \), where

$$\varGamma (w) = \max \left\{ { \left( {\begin{matrix} {{\eta ^*}-w}\\ {{\eta ^*}-w} \end{matrix}}\right) } B_{{\eta ^*}-w},{ \left( {\begin{matrix} {{\eta ^*}-w+1}\\ {{\eta ^*}-w} \end{matrix}}\right) } B_{{\eta ^*}-w+1},\cdots ,{ \left( {\begin{matrix} {{\eta ^*}-d^\perp }\\ {{\eta ^*}-w} \end{matrix}}\right) } B_{{\eta ^*} - d^\perp }\right\} .$$

Thus, it suffices to compute \(\varGamma (w)\) for every w, and then bound the bias. We present this precise asymptotic calculation in the full version of our work [8]. This calculation yields

$$\begin{aligned} \delta = \left( d^\perp + \frac{\eta ^*}{\sqrt{q}-1} - 1\right) \left( \lg (q-1) - \mathbf {h_2}\left( \frac{1}{q+1}\right) \right) - \frac{\eta ^*}{\sqrt{q}-1}\lg q, \end{aligned}$$

which completes the proof.    \(\square \)

4 Construction of Correlation Extractor

Our main sub-protocol for Theorem 3 takes leaky \(\mathsf {ROLE} ({\mathbb F})\) samples as the initial correlation and produces secure \(\mathsf {ROLE} ({\mathbb F})\) samples. Towards this, we define a \(\mathsf {ROLE} ({\mathbb F})\)-to-\(\mathsf {ROLE} ({\mathbb F})\) extractor formally below.

Definition 7

(\((\eta ,\gamma ,t,\varepsilon )\)-\(\mathsf {ROLE} ({\mathbb F})\)-to-\(\mathsf {ROLE} ({\mathbb F})\) extractor). Let \((R_A, R_B)\) be the correlated randomness corresponding to \(\eta \) samples of \(\mathsf {ROLE} ({\mathbb F})\). An \((\eta ,\gamma ,t,\varepsilon )\)-\(\mathsf {ROLE} ({\mathbb F})\)-to-\(\mathsf {ROLE} ({\mathbb F})\) extractor is a two-party interactive protocol in the \((R_A, R_B)^{[t]}\)-hybrid that securely implements the \(\mathsf {ROLE} ({\mathbb F})^\gamma \) functionality against information-theoretic semi-honest adversaries with \(\varepsilon \) simulation error.

Let \((u_i, v_i) \in {\mathbb F} ^2\) and \((r_i, z_i) \in {\mathbb F} ^2\) be the shares of Alice and Bob, respectively, in the \(i^{th}\) output instance. The correctness condition says that the receiver’s output is correct in all \(\gamma \) output instances of \(\mathsf {ROLE} ({\mathbb F})\), i.e., \(z_i = u_ir_i+v_i\) for all \(i \in [\gamma ]\). The privacy requirement says the following: A corrupt sender (resp., receiver) cannot distinguish between \(\{r_i\}_{i \in [\gamma ]}\) (resp., \(\{u_i\}_{i \in [\gamma ]}\)) and \(U_{{\mathbb F} ^\gamma }\) with advantage more than \(\varepsilon \).

In Sect. 4.1, we give our construction for Theorem 3. Later, in Sect. 4.3, we build on this to give our construction for Theorem 2.

4.1 Protocol for \(\mathsf {ROLE} ({\mathbb F})\) Correlation Extractor

As already mentioned in Sect. 1.4, to prove Theorem 3, our main building block will be an \((\eta , \gamma , t, \varepsilon )\)-\(\mathsf {ROLE} ({\mathbb F})\)-to-\(\mathsf {ROLE} ({\mathbb F})\) extractor (see Definition 7). That is, the parties start with \(\eta \) samples of the \(\mathsf {ROLE} ({\mathbb F})\) correlation, so that the size of each party’s share is \(n = 2\eta \log |{\mathbb F} |\) bits. The adversarial party gets t bits of leakage. The protocol produces \(\gamma \) secure \(\mathsf {ROLE} ({\mathbb F})\) samples with simulation error \(\varepsilon \). We give the formal description of the protocol, inspired by the Massey secret sharing scheme [32], in Fig. 4. Note that our protocol is round-optimal and uses a family of distributions \({\mathcal C} = \{C_j \}_{j \in {\mathcal J}}\) that satisfies Property 1 with parameters \(\delta \) and \(\gamma \).

Fig. 4.

\(\mathsf {ROLE} ({\mathbb F})\)-to-\(\mathsf {ROLE} ({\mathbb F})\) extractor protocol.

Next, we use the embedding technique from [9] to embed \(\mathsf {ROT}\) samples in each fresh \(\mathsf {ROLE} ({\mathbb F})\) sample obtained from the above protocol; say, \(\sigma \) \(\mathsf {ROT}\) samples per \(\mathsf {ROLE} ({\mathbb F})\) sample. Using this, we get production \(m = 2\sigma \gamma \), i.e., we get \(m/2 = \sigma \gamma \) secure \(\mathsf {ROT}\) samples. We note that the protocol from [9] is round-optimal, achieves perfect security, and composes in parallel with our protocol in Fig. 4. Hence, we maintain round-optimality (see Sect. 4.2).

Correctness of Fig. 4. The following lemma characterizes the correctness of the scheme presented in Fig. 4.

Lemma 3

(Correctness). If the family of distributions \({\mathcal C} = \{C_j \}_{j \in {\mathcal J}}\) satisfies Property 1, i.e., erasure recovery of the first \(\gamma \) coordinates in the Schur product, then for all \(i \in \{-\gamma , \cdots , -1\}\), it holds that \(t_i = u_ir_i+v_i\).

Proof

First, we prove the following claim.

Claim 4

For all \(i \in [\eta ]\), it holds that \(t_i = u_ir_i+v_i\).

This claim follows from the following derivation.

$$\begin{aligned} t_i&= \alpha _ir_i + \beta _i - z_i = (u_i - a_i)r_i + (a_im_i + b_i + v_i) - z_i\\&= u_ir_i - a_ir_i + a_i(r_i + x_i) + b_i + v_i - z_i\\&= u_ir_i + a_ix_i + b_i + v_i - z_i\\&= u_ir_i + v_i, \end{aligned}$$

where the last equality uses the \(\mathsf {ROLE} ({\mathbb F})\) relation \(z_i = a_ix_i + b_i\).

From the above claim, we have that \(t_{[\eta ]} = u_{[\eta ]}*r_{[\eta ]} + v_{[\eta ]}\). From the protocol, we have that \(u, r \in C_j\) and \(v \in C_j^{{\left( 2\right) }} \). Consider \(\tilde{t} = u*r + v \in C_j^{{\left( 2\right) }} \). Note that \(t_i = \tilde{t}_i\) for all \(i \in [\eta ]\). Hence, when client B performs erasure recovery on \(t_{[\eta ]}\) for a codeword in \(C_j^{{\left( 2\right) }} \), it would get \(\tilde{t}_{[-\gamma ]}\). This follows from erasure recovery guarantee for first \(\gamma \) coordinates by Property 1.    \(\square \)
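The derivation in Claim 4 can be checked mechanically. The following sketch (toy field \({\mathbb G} {\mathbb F} (7)\), assuming the \(\mathsf {ROLE}\) relation \(z = ax + b\) between Alice's share (a, b) and Bob's share (x, z), as used in the derivation) verifies \(t_i = u_ir_i + v_i\) on random inputs:

```python
import random

# Toy check over GF(7) of the derivation t_i = u_i r_i + v_i, assuming the
# ROLE relation z = a * x + b between Alice's (a, b) and Bob's (x, z).
q = 7

def reconstruct(a, b, x, u, v, r):
    z = (a * x + b) % q                # Bob's ROLE output
    m = (r + x) % q                    # Bob's message
    alpha = (u - a) % q                # Alice's message
    beta = (a * m + b + v) % q         # Alice's message
    return (alpha * r + beta - z) % q  # Bob computes t

random.seed(1)
for _ in range(200):
    a, b, x, u, v, r = (random.randrange(q) for _ in range(6))
    assert reconstruct(a, b, x, u, v, r) == (u * r + v) % q
print("ok")
```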

Security of Fig. 4. To argue security, we prove that the protocol is a secure implementation of the \(\mathsf {ROLE} ({\mathbb F})^\gamma \) functionality against an information-theoretic semi-honest adversary that corrupts either the sender or the receiver and leaks at most t bits from the secret share of the honest party at the beginning of the protocol. At a high level, we prove the security of our protocol by reducing it exactly to our unpredictability lemma.

Lemma 4

(Unpredictability Lemma). Let \({\mathcal C} = \{C_j:j\in {\mathcal J} \}\) be a \(2^{-\delta }\)-biased family of linear code distributions over \({\mathbb F} ^{\eta ^*}\), where \(\eta ^* = \gamma + \eta \). Consider the following game between an honest challenger \({\mathcal H} \) and an adversary \({\mathcal A} \):

figure a

The adversary \({\mathcal A}\) wins the game if \(b = \widetilde{b}\). For any \({\mathcal A}\), the advantage of the adversary is \(\varepsilon \le \frac{1}{2}\sqrt{\frac{|{\mathbb F} |^\gamma 2^t}{2^\delta }}\).

Proof

Let \(M_{[\eta ]} \) be the distribution corresponding to \(m_{[\eta ]} \). Consider \(M'_{[\eta +\gamma ]} = (0^\gamma , M_{[\eta ]})\). By Imported Lemma 1, \(\mathbf {\widetilde{H}_\infty } (M'|{\mathcal L} (M')) \ge \eta \log |{\mathbb F} | - t\). Recall that \({\mathcal C} = \{C_j : j \in {\mathcal J} \}\) is a \(2^{-\delta }\)-bias family of distributions over \({\mathbb F} ^{\eta +\gamma }\). Then, by Theorem 5, we have the following as desired:

$$\begin{aligned} \mathrm {SD}\left( {(C_{\mathcal J} \oplus M', {\mathcal L} (M'), {\mathcal J})},{ (U_{{\mathbb F} ^{\eta +\gamma }}, {\mathcal L} (M'), {\mathcal J})}\right) \le \frac{1}{2}\left( \frac{2^t\cdot |{\mathbb F} |^{\eta +\gamma }}{2^\delta \cdot |{\mathbb F} |^\eta }\right) ^{\frac{1}{2}} = \frac{1}{2}\sqrt{ \frac{|{\mathbb F} |^\gamma 2^t}{2^\delta } }. \end{aligned}$$

   \(\square \)

We note that this lemma crucially relies on a family of small-bias distributions. Next, we prove the following security lemma.

Lemma 5

The simulation error of our protocol is \(\varepsilon \le \sqrt{\frac{|{\mathbb F} |^\gamma 2^t}{2^\delta }}\), where t is the number of bits of leakage, and \(\gamma \) and \(\delta \) are the parameters in Property 1 for the family of distributions \({\mathcal C} \).

Proof

We first prove Bob privacy followed by Alice privacy.

Bob Privacy. In order to prove privacy of client B against a semi-honest client A, it suffices to show that the adversary cannot distinguish between Bob’s secret values \((r_{-\gamma }, \cdots , r_{-1})\) and \(U_{{\mathbb F} ^\gamma }\). We show that the statistical distance of \((r_{-\gamma }, \cdots , r_{-1})\) and \(U_{{\mathbb F} ^\gamma }\) given the view of the adversary is at most \(\varepsilon \), where \(\varepsilon \) is defined above.

We observe that client B’s privacy reduces directly to our unpredictability lemma (Lemma 4) for the following variables. Let \(X_{[\eta ]}\) be the random variable denoting B’s input in the initial correlations. Then, \(X_{[\eta ]}\) is uniform over \({\mathbb F} ^\eta \). Note that the adversary gets \(L = {\mathcal L} (X_{[\eta ]})\), that is, at most t bits of leakage. Next, the honest client B picks a uniformly random index \(j \in {\mathcal J} \) and a random \(r = (r_{-\gamma }, \cdots , r_{-1}, r_{1},\cdots , r_{\eta }) \in C_j\). Client B sends \(m_{[\eta ]} = r_{[\eta ]} + x_{[\eta ]}\). This is exactly the game between the honest challenger and a semi-honest adversary in the unpredictability lemma (see Lemma 4). Hence, the adversary cannot distinguish between \(r_{[-\gamma ]}\) and \(U_{{\mathbb F} ^\gamma }\) with probability more than \(\varepsilon \).

Alice Privacy. In order to prove privacy of client A against a semi-honest client B, it suffices to show that the adversary cannot distinguish between Alice’s secret values \((u_{-\gamma }, \cdots , u_{-1})\) and \(U_{{\mathbb F} ^\gamma }\). We show that the statistical distance of \((u_{-\gamma }, \cdots , u_{-1})\) and \(U_{{\mathbb F} ^\gamma }\), given the view of the adversary, is at most \(\varepsilon \), where \(\varepsilon \) is defined above, by reducing to our unpredictability lemma (see Lemma 4).

Fig. 5.

Simulator for Alice Privacy.

Let \(A_{[\eta ]}\) denote the random variable corresponding to the client A’s input \(a_{[\eta ]}\) in the initial correlations. Then, without loss of generality, the adversary receives t-bits of leakage \({\mathcal L} (A_{[\eta ]})\). We show a formal reduction to Lemma 4 in Fig. 5. Given an adversary \({\mathcal A} \) who can distinguish between \((u_{-\gamma }, \cdots , u_{-1})\) and \(U_{{\mathbb F} ^\gamma }\), we construct an adversary \({\mathcal A} '\) against an honest challenger \({\mathcal H} \) of Lemma 4 with identical advantage. It is easy to see that this reduction is perfect. The only differences in the simulator from the actual protocol are as follows. In the simulation, the index j of the distribution is picked by the honest challenger \({\mathcal H} \) instead of client B. This is identical because client B is a semi-honest adversary.

Also, the simulator \({\mathcal A} '\) generates \(\beta _{[\eta ]}\) slightly differently. We claim that the distribution of \(\beta _{[\eta ]}\) in the simulation is identical to that of the real protocol.

This holds by correctness of the protocol: \(t_{[\eta ]} = u_{[\eta ]}*r_{[\eta ]}+v_{[\eta ]} = (\alpha _{[\eta ]} * r_{[\eta ]}) + \beta _{[\eta ]} - z_{[\eta ]} \). Hence, \(\beta _{[\eta ]} = (u_{[\eta ]}*r_{[\eta ]}+v_{[\eta ]}) - (\alpha _{[\eta ]} * r_{[\eta ]}) + z_{[\eta ]} = w_{[\eta ]} -(\alpha _{[\eta ]} * r_{[\eta ]}) + z_{[\eta ]} \), where \(w_{[-\gamma , \eta ]}\) is chosen as a random codeword in \(C_j^{{\left( 2\right) }} \). This holds because in the real protocol \(v_{[-\gamma , \eta ]}\) is chosen as a random codeword in \(C_j^{{\left( 2\right) }} \) and \(u_{[-\gamma ,\eta ]}*r_{[-\gamma , \eta ]} \in C_j^{{\left( 2\right) }} \). Here, we denote by \([-\gamma , \eta ]\) the set \(\{-\gamma , \cdots , -1, 1, \cdots , \eta \}\).    \(\square \)

4.2 OT Embedding

The second conceptual block is the embedding protocol from [9], referred to as the BMN embedding protocol, that embeds a constant number of \(\mathsf {ROT}\) samples into one sample of \(\mathsf {ROLE} ({\mathbb F})\), where \({\mathbb F}\) is a finite field of characteristic 2. The BMN embedding protocol is a two-message perfectly semi-honest secure protocol. For example, asymptotically, [9] embeds \((s)^{1-o(1)}\) samples of \(\mathsf {ROT}\) into one sample of the \(\mathsf {ROLE} ({\mathbb G} {\mathbb F} \left[ 2^s\right])\) correlation. However, for reasonable values of s, say for \(s\le 2^{50}\), a recursive embedding embeds \(s^{\log 10/\log 32}\) samples of \(\mathsf {ROT}\) into one sample of the correlation, and this embedding is more efficient than the asymptotically good one. Below, we show that this protocol composes in parallel with our protocol in Fig. 4 to give our overall round-optimal protocol for the \((n,m,t,\varepsilon )\)-correlation extractor for the \(\mathsf {ROLE} ({\mathbb F})\) correlation, satisfying Theorem 3.

We note that the BMN embedding protocol satisfies the following additional properties. (1) The first message is sent by client B, and (2) this message depends only on the first component of client B’s share in each fresh \(\mathsf {ROLE} ({\mathbb F})\) sample (this refers to \(r_i\) in Fig. 4) and does not depend on the second component (this refers to \(t_i\) in Fig. 4). With these properties, the BMN embedding protocol can be run in parallel with the protocol in Fig. 4. Also, since the BMN protocol satisfies perfect correctness and perfect security, to prove overall security, it suffices to prove the correctness and security of our protocol in Fig. 4. This holds because we are in the semi-honest information-theoretic setting.

4.3 Protocol for \(\mathsf {ROT}\) Correlation Extractor (Theorem 2)

In this section, we describe a protocol to construct \(\mathsf {ROLE} ({\mathbb F})\) samples using \(\mathsf {ROT}\) samples, which is the starting point of our protocol in Sect. 4.1. This proves Theorem 2. Recall that \(\mathsf {ROT}\) and \(\mathsf {ROLE} ({\mathbb G} {\mathbb F} \left[ 2\right])\) are equivalent.

One of the several fascinating applications of algebraic function fields, pioneered by the seminal work of Chudnovsky and Chudnovsky [14], is efficient multiplication over an extension field using multiplications over the base field. For example, 6 multiplications over \({\mathbb G} {\mathbb F} \left[ 2\right] \) suffice to perform one multiplication over \({\mathbb G} {\mathbb F} \left[ 2^3\right] \), and 15 multiplications over \({\mathbb G} {\mathbb F} \left[ 2\right] \) suffice for one multiplication over \({\mathbb G} {\mathbb F} \left[ 2^6\right] \) (cf., Table 1 in [12]).
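As a smaller instance of the same bilinear-multiplication idea (a Karatsuba-style toy example, not the algorithm of [12]), one multiplication over \({\mathbb G} {\mathbb F} \left[ 2^2\right] \) reduces to 3 multiplications over \({\mathbb G} {\mathbb F} \left[ 2\right] \):

```python
# Toy bilinear multiplication: one GF(4) product from 3 GF(2) products.
# GF(4) = GF(2)[x]/(x^2 + x + 1); the pair (a0, a1) represents a0 + a1*x.
def gf4_mul_via_3(a, b):
    (a0, a1), (b0, b1) = a, b
    m0 = a0 & b0                   # base-field multiplication 1
    m1 = a1 & b1                   # base-field multiplication 2
    m2 = (a0 ^ a1) & (b0 ^ b1)     # base-field multiplication 3
    # reduce modulo x^2 + x + 1, i.e., x^2 = x + 1
    return (m0 ^ m1, m0 ^ m2)

def gf4_mul_direct(a, b):          # schoolbook multiplication, for reference
    (a0, a1), (b0, b1) = a, b
    c0 = (a0 & b0) ^ (a1 & b1)
    c1 = (a0 & b1) ^ (a1 & b0) ^ (a1 & b1)
    return (c0, c1)

elems = [(i, j) for i in (0, 1) for j in (0, 1)]
assert all(gf4_mul_via_3(a, b) == gf4_mul_direct(a, b)
           for a in elems for b in elems)
print("ok")
```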

The first step of our correlation extractor for \(\mathsf {ROT}\) uses these efficient multiplication algorithms to (perfectly and securely) implement \(\mathsf {ROLE} ({\mathbb F})\) using \(\mathsf {ROT}\) samples, where \({\mathbb F} ={\mathbb G} {\mathbb F} (2^\alpha )\) is a finite field with characteristic 2.

We start by describing, in Fig. 6, a protocol for realizing one \(\mathsf {ROLE} ({\mathbb F})\) sample using \(\mathsf {ROT} ^\ell \), i.e., \(\ell \) independent samples of \(\mathsf {ROT}\) (in the absence of leakage). Our protocol implements, for instance, one sample of the \(\mathsf {ROLE} ({\mathbb G} {\mathbb F} \left[ 2^3\right])\) correlation using 6 samples from the \(\mathsf {ROT}\) correlation in two rounds. Our protocol uses a multiplication-friendly code \({\mathcal D} \) over \({\{0,1\}} ^\ell \) that encodes messages in \({\mathbb F} \). That is, \({\mathcal D} *{\mathcal D} = {\mathcal D} ^{{\left( 2\right) }} \subset {\{0,1\}} ^\ell \) is also a code for \({\mathbb F} \). Later, we show how to extend this to the leakage setting.

Fig. 6.

Perfectly secure protocol for \(\mathsf {ROLE} ({\mathbb F})\) in the \(\mathsf {ROT} ^\ell \)-hybrid.

Security Guarantee. It is easy to see that the protocol in Fig. 6 is a perfectly secure realization of \(\mathsf {ROLE} ({\mathbb F})\) in the \(\mathsf {ROT} ^\ell \)-hybrid against a semi-honest adversary, using the fact that \({\mathcal D} \) is a multiplication-friendly code for \({\mathbb F} \). Moreover, [26] proved the following useful lemma to argue the t-leaky realization of \(\mathsf {ROLE} ({\mathbb F})\) if the perfect oracle call to \(\mathsf {ROT} ^\ell \) is replaced by a t-leaky oracle.

Imported Lemma 3

([26]). Let \(\pi \) be a perfectly secure (resp., statistically \(\varepsilon \) secure) realization of f in the g-hybrid model, where \(\pi \) makes a single call to g. Then, \(\pi \) is also a perfectly secure (resp., statistically \(\varepsilon \) secure) realization of \(f^{[t]}\) in the \(g^{[t]}\)-hybrid model.

Using the above lemma, we get that the protocol in Fig. 6 is a perfect realization of \(\mathsf {ROLE} ({\mathbb F})^{[t]}\) in the \((\mathsf {ROT} ^\ell )^{[t]}\)-hybrid. Finally, by running the protocol of Fig. 6 in parallel for \(\eta \) samples of \(\mathsf {ROLE} ({\mathbb F})\) and using Imported Lemma 3, we get a perfectly secure protocol for \((\mathsf {ROLE} ({\mathbb F})^{\eta })^{[t]}\) in the \((\mathsf {ROT} ^{\eta \ell })^{[t]}\)-hybrid.

Round Optimality. To realize the round-optimality in Theorem 2, we can run the protocols in Figs. 6 and 4 in parallel. We note that the first messages of protocols in Figs. 6 and 4 can be sent together. This is because the first message of client B in protocol of Fig. 4 is independent of the second message in Fig. 6. The security holds because we are in the semi-honest information theoretic setting. Hence, overall round complexity is still 2.

5 Parameter Comparison

5.1 Correlation Extractor from \(\mathsf {ROLE} ({\mathbb F})\) (Theorem 3)

In this section, we compare our correlation extractor for the \(\mathsf {ROLE} ({\mathbb F})\) correlation, where \({\mathbb F} \) is a constant-size field, with the BMN correlation extractor [9].

BMN Correlation Extractor [9]. The BMN correlation extractor emphasizes high resilience while producing multiple \(\mathsf {ROT}\) samples as output. Roughly, they show the following. If the parties start with the \(\mathsf {IP} \big ({{\mathbb G} {\mathbb F} \left[ 2^{\varDelta n}\right] ^{1/\varDelta }}\big ) \) correlation, then they (roughly) achieve \(\frac{1}{2}-\varDelta \) fractional resilience with a production that depends on \((\varDelta n)\). Here, \(\varDelta \) has to be the inverse of an even natural number \(\geqslant 4\).

In particular, the \(\mathsf {IP} \big ({{\mathbb G} {\mathbb F} \left[ 2^{n/4}\right] ^4}\big ) \) correlationFootnote 12 achieves the highest production using the BMN correlation extractor. The resilience of this correlation is \((\frac{1}{4}-g)\), where \(g\in (0,1/4]\) is a positive constant. Then the BMN correlation extractor produces at most \((n/4)^{\log 10/\log 38}\approx (n/4)^{0.633}\) fresh samples of the \(\mathsf {ROT}\) correlation as output when \(n\le 2^{50}\). This implies that the production is \(m\approx 2\cdot (n/4)^{0.633}\), because each \(\mathsf {ROT}\) sample produces private shares that are two bits long. For \(n=10^3\), the production is \(m\leqslant 66\); for \(n=10^6\), the production is \(m\leqslant 5,223\); and for \(n=10^9\), the production is \(m\leqslant 413,913\). We emphasize that the BMN extractor cannot increase its production any further by sacrificing its leakage resilience and going below \(\varDelta = 1/4\).
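The production numbers above can be reproduced, up to rounding, from \(m \approx 2\cdot (n/4)^{\log 10/\log 38}\); the exact rounding convention of [9] may differ slightly:

```python
import math

def bmn_production(n):
    # m ~ 2 * (n/4)^(log 10 / log 38); exact rounding in [9] may differ
    return 2 * (n / 4) ** (math.log(10) / math.log(38))

for n in (10 ** 3, 10 ** 6, 10 ** 9):
    print(n, round(bmn_production(n)))
```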

Our Correlation Extractor for \(\mathsf {ROLE} ({\mathbb F})\). We shall use \({\mathbb F}\) such that \(q=\left| {{\mathbb F}}\right| \) is an even power of 2. For the suitable Algebraic Geometry codes [19] to exist, we need \(q\geqslant 49\). Since the last step of our construction uses the OT embedding technique introduced by BMN [9], we need to consider only the smallest fields that allow a particular number of OT embeddings. Based on this observation, for fractional resilience \(\beta =(t/n)=1\%\), Fig. 7 presents the achievable production rate \(\alpha =(m/n)\). Note that the Algebraic Geometry codes become better with increasing q, but the BMN OT embedding becomes worse. So, the optimum \(\alpha =16.32\%\) is achieved for \({\mathbb F} ={\mathbb G} {\mathbb F} \left[ 2^{14}\right] \). For example, for \(n=10^3\), the production is \(m=163\); for \(n=10^6\), the production is \(m= 163,200\); and for \(n=10^9\), the production is \(m= 163,200,000\). In Fig. 9 (Sect. 6), we demonstrate the trade-off between the leakage rate (Y-axis) and the production rate (X-axis). We note that even in the high-leakage setting, for instance, for \(\beta = 20\%\), we have \(\alpha \approx 3\%\). Hence, for \(n=10^3\), the production is \(m\approx 30\); for \(n=10^6\), the production is \(m\approx 30,000\); and for \(n=10^9\), the production is \(m\approx 30,000,000\). Our production is overwhelmingly higher than the BMN production rate.

Fig. 7.
figure 7

The production rate of our correlation extractor for , for \(\beta = t/n=1\%\) fractional leakage, using different finite fields.

Fig. 8.
figure 8

The production rate of our correlation extractor for . We are given n-bit shares of the correlation, and fix \(\beta = t/n=1\%\) fractional leakage. Each row corresponds to using our -to- correlation extractor as an intermediate step. The final column represents the production rate \(\alpha =m/n\) of our -to- correlation extractor corresponding to the choice of the finite field \({\mathbb F}\).

5.2 Correlation Extractor for ROT (Theorem 2)

In this section, we compare our construction with the GIMS [22] correlation extractor for . The IKOS [26] correlation extractor is a feasibility result with minuscule fractional resilience and production rate.

Fig. 9.
figure 9

A comparison of the feasibility regions for our correlation extractors for for various finite fields \({\mathbb F}\) of characteristic 2. For each plot, the X-axis represents the relative production rate \(\alpha =m/n\) and the Y-axis represents the fractional leakage resilience \(\beta =t/n\).

GIMS Production. The GIMS correlation extractor for [22] trades off simulation error for higher production by sub-sampling the precomputed s. For \(\beta =(t/n)=1\%\) fractional leakage, the GIMS correlation extractor achieves (roughly) \(m = n/4p\) production with \(\varepsilon = m\cdot 2^{-p/4}\) simulation error. To achieve negligible simulation error, set \(p = \log ^2(n)\). In this setting, for \(n = 10^3\), \(n = 10^6\), and \(n=10^9\), the GIMS correlation extractor obtains \(m = 3\), \(m =625\), and \(m = 277,777\), respectively. These numbers are significantly lower than what our construction achieves.
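A minimal sketch reproducing the GIMS figures above. We assume the logarithm in \(p=\log ^2(n)\) is base 2 and rounded to the nearest integer; this assumption is consistent with the quoted values but is not stated explicitly in the text.

```python
import math

# GIMS production m = n / (4p) with p = log^2(n); we assume the base-2
# logarithm rounded to the nearest integer, which reproduces the quoted values.
def gims_production(n: int) -> float:
    p = round(math.log2(n)) ** 2
    return n / (4 * p)

for n in (10**3, 10**6, 10**9):
    print(n, gims_production(n))
```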

Our Production. We use a bilinear multiplication algorithm to realize one by performing several . For example, for \(s=6\), we use \(\mu _2(6) = 15\) s to implement one . Thus, our original n-bit share changes into an \(n'\)-bit share, where \(n' = (6/15)n\), while the leakage \(t=\beta n\) is preserved. So, the fractional leakage now becomes \(t = \beta ' n'\), where \(\beta ' = (15/6)\beta \). Now, we can compute the production \(m'=\alpha ' n'=\alpha n\).
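The share-length and leakage accounting above can be checked directly. A minimal sketch, with the constants \(s=6\) and \(\mu _2(6)=15\) taken from the text:

```python
# Accounting for converting an n-bit share via a bilinear multiplication
# algorithm with s = 6 and mu_2(6) = 15, as in the text: the share shrinks
# to n' = (6/15) n while the absolute leakage t = beta * n is unchanged.
def convert_share(n: float, beta: float, s: int = 6, mu: int = 15):
    n_prime = (s / mu) * n           # new share length n'
    beta_prime = (mu / s) * beta     # new fractional leakage beta'
    assert abs(beta_prime * n_prime - beta * n) < 1e-6  # t is preserved
    return n_prime, beta_prime

print(convert_share(10**6, 0.01))
```

Note that \(\beta ' n' = (15/6)\beta \cdot (6/15)n = \beta n = t\), so the absolute leakage budget is indeed unchanged by the conversion.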

The highest rate is achieved for \(s=10\), i.e., constructing the correlation extractor for via the correlation extractor for . For this choice, our correlation extractor achieves production rate \(\alpha =(m/n)=4.20\%\) when the fractional leakage is \(\beta =(t/n)=1\%\). For \(n = 10^3\), \(n = 10^6\), and \(n=10^9\), our construction obtains \(m=42\), \(m=42,000\), and \(m=42,000,000\), respectively.
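For comparison, the gap between our production (\(\alpha = 4.20\%\)) and the GIMS production can be tabulated directly. A minimal sketch using the rates and formulas quoted above; we assume the base-2 logarithm in \(p=\log ^2(n)\).

```python
import math

# Our production m = 0.042 * n versus the GIMS production n / (4 * log^2 n)
# (base-2 logarithm assumed), using the rates quoted in the text.
def ours(n: int) -> int:
    return round(0.042 * n)

def gims(n: int) -> float:
    return n / (4 * math.log2(n) ** 2)

for n in (10**3, 10**6, 10**9):
    print(n, ours(n), round(gims(n)))
```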

5.3 Close to Optimal Resilience

An interesting facet of our correlation extractor for is the following. As \(q=\left| {{\mathbb F}}\right| \) increases, the maximum fractional resilience, i.e., the intercept of the feasibility curve on the Y-axis, tends to 1/4. Ishai et al. [27] showed that no correlation extractor can be resilient to fractional leakage \(\beta =(t/n)=25\%\). For every \(g\in (0,1/4]\), we show that, by choosing a sufficiently large q, we can achieve a positive production rate \(\alpha =(m/n)\) for \(\beta =(1/4 -g)\). Thus, our family of correlation extractors (over larger, albeit constant-size, finite fields) achieves near-optimal fractional resilience. Figure 9 (Sect. 6) demonstrates this phenomenon for a few values of q. The proof of this result, which establishes Theorem 4, can be found in the full version of our work [8].

6 Parameter Comparison Graphs

In this section, we highlight the feasible parameters for our -to- correlation extractor (Theorem 2) for a few representative values of \(q=\left| {{\mathbb F}}\right| \).

The shaded regions in the graphs in Fig. 9 represent the feasible parameter choices. In particular, the X-axis represents the production rate m/n and the Y-axis represents the leakage rate t/n given our parameter choices. The full version of the paper [8] details the calculation of the feasible parameters.

Note that, as the size of the field \({\mathbb F} \) increases, the quality of the Algebraic Geometry code used in our construction improves. This translates into higher achievable production and leakage resilience, as illustrated by moving from \(q=2^6\) to \(q=2^{14}\). However, as the size of the field \({\mathbb F} \) increases, the efficiency of the BMN embedding [9] decreases, potentially reducing the overall production rate (for example, when moving from \(q = 2^{14}\) to \(q = 2^{20}\)).

Finally, as noted earlier, the feasibility graphs demonstrate that our family of correlation extractors achieves near-optimal fractional resilience: as the size of the field \({\mathbb F} \) increases, the fractional leakage resilience approaches 1/4, which is optimal [27].