Secure two-party input-size reduction: Challenges, solutions and applications

doi:10.1016/j.ins.2021.01.038

Information Sciences

Volume 567, August 2021, Pages 256-277

https://doi.org/10.1016/j.ins.2021.01.038

Highlights

• We motivate, formulate, and solve the generic two-party Secure Input-size Reduction.
• Our solutions are based on protocols for secure perfect hashing in a two-party setting.
• We propose lightweight and perfectly secure protocols with optimal reduced sizes.
• We further improve our protocols’ efficiency for security against a PPT adversary.
• We give use cases where our protocols result in significant performance improvements.

Abstract

The computation and communication costs of many secure multiparty protocols would benefit from a preprocessing that replaces large inputs with much smaller values without changing the outputs. This preprocessing is especially advantageous when its cost can be amortized over subsequent computations that all benefit from smaller inputs. The above holds for protocols based on garbled circuits, homomorphic encryption, or other techniques. Problems benefiting from such preprocessing include pattern matching, information retrieval, and sequence comparisons that depend on (in)equality of comparands. Motivated by this (in)equality-preservation requirement, we define the problem as follows: Alice’s and Bob’s inputs are their respective private sets $S^{A}$ and $S^{B}$ of large integers, and their private outputs are images of their sets under a function $ρ$ that injectively maps $S^{A} \cup S^{B}$ into $\{0, 1, \dots, N - 1\}$ for a small $N ⩾ | S^{A} | + | S^{B} |$. Alice’s (Bob’s) knowledge of this mapping on $S^{A}$ ($S^{B}$) must reveal nothing about $S^{B}$ ($S^{A}$). Thus, neither party should be able to learn $ρ (x)$ for any $x$ that is not in its private set; otherwise, that party could exploit the small codomain of $ρ$ to learn about the other party’s set. We formalize the problem, propose efficient and secure (semi-honest model) solutions to it, and discuss its use cases.

Introduction

Secure Multiparty Computation (SMC) has recently become a more practically applicable cryptographic technology, and many researchers have been pursuing efficient solutions to a wide variety of SMC problems (e.g., [43], [44], [39], [19], [45], [50], [42], [49], [47], [51]). Some protocols use preprocessing to improve the overall performance, mainly using techniques that are either problem-specific or approach-specific [22], [39]. In this work, we develop a generic preprocessing mechanism, applicable for all SMC problems that rely on “equal/not equal” comparisons (e.g., pattern matching). Our mechanism reduces the bit-length of inputs such that using the size-reduced inputs to solve the problem of interest results in the same output as if the original inputs had been used. Such input-size reduction is especially advantageous when its cost can be amortized over multiple subsequent computations that all benefit from the already-done size-reduction. This is true for all SMC approaches, e.g., in garbled circuits, the circuit size depends on the inputs’ bit-length, and in number-theoretic approaches, the arithmetic cost depends on the size of input integers.

Aiming for faster information retrieval and saving memory space in an SMC setting, Goldreich et al. [22] considered the problem of mapping long names into smaller abbreviations. The solution in [22] (i) requires a trusted party who knows all input names, and (ii) allows a small probability of collision for each pair of names (see Section 1.2 for details). In this paper, we consider the above size-reduction problem when there is no such trusted party. Moreover, we impose two further desiderata: We require (i) a very small abbreviation space, i.e., “codomain” (where “small” means “equal or close to the number of input items”), and (ii) a guarantee of no collisions at all, which is particularly challenging in view of the above requirement of a very small codomain. Although these requirements make the problem considerably more difficult, they are necessitated by our results’ specific applications.

Consider a functionality $F$ whose inputs are over a large domain $Σ$ , but only a small subset of $Σ$ appears in the inputs (see Section 7 for examples). In such situations, it is desirable to avoid the communication and computation costs corresponding to symbols of $Σ$ that do not occur in the inputs. If security is not a concern, it is easy to replace the large inputs with shorter ones whose bit-length is the logarithm of the count of occurring symbols (rather than $\log | Σ |$ ). However, in the SMC setting where each input is private to a party, it would be inappropriate to reveal the set of occurring symbols; doing so would leak information about the private inputs. Hence, a secure preprocessing mechanism is needed to obtain the size-reduced symbols. In other words, symbols that appear in the inputs must be securely mapped to smaller values in a way that (i) the mapping is consistent among all parties, (ii) the mapping is collision-free, and (iii) the bit-length of size-reduced symbols is as small as possible. Section 7 gives practical examples that benefit from such a secure input-size reduction.
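To make the non-secure baseline mentioned above concrete, the following minimal sketch maps every occurring symbol to a short index whose bit-length is the logarithm of the number of occurring symbols. The function name `compress_symbols` and the sample inputs are hypothetical and serve only as illustration; the sketch pools both parties' inputs in one place, which is exactly what a secure protocol must avoid.

```python
from math import ceil, log2

def compress_symbols(*inputs):
    """Map each symbol occurring in any input to a short index (NOT secure)."""
    # Pooling the occurring symbols in one place reveals them; this is the
    # step a secure preprocessing mechanism has to replace.
    occurring = sorted(set().union(*inputs))
    table = {sym: idx for idx, sym in enumerate(occurring)}
    bits = max(1, ceil(log2(len(occurring)))) if occurring else 0
    return table, bits

# Hypothetical inputs over a large domain (about 100-bit integers).
alice = [2**100 + 7, 2**90 + 1, 2**100 + 7]
bob = [2**90 + 1, 2**80 + 5]

table, bits = compress_symbols(alice, bob)
reduced_alice = [table[x] for x in alice]
reduced_bob = [table[x] for x in bob]
```

Because the table is injective on the occurring symbols and shared by both inputs, any equality test on the reduced values gives the same answer as on the original values.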

The secure input-size reduction problem discussed in this paper is reminiscent of (but different in fundamental ways from) the classic perfect hashing problem: Let $S$ be a set containing large integers (i.e., $S \subset Σ = \{0, \dots, 2^{σ} - 1\} = \{0, 1\}^{σ}$). A perfect hash function (PHF) for $S$ is a function $ρ : Σ \to \{0, 1, \dots, N - 1\}$ such that $N$ is a small integer (close to $| S |$) and $ρ$ is injective on $S$ [15], [16]. In this paper, we propose a secure perfect hashing approach to the input-size reduction problem of interest. Although perfect hashing is a well-studied and broadly used topic when security and privacy are of no concern [16], [13], [25], [48], [20], [7], [9], [32], [15], the patent document by Nawaz et al. [37] seems to be the only existing work that considers perfect hashing in a secure setting; however, the results of [37] do not solve the secure input-size reduction problem (see Section 1.2 for details).

The first step in designing Secure Perfect Hash Functions (SPHFs) is to precisely formulate the security requirements with respect to the private input sets $S^{A}$ and $S^{B}$ ( $S = S^{A} \cup S^{B} \subset Σ$ ). One requirement is that neither Alice nor Bob should be able to individually compute $ρ (x)$ for $x \in Σ$ , where $ρ$ is the desired SPHF. Otherwise, the inherently small codomain of $ρ$ would enable a party to obtain information about the other party’s private set via a membership-testing attack – because of $ρ$ ’s small codomain, it would be trivial to find integers that collide under $ρ$ , resulting in information leakage about the private sets. For example, if $y \in S^{A}$ , Alice learns that any $x \neq y$ with $ρ (x) = ρ (y)$ cannot be in $S^{B}$ (otherwise, $ρ$ would not be injective on S). The above vulnerability cannot be fixed by enlarging the codomain as it would defeat the purpose of input-size reduction and perfect hashing. To overcome this obstacle, Definition 1 requires that Alice and Bob learn only the image of their own respective private set under $ρ$ (Fig. 1).
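The membership-testing attack described above can be illustrated with a toy example. Everything in this sketch is a hypothetical stand-in chosen for illustration: a 64-element domain plays the role of $Σ$, and `rho` is an arbitrary publicly computable function that happens to be injective on $S^{A} \cup S^{B}$. It is not one of the paper's constructions.

```python
# Toy parameters (all hypothetical, chosen only for illustration).
DOMAIN = range(64)   # stands in for the large domain Sigma
N = 4                # small codomain, N = |S^A| + |S^B|
S_A = {5, 18}        # Alice's private set
S_B = {3, 12}        # Bob's private set (unknown to Alice)

def rho(x):
    """A publicly computable hash that happens to be injective on S_A | S_B."""
    return x % N

# Sanity check: rho is a perfect hash function for S = S_A union S_B.
assert len({rho(x) for x in S_A | S_B}) == len(S_A | S_B)

# Alice's attack uses only S_A and the ability to evaluate rho anywhere:
# any x outside S_A that collides with one of her items cannot be in S_B,
# since otherwise rho would not be injective on S.
alice_values = {rho(y) for y in S_A}
ruled_out = {x for x in DOMAIN if x not in S_A and rho(x) in alice_values}
```

In this toy instance Alice excludes 30 of the 62 candidates outside her own set without any interaction, which is the kind of leakage that Definition 1 is meant to rule out.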

Definition 1 Secure Perfect Hash Function

A hash function $ρ : Σ \to \{0, 1, \dots, N - 1\}$ is an SPHF for $S = S^{A} \cup S^{B}$ if and only if

1. Correctness: $ρ$ is a PHF for $S$ ($ρ$ is injective on $S$).
2. Security: Given Alice’s private output $ϱ^{A} = \{(x, ρ (x))\}_{x \in S^{A}}$, but not $ρ$ itself, Alice must not be able to learn anything about $S^{B}$. Formally, for any $y \in Σ$, $| \Pr [y \in S^{B} \mid ϱ^{A}] - \Pr [y \in S^{B}] | ⩽ negl$. Similarly, for Bob’s private output $ϱ^{B} = \{(x, ρ (x))\}_{x \in S^{B}}$ and any $y \in Σ$, $| \Pr [y \in S^{A} \mid ϱ^{B}] - \Pr [y \in S^{A}] | ⩽ negl$, where $negl$ is a negligible function in the implied security parameter; we say $ρ$ is a perfect SPHF if and only if $negl = 0$.

Property 2 in Definition 1 implies that a minimal SPHF has a codomain of $N = | S^{A} | + | S^{B} |$ possible hash values (the alternative choice $| S^{A} \cup S^{B} |$ would reveal information about $S^{A} \cap S^{B}$ ).
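As a rough illustration of what Definition 1 asks for, the sketch below computes the outputs of an idealized functionality: a hypothetical trusted dealer (which the paper explicitly does not assume) sees both sets, assigns distinct uniformly random values in $\{0, \dots, N - 1\}$ with $N = | S^{A} | + | S^{B} |$, and hands each party only the image of its own set. The name `ideal_sphf` and the sample sets are assumptions made for this example; the sketch shows the target input/output behavior, not a protocol.

```python
import secrets

def ideal_sphf(set_a, set_b):
    """Idealized dealer: returns N and each party's private output only."""
    n = len(set_a) + len(set_b)        # N depends on the sizes, not the overlap
    union = sorted(set_a | set_b)
    rng = secrets.SystemRandom()
    # Distinct random values make the mapping injective on S = S^A union S^B.
    values = rng.sample(range(n), len(union))
    rho = dict(zip(union, values))     # rho itself is never given to a party
    out_a = {x: rho[x] for x in set_a}
    out_b = {x: rho[x] for x in set_b}
    return n, out_a, out_b

# Hypothetical private sets with one common element.
S_A = {2**70 + 1, 2**71 + 5, 2**72 + 9}
S_B = {2**71 + 5, 2**80 + 3}
N, out_a, out_b = ideal_sphf(S_A, S_B)
```

Choosing $N$ from the two set sizes alone means the codomain size is the same whether or not the sets overlap, so it carries no information about $S^{A} \cap S^{B}$.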

There are two major obstacles towards obtaining an SPHF for $S = S^{A} \cup S^{B}$ :

1. Privacy and Security: The classical perfect hashing algorithms (e.g., [20], [7]) require centralized knowledge of $S$ to obtain the injective behavior. Even the secure size-reduction scheme in [22] depends on a trusted party who knows all the input items (the scheme also allows collisions with a small probability). Our solutions reveal only a value $m$ that is equal to $| S^{A} | + | S^{B} |$. If it is desired to hide $| S^{A} |$ and $| S^{B} |$ [33], parties may add a random number of dummy items to their respective private sets before engaging in our SPHF construction protocols, thereby revealing only a loose upper bound on the sizes of their respective sets.
2. Efficiency: It is not clear how to use the general SMC techniques to practically and securely implement the classical perfect hashing algorithms. For example, currently there is no known practical circuit for implementing such schemes through garbled circuits. Moreover, such a (hypothetical) circuit would have a large number of gates, because of the large input items in $S^{A}$ and $S^{B}$. Our solutions mitigate this “curse of large inputs” through a judicious combination of local computations and fast cryptographic primitives such as lightweight computations in additive split form.

Below, we review the existing work in the literature related to perfect hashing and secure input-size reduction. Construction and analysis of perfect hash functions are well-studied in the area of algorithms and data structures when security and privacy are not of concern [16], [20], [15], [13], [25], [48], [9], [7], [32]. Some constructions [13], [25], [48] rely on the theoretical properties of sets of integers. Though these methods are simple and theoretically interesting, they tend to be slow either in the construction or evaluation step (or both), unlike Fredman et al. [20] who proposed an efficient two-level PHF construction that obtains asymptotically minimal codomain size $O (| S |)$ . The key idea in [20] is to partition the items into m subsets using a universal family of hash functions [12], followed by another layer of universal hashing to obtain injective behavior on each subset. More recent works such as [7], [9], [32] focus on efficient solutions to (nearly) minimal perfect hashing.
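The two-level idea attributed to [20] above can be sketched in a few lines for the classical, non-secure setting in which one party knows all of $S$. This is a minimal sketch under stated assumptions: the hash family $((a x + b) \bmod p) \bmod m$ in the style of Carter and Wegman [12], the prime `P`, the retry threshold `4 * m`, and all function names are illustrative choices, not details taken from the paper.

```python
import random

P = (1 << 61) - 1  # prime modulus; assumes all keys are integers in [0, P)

def random_hash(m, rng):
    """Draw h(x) = ((a*x + b) mod P) mod m from a universal family."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m

def build_fks(keys, rng=None):
    """Two-level perfect hash for distinct keys; returns (rho, codomain size)."""
    rng = rng or random.Random()
    keys = list(keys)
    m = len(keys)
    # Level 1: partition into m buckets, retrying until the buckets are small
    # enough that the sum of squared bucket sizes is linear in m.
    while True:
        h = random_hash(m, rng)
        buckets = [[] for _ in range(m)]
        for k in keys:
            buckets[h(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) <= 4 * m:
            break
    # Level 2: each bucket of size s gets its own table of size s*s and a
    # second hash, redrawn until it is injective on that bucket.
    second, offsets, total = [], [], 0
    for bucket in buckets:
        size = len(bucket) ** 2
        g = None
        while size:
            g = random_hash(size, rng)
            if len({g(k) for k in bucket}) == len(bucket):
                break
        second.append(g)
        offsets.append(total)
        total += size

    def rho(x):
        i = h(x)
        # An empty bucket has no second-level hash; such an x is not a key.
        return offsets[i] + second[i](x) if second[i] else None

    return rho, total

rng = random.Random(2021)
keys = rng.sample(range(1 << 50), 200)   # hypothetical 50-bit keys
rho, codomain_size = build_fks(keys, rng)
```

The construction needs every key in one place, which is why it cannot be used directly when $S^{A}$ and $S^{B}$ are private to different parties.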

Nawaz et al. [37] seems to be the only existing work that addresses the need for a secure perfect hash function for data matching applications between private databases. They propose a solution based on cryptographic hash functions and the hash, displace, and compress technique [7]. In [37], the authors argue that their scheme is secure with respect to the private sets due to the preimage resistance of cryptographic hash functions like SHA2. However, this is not necessarily true since their design uses only a few bytes of the cryptographic hash output to construct a perfect hash function. Thus, the resulting SPHF is prone to the membership-testing attack described earlier. Also, [37] lacks formal security, correctness, and performance analyses and is vague as to how parties interact.

Following the GGM construction for Pseudorandom Functions (PRFs) [23], Goldreich et al. [22] considered the problem of mapping long names into smaller abbreviations with a small probability of collision for each pair of items. This achieves faster information retrieval and saves memory space [22]. As an application, [22] designed a Friend or Foe Identification (FFI) mechanism, which enables members of a secret club to identify each other. The PRF-based solution to FFI [22] is suitable for cryptographic purposes as it is more robust than schemes based on classical primitives such as universal hashing [12]. In FFI, only “club members” can use the system, and their designated leader knows all members’ names. On the contrary, we consider the above name-reduction problem when there is no such leader. Additionally, we impose two further requirements: (i) a very small abbreviation space, i.e., “codomain” (by “small” we mean “equal or close to the number of input items”), and (ii) a guarantee of no collisions at all. The latter requirement is particularly challenging because of the very small codomain.

In 2015, [39] proposed a bit-length reduction scheme specifically geared towards faster Private Set Intersection (PSI) in the SMC framework. The proposed method in [39] elegantly applies simple hashing for partitioning the input sets into a number of bins such that the reduced bit-representations for two items in the same bin are equal if and only if the two items are equal. The scheme in [39] results in no intra-bin collisions, but does not guarantee the absence of inter-bin collisions; i.e., [39] allows two items to have the same size-reduced representations as long as they are not in the same bin. Although the above works perfectly for solving PSI, the fact that it allows inter-bin collisions prevents its use for many SMC problems. Our solutions do not allow collisions of any type.
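The difference between intra-bin and inter-bin collisions can be seen in a small sketch. One well-known way to get bin-local injectivity is permutation-based hashing, where an item is split as $x = x_L \| x_R$, the bin index is $x_L \oplus f(x_R)$, and only $x_R$ is kept as the reduced representation. Whether this matches the scheme of [39] in every detail is an assumption made here; the function names, the use of SHA-256 for `f`, and the bit-lengths are illustrative only.

```python
import hashlib

def f(x_right, bin_bits):
    """Hypothetical public function mapping the right part to a bin mask."""
    digest = hashlib.sha256(x_right.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % (1 << bin_bits)

def reduce_item(x, total_bits, bin_bits):
    """Return (bin index, reduced representation) for a total_bits-bit item."""
    right_bits = total_bits - bin_bits
    x_left = x >> right_bits
    x_right = x & ((1 << right_bits) - 1)
    # Within one bin, x_right determines x_left = bin XOR f(x_right), so two
    # items in the same bin with equal representations must be equal.
    return x_left ^ f(x_right, bin_bits), x_right

TOTAL_BITS, BIN_BITS = 32, 4
x1 = (3 << 28) | 12345   # two distinct items that share the same right part
x2 = (9 << 28) | 12345
bin1, rep1 = reduce_item(x1, TOTAL_BITS, BIN_BITS)
bin2, rep2 = reduce_item(x2, TOTAL_BITS, BIN_BITS)
```

Here `x1` and `x2` receive the same reduced representation but land in different bins, which is harmless when comparisons only happen within a bin and problematic for problems that compare arbitrary pairs of items.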

Some articles use terminologies reminiscent of ours but are different in nature. One is Oblivious Hashing [14], which is a software integrity verification scheme. Another is PerfectDedup [41], which is a mechanism for data deduplication in cloud storage management that uses a PHF on encrypted data to securely identify the popular data segments for deduplication.

This paper is organized as follows. Section 2 summarizes our contributions, and Section 3 reviews the required preliminaries for our constructions. Section 4 describes our perfect SPHF construction, FindSPHF, and its two embodiments. Sections 5 and 6 build on top of FindSPHF to further improve the performance. Section 7 discusses some use cases and practical implications of our approach, and Section 8 concludes.

Section snippets

Summary of our contributions

This paper represents the first formal attempt to define and solve the secure perfect hashing problem and uses it for secure input-size reduction in the SMC framework. Although our definitions and solutions have natural extensions for any number of parties, we focus on the two-party case. Below, we give an overview of our main contributions.

Problem Formulation (Definition 1). We formulate secure two-party input-size reduction by formalizing the notion of an SPHF for $S = S^{A} \cup S^{B}$ as a PHF for S such

Preliminaries and notations

This section reviews the existing concepts and primitives used in the remainder of this paper. Moreover, Table 2 summarizes notations widely used in this article.

Warm up: constructing a minimal perfect SPHF for $S = S^{A} \cup S^{B}$

W.l.o.g., we assume $| S^{A} | = | S^{B} | = γ$ resulting in $m = 2 γ$. Let $\mathrm{Inj}_{Σ, 2 γ} (S)$ denote the set of all functions $f : Σ \to \{0, 1, \dots, 2 γ - 1\}$ that are injective on $S$. Lemma 1 formally argues that any function chosen uniformly at random from $\mathrm{Inj}_{Σ, 2 γ} (S)$ is a minimal perfect SPHF for $S = S^{A} \cup S^{B}$; recall that a minimal SPHF has a codomain of size $N = m$. Based on this observation, we propose the FindSPHF approach that gives a minimal perfect SPHF for $S$. The key idea in FindSPHF is (i) obliviously assigning distinct random hash values in range ${0,$

Distribution: probabilistic input partitioning

Although the FindSPHF approach gives a solution to both secure perfect hashing and multiparty input-size reduction problems, it requires either quadratic computation and communication (Label-then-Unify) or a logarithmic number of rounds (Merge-then-Unify). In this section, we further improve the performance of our constructions by building on top of the FindSPHF approach. To do so, we use a standard balls and bins analysis [35] to create m subproblems of size $O (\log m)$ or $O (\frac{\log m}{\log \log m})$ through a

Overall distribution-resolution scheme

We explain how the Distribution-Resolution scheme (Fig. 9) combines Sections 4 and 5 to construct an SPHF. We also discuss the inherent $γ$-$λ$ parameter trade-off. Finally, Theorem 1 states the correctness and security of the Distribution-Resolution scheme.

Let $Ψ$ be the distributor that partitions $S$ into $m$ subproblems of size at most $2 γ$ each. Moreover, let $ρ_{0}, ρ_{1}, \dots, ρ_{m - 1}$ be the resolvers obtained by FindSPHF for each subproblem. As depicted in Fig. 9, SPHF $ρ = \langle Ψ, ρ_{0}, ρ_{1}, \dots, ρ_{m - 1} \rangle$ works as follows: There are $2 γ$

Discussion: use cases and implications in SMC

Below, we describe two SMC applications for which input-size reduction and/or using SPHFs result in significant performance improvements. We also discuss the implicit connection of secure perfect hashing with the important and well-studied problem of PSI.

Conclusion

This paper formalizes and solves the problem of input-size reduction preprocessing that results in more efficient SMC protocols, without affecting the outputs of these protocols. To do so, we formalize the notion of security for perfect hashing in the SMC framework, and propose efficient constructions that give secure perfect hash functions. In addition to solving the input-size reduction, this also brings the traditional advantages of perfect hashing (less memory space, faster access, etc.) to

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Javad Darivandpour: Conceptualization, Methodology, Validation, Formal analysis, Writing - original draft, Writing - review & editing, Visualization, Project administration. Duc V. Le: Conceptualization, Methodology, Validation, Formal analysis, Writing - review & editing. Mikhail J. Atallah: Conceptualization, Methodology, Validation, Formal analysis, Resources, Writing - review & editing, Supervision, Project administration, Funding acquisition.

Acknowledgements

Portions of this work were supported by National Science Foundation Grant CPS-1329979, and sponsors of the Center for Education and Research in Information Assurance and Security. The authors are grateful to Kent Quanrud for his insightful comments.

References (51)

F.C. Botelho et al., Practical perfect hashing in nearly optimal space, Inf. Syst. (2013)

J.L. Carter et al., Universal classes of hash functions, J. Comput. Syst. Sci. (1979)

Z.J. Czech et al., Perfect hashing, Theoret. Comput. Sci. (1997)

M. Qaosar et al., Secure k-skyband computation framework in distributed multi-party databases, Inf. Sci. (2020)

L. Shundong et al., Symmetric cryptographic solution to Yao’s millionaires’ problem and an evaluation of secure multiparty computations, Inf. Sci. (2008)

L. Shundong et al., Secure multiparty computation of solid geometric problems and their applications, Inf. Sci. (2014)

C. Zhao et al., Secure multi-party computation: theory, practice and applications, Inf. Sci. (2019)

A. Abadi et al., Efficient delegated private set intersection on outsourced private datasets, IEEE Trans. Dependable Secure Comput. (2017)

K. Abrahamson, Generalized string matching, SIAM J. Comput. (1987)

A. Amir et al., Fast parallel and serial multidimensional approximate array matching

M.J. Atallah, Faster image template matching in the sum of the absolute value of differences measure, IEEE Trans. Image Process. (2001)

M.J. Atallah et al., Secure multi-party computational geometry

K.E. Batcher, Sorting networks and their applications, in: Proceedings of the April 30–May 2, 1968, Spring Joint...

D. Belazzougui, F.C. Botelho, M. Dietzfelbinger, Hash, displace, and compress, in: A. Fiat, P. Sanders (Eds.),...

D. Bogdanov, Sharemind: programmable secure computations with practical applications. PhD thesis, University of Tartu,...

J. Burns, D. Moore, K. Ray, R. Speers, B. Vohaska, Ec-oprf: oblivious pseudorandom functions using elliptic curves,...

R. Canetti, Universally composable security: a new paradigm for cryptographic protocols, in: Proceedings 42nd IEEE...

C. Chang, C. Chang, An ordered minimal perfect hashing scheme with single parameter, Inf. Process. Lett. 27 (2) (1988)...

Y. Chen, R. Venkatesan, M. Cary, R. Pang, S. Sinha, M.H. Jakubowski, Oblivious hashing: a stealthy software integrity...

T.H. Cormen et al., Introduction to Algorithms (2009)

I. Damgård et al., Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation

J. Darivandpour et al., Efficient and secure pattern matching with wildcards using lightweight cryptography, Comput. Secur. (2018)

C. Dong et al., Approximating private set union/intersection cardinality with logarithmic complexity, IEEE Trans. Inf. Forensics Secur. (2017)

M.L. Fredman et al., Storing a sparse table with O(1) worst case access time, J. ACM (JACM) (1984)

M.J. Freedman et al., Keyword search and oblivious pseudorandom functions
