Secure two-party input-size reduction: Challenges, solutions and applications
Introduction
Secure Multiparty Computation (SMC) has recently become more practical, and many researchers have been pursuing efficient solutions to a wide variety of SMC problems (e.g., [43], [44], [39], [19], [45], [50], [42], [49], [47], [51]). Some protocols use preprocessing to improve overall performance, mainly through techniques that are either problem-specific or approach-specific [22], [39]. In this work, we develop a generic preprocessing mechanism that is applicable to all SMC problems that rely on “equal/not equal” comparisons (e.g., pattern matching). Our mechanism reduces the bit-length of the inputs such that solving the problem of interest on the size-reduced inputs yields the same output as on the original inputs. Such input-size reduction is especially advantageous when its cost can be amortized over multiple subsequent computations that all benefit from the already-completed size reduction. The benefit holds across SMC approaches: in garbled circuits, the circuit size depends on the inputs’ bit-length, and in number-theoretic approaches, the arithmetic cost depends on the size of the input integers.
Aiming for faster information retrieval and saving memory space in an SMC setting, Goldreich et al. [22] considered the problem of mapping long names into smaller abbreviations. The solution in [22] (i) requires a trusted party who knows all input names, and (ii) allows a small probability of collision for each pair of names (see Section 1.2 for details). In this paper, we consider the above size-reduction problem when there is no such trusted party. Moreover, we impose two further desiderata: We require (i) a very small abbreviation space, i.e., “codomain” (where “small” means “equal or close to the number of input items”), and (ii) a guarantee of no collisions at all, which is particularly challenging in view of the above requirement of a very small codomain. Although these requirements make the problem considerably more difficult, they are necessitated by our results’ specific applications.
Consider a functionality whose inputs are over a large domain, but only a small subset of that domain appears in the inputs (see Section 7 for examples). In such situations, it is desirable to avoid the communication and computation costs corresponding to symbols of the domain that do not occur in the inputs. If security is not a concern, it is easy to replace the large inputs with shorter ones whose bit-length is the logarithm of the count of occurring symbols (rather than the logarithm of the domain size). However, in the SMC setting where each input is private to a party, it would be inappropriate to reveal the set of occurring symbols; doing so would leak information about the private inputs. Hence, a secure preprocessing mechanism is needed to obtain the size-reduced symbols. In other words, symbols that appear in the inputs must be securely mapped to smaller values in such a way that (i) the mapping is consistent among all parties, (ii) the mapping is collision-free, and (iii) the bit-length of the size-reduced symbols is as small as possible. Section 7 gives practical examples that benefit from such a secure input-size reduction.
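As a point of reference, the non-private baseline mentioned above can be sketched in a few lines of Python. This is only an illustration: the names `build_index_map` and `reduced_bit_length` and the sample inputs are hypothetical, and the sketch pools both parties' inputs in the clear, which is exactly the disclosure that a secure mechanism must avoid.

```python
def build_index_map(*inputs):
    """Map every distinct occurring symbol to a small integer index.

    NOT private: this requires seeing all inputs in the clear.
    """
    occurring = sorted(set().union(*inputs))
    return {symbol: index for index, symbol in enumerate(occurring)}


def reduced_bit_length(mapping):
    """Bits needed to represent one size-reduced symbol."""
    return max(1, (len(mapping) - 1).bit_length())


# Hypothetical inputs: very large symbols, only three of which occur.
alice = [2**120 + 7, 2**90 + 1, 2**120 + 7]
bob = [2**90 + 1, 2**64 + 3]

mapping = build_index_map(alice, bob)
alice_small = [mapping[x] for x in alice]
bob_small = [mapping[x] for x in bob]
bits = reduced_bit_length(mapping)
```

Because the mapping is consistent and collision-free, every "equal/not equal" comparison between an item of `alice_small` and an item of `bob_small` gives the same answer as the comparison between the original items.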
The secure input-size reduction problem discussed in this paper is reminiscent of (but different in fundamental ways from) the classic perfect hashing problem: Let S be a set of large integers. A perfect hash function (PHF) for S is a function that maps the items into a range of N values, where N is a small integer (close to the size of S), and that is injective on S [15], [16]. In this paper, we propose a secure perfect hashing approach to the input-size reduction problem of interest. Although perfect hashing is a well-studied and broadly used topic when security and privacy are of no concern [16], [13], [25], [48], [20], [7], [9], [32], [15], the patent document by Nawaz et al. [37] seems to be the only existing work that considers perfect hashing in a secure setting; however, the results of [37] do not solve the secure input-size reduction problem (see Section 1.2 for details).
The first step in designing Secure Perfect Hash Functions (SPHFs) is to precisely formulate the security requirements with respect to the private input sets of Alice and Bob, whose union is S. One requirement is that neither Alice nor Bob should be able to individually evaluate the desired SPHF on arbitrary inputs. Otherwise, the inherently small codomain of the SPHF would enable a party to obtain information about the other party’s private set via a membership-testing attack: because of the small codomain, it would be trivial to find integers that collide under the SPHF, resulting in information leakage about the private sets. For example, if x is in Alice’s set, Alice learns that any y different from x with the same hash value as x cannot be in Bob’s set (otherwise, the SPHF would not be injective on S). The above vulnerability cannot be fixed by enlarging the codomain, as doing so would defeat the purpose of input-size reduction and perfect hashing. To overcome this obstacle, Definition 1 requires that Alice and Bob learn only the image of their own respective private set under the SPHF (Fig. 1).
Definition 1 (Secure Perfect Hash Function). A hash function is an SPHF for the private sets of Alice and Bob if and only if it satisfies the following two properties:
1. Correctness: The hash function is a PHF for S (i.e., it is injective on S).
2. Security: Given Alice’s private output (the image of her own set), but not the hash function itself, Alice must not be able to learn anything about Bob’s private set. Formally, this is required to hold up to a bound that is a negligible function in the implied security parameter, and the symmetric condition is required for Bob’s private output. We say that the hash function is a perfect SPHF if and only if this bound is zero.
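Property 2 in Definition 1 implies that a minimal SPHF has a codomain whose number of possible hash values equals the sum of the sizes of the two private sets (the alternative choice, the size of their union S, would reveal information about the size of their intersection).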
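The membership-testing attack described above can be made concrete with a toy Python sketch. Everything here is an assumption made for illustration: the salted SHA-256 function `h`, the brute-force salt search, the two small sets, and the candidate universe of 1000 integers are hypothetical and are not the constructions of this paper. The sketch only shows what goes wrong when a party can evaluate a small-codomain perfect hash function by herself.

```python
import hashlib


def h(salt: int, x: int, n: int) -> int:
    """Toy hash into {0, ..., n-1}; a stand-in for a publicly evaluable PHF."""
    digest = hashlib.sha256(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


def find_perfect_salt(items, n):
    """Try salts until h is injective on items (fast for tiny sets)."""
    salt = 0
    while len({h(salt, x, n) for x in items}) != len(items):
        salt += 1
    return salt


alice_set = {101, 202}   # hypothetical private set of Alice
bob_set = {303, 404}     # hypothetical private set of Bob
S = alice_set | bob_set
N = len(S)               # minimal codomain for this toy example
salt = find_perfect_salt(S, N)

# Alice knows the images of her own items. If she can also evaluate h on
# arbitrary inputs, every candidate that collides with one of her items is
# provably absent from Bob's set, because h is injective on S.
alice_images = {h(salt, x, N) for x in alice_set}
candidates = range(1000)
ruled_out = [y for y in candidates
             if y not in alice_set and h(salt, y, N) in alice_images]
```

With a codomain this small, a large fraction of the candidate universe collides with Alice's images, so she can exclude all of those values from Bob's set. This is why Definition 1 lets each party learn only the image of its own set and not the function itself.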
There are two major obstacles towards obtaining an SPHF for the parties' private sets:
1. Privacy and Security: The classical perfect hashing algorithms (e.g., [20], [7]) require centralized knowledge of S to obtain the injective behavior. Even the secure size-reduction scheme in [22] depends on a trusted party who knows all the input items (the scheme also allows collisions with a small probability). Our solutions reveal only a value m that is equal to the sum of the sizes of the two private sets. If it is desired to hide the individual set sizes [33], parties may add a random number of dummy items to their respective private sets before engaging in our SPHF construction protocols, thereby revealing only a loose upper bound on the sizes of their respective sets.
2. Efficiency: It is not clear how to use the general SMC techniques to practically and securely implement the classical perfect hashing algorithms. For example, there is currently no known practical circuit for implementing such schemes through garbled circuits. Moreover, such a (hypothetical) circuit would have a large number of gates because of the large input items in the two private sets. Our solutions mitigate this “curse of large inputs” through a judicious combination of local computations and fast cryptographic primitives such as lightweight computations in additive split form.
Below, we review the existing work in the literature related to perfect hashing and secure input-size reduction. The construction and analysis of perfect hash functions are well studied in the area of algorithms and data structures when security and privacy are not of concern [16], [20], [15], [13], [25], [48], [9], [7], [32]. Some constructions [13], [25], [48] rely on the theoretical properties of sets of integers. Though these methods are simple and theoretically interesting, they tend to be slow in the construction step, the evaluation step, or both. In contrast, Fredman et al. [20] proposed an efficient two-level PHF construction that obtains an asymptotically minimal codomain size. The key idea in [20] is to partition the items into m subsets using a universal family of hash functions [12], followed by another layer of universal hashing to obtain injective behavior on each subset. More recent works such as [7], [9], [32] focus on efficient solutions to (nearly) minimal perfect hashing.
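The two-level idea attributed to [20] above can be sketched in plain (non-secure) Python. This is a textbook-style illustration under stated assumptions, not the exact construction of [20]: the universal family ((a·x + b) mod P) mod size, the Mersenne prime P = 2^127 − 1, the acceptance threshold of 4m for the first level, and all function names are choices made here for the example.

```python
import random

P = 2**127 - 1  # a Mersenne prime; every key must be smaller than P


def make_hash(rng, size):
    """Draw one function from the family ((a*x + b) mod P) mod size."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % size


def build_two_level_phf(keys, seed=0):
    rng = random.Random(seed)
    keys = list(set(keys))
    m = len(keys)
    # Level 1: partition the keys into m buckets; retry until the total
    # quadratic space needed by the second level is linear in m.
    while True:
        top = make_hash(rng, m)
        buckets = [[] for _ in range(m)]
        for k in keys:
            buckets[top(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) <= 4 * m:
            break
    # Level 2: per bucket, retry until the inner hash is injective on it.
    offsets, inner, total = [], [], 0
    for bucket in buckets:
        size = len(bucket) ** 2
        while True:
            g = make_hash(rng, max(size, 1))
            if len({g(k) for k in bucket}) == len(bucket):
                break
        offsets.append(total)
        inner.append(g)
        total += size

    def phf(x):
        i = top(x)
        return offsets[i] + inner[i](x)

    return phf, total


rng = random.Random(1)
keys = [rng.getrandbits(100) for _ in range(50)]  # hypothetical large items
phf, codomain_size = build_two_level_phf(keys)
images = {phf(k) for k in keys}
```

Note that both levels inspect all keys at once, which is the centralized knowledge of S that a secure two-party setting cannot assume.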
The work of Nawaz et al. [37] seems to be the only existing work that addresses the need for a secure perfect hash function for data matching applications between private databases. They propose a solution based on cryptographic hash functions and the hash, displace, and compress technique [7]. In [37], the authors argue that their scheme is secure with respect to the private sets due to the preimage resistance of cryptographic hash functions such as SHA-2. However, this is not necessarily true, since their design uses only a few bytes of the cryptographic hash output to construct a perfect hash function. Thus, the resulting SPHF is prone to the membership-testing attack described earlier. Also, [37] lacks formal security, correctness, and performance analyses and is vague as to how the parties interact.
Following the GGM construction for Pseudorandom Functions (PRFs) [23], Goldreich et al. [22] considered the problem of mapping long names into smaller abbreviations with a small probability of collision for each pair of items. This achieves faster information retrieval and saves memory space [22]. As an application, [22] designed a Friend or Foe Identification (FFI) mechanism, which enables members of a secret club to identify each other. The PRF-based solution to FFI [22] is suitable for cryptographic purposes as it is more robust than schemes based on classical primitives such as universal hashing [12]. In FFI, only “club members” can use the system, and their designated leader knows all members’ names. In contrast, we consider the above name-reduction problem when there is no such leader. Additionally, we impose two further requirements: (i) a very small abbreviation space, i.e., “codomain” (by “small” we mean “equal or close to the number of input items”), and (ii) a guarantee of no collisions at all. The latter requirement is particularly challenging because of the very small codomain.
In 2015, [39] proposed a bit-length reduction scheme specifically geared towards faster Private Set Intersection (PSI) in the SMC framework. The proposed method in [39] elegantly applies simple hashing for partitioning the input sets into a number of bins such that the reduced bit-representations for two items in the same bin are equal if and only if the two items are equal. The scheme in [39] results in no intra-bin collisions, but does not guarantee the absence of inter-bin collisions; i.e., [39] allows two items to have the same size-reduced representations as long as they are not in the same bin. Although the above works perfectly for solving PSI, the fact that it allows inter-bin collisions prevents its use for many SMC problems. Our solutions do not allow collisions of any type.
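The difference between intra-bin and inter-bin collisions can be illustrated with a small Python sketch. It assumes a permutation-based hashing layout, in which an item is split into high and low bits, the bin index is derived from both, and only the low bits are stored; this layout, the parameters `BITS` and `BIN_BITS`, and the helper names are assumptions made for the example and are not claimed to be the exact scheme of [39].

```python
import hashlib

BITS = 32                   # assumed bit-length of the original items
BIN_BITS = 4                # assumed: 2**BIN_BITS bins
LOW_BITS = BITS - BIN_BITS  # bit-length of the reduced representation


def f(x_right: int) -> int:
    """Hash the low bits to a BIN_BITS-bit value."""
    digest = hashlib.sha256(x_right.to_bytes(8, "big")).digest()
    return digest[0] % (1 << BIN_BITS)


def place(x: int):
    """Return (bin index, reduced representation) for item x."""
    x_left = x >> LOW_BITS
    x_right = x & ((1 << LOW_BITS) - 1)
    return x_left ^ f(x_right), x_right


x = 0x1ABCDEF0
y = x ^ (1 << LOW_BITS)  # same low bits as x, different high bits

bin_x, reduced_x = place(x)
bin_y, reduced_y = place(y)
```

Here `x` and `y` are different items with identical reduced representations, which is harmless for PSI because they fall into different bins. Within a single bin, equal reduced representations imply equal items, since the high bits can be recovered from the bin index and the low bits. A computation that compares reduced values across bins, however, would wrongly treat `x` and `y` as equal.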
Some articles use terminologies reminiscent of ours but are different in nature. One is Oblivious Hashing [14], which is a software integrity verification scheme. Another is PerfectDedup [41], which is a mechanism for data deduplication in cloud storage management that uses a PHF on encrypted data to securely identify the popular data segments for deduplication.
This paper is organized as follows. Section 2 summarizes our contributions, and Section 3 reviews the required preliminaries for our constructions. Section 4 describes our perfect SPHF construction, FindSPHF, and its two embodiments. Sections 5 (Distribution: probabilistic input partitioning) and 6 (Overall distribution-resolution scheme) build on top of FindSPHF to further improve the performance. Section 7 discusses some use cases and practical implications of our approach, and Section 8 concludes.
Summary of our contributions
This paper represents the first formal attempt to define and solve the secure perfect hashing problem and uses it for secure input-size reduction in the SMC framework. Although our definitions and solutions have natural extensions for any number of parties, we focus on the two-party case. Below, we give an overview of our main contributions.
Problem Formulation (Definition 1). We formulate secure two-party input-size reduction by formalizing the notion of an SPHF for as a PHF for S such
Preliminaries and notations
This section reviews the existing concepts and primitives used in the remainder of this paper. Moreover, Table 2 summarizes notations widely used in this article.
Warm up: constructing a minimal perfect SPHF for
W.l.o.g., we assume resulting in . Let denote the set of all functions . Lemma 1 formally argues that any function chosen uniformly at random from is a minimal perfect SPHF for ; recall that a minimal SPHF has a codomain of size . Based on this observation, we propose the FindSPHF approach that gives a minimal perfect SPHF for S. The key idea in FindSPHF is (i) obliviously assigning distinct random hash values in range
Distribution: probabilistic input partitioning
Although the FindSPHF approach gives a solution to both secure perfect hashing and multiparty input-size reduction problems, it requires either quadratic computation and communication (Label-then-Unify) or a logarithmic number of rounds (Merge-then-Unify). In this section, we further improve the performance of our constructions by building on top of the FindSPHF approach. To do so, we use a standard balls and bins analysis [35] to create m subproblems of size or through a
Overall distribution-resolution scheme
We explain how the Distribution-Resolution scheme (Fig. 9) combines Sections 4 and 5 to construct an SPHF. We also discuss the inherent parameter trade-off. Finally, Theorem 1 states the correctness and security of the Distribution-Resolution scheme.
Let be the distributor that partitions S into m subproblems of size at most each. Moreover, let be the resolvers obtained by FindSPHF for each subproblem. As depicted in Fig. 9, SPHF works as follows: There are
Discussion: use cases and implications in SMC
Below, we describe two SMC applications for which input-size reduction and/or using SPHFs result in significant performance improvements. We also discuss the implicit connection of secure perfect hashing with the important and well-studied problem of PSI.
Conclusion
This paper formalizes and solves the problem of input-size reduction preprocessing that results in more efficient SMC protocols, without affecting the outputs of these protocols. To do so, we formalize the notion of security for perfect hashing in the SMC framework, and propose efficient constructions that give secure perfect hash functions. In addition to solving the input-size reduction, this also brings the traditional advantages of perfect hashing (less memory space, faster access, etc.) to
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Javad Darivandpour: Conceptualization, Methodology, Validation, Formal analysis, Writing - original draft, Writing - review & editing, Visualization, Project administration. Duc V. Le: Conceptualization, Methodology, Validation, Formal analysis, Writing - review & editing. Mikhail J. Atallah: Conceptualization, Methodology, Validation, Formal analysis, Resources, Writing - review & editing, Supervision, Project administration, Funding acquisition.
Acknowledgements
Portions of this work were supported by National Science Foundation Grant CPS-1329979, and sponsors of the Center for Education and Research in Information Assurance and Security. The authors are grateful to Kent Quanrud for his insightful comments.
References (51)
- et al., Practical perfect hashing in nearly optimal space, Inf. Syst. (2013)
- et al., Universal classes of hash functions, J. Comput. Syst. Sci. (1979)
- et al., Perfect hashing, Theoret. Comput. Sci. (1997)
- et al., Secure k-skyband computation framework in distributed multi-party databases, Inf. Sci. (2020)
- et al., Symmetric cryptographic solution to Yao’s millionaires’ problem and an evaluation of secure multiparty computations, Inf. Sci. (2008)
- et al., Secure multiparty computation of solid geometric problems and their applications, Inf. Sci. (2014)
- et al., Secure multi-party computation: theory, practice and applications, Inf. Sci. (2019)
- et al., Efficient delegated private set intersection on outsourced private datasets, IEEE Trans. Dependable Secure Comput. (2017)
- Generalized string matching, SIAM J. Comput. (1987)
- et al., Fast parallel and serial multidimensional approximate array matching