1 Introduction

Error correcting codes allow a message m to be encoded into a codeword c such that m can always be recovered, even from a tampered codeword \(c'\), provided the tampering is of a restricted form. More formally, the class of tampering functions, \(\mathcal {F}\), tolerated by traditional error correcting codes consists of functions that erase or modify only a constant fraction of the codeword c. However, no guarantees are provided on the output of the decoding algorithm when the tampering function \(f \notin \mathcal {F}\). A more relaxed notion, error detecting codes, allows the decoder to also output a special symbol \(\bot \) when m is unrecoverable from \(c'\). But here too, the codes cannot tolerate many simple tampering functions, such as constant functions.

Non-malleable Codes. The seminal work of Dziembowski, Pietrzak, and Wichs [36] introduced the notion of non-malleable codes (NMC). Informally, an encoding scheme is an NMC against a class of tampering functions, \(\mathcal {F}\), if the following holds: given a codeword tampered with some \(f \in \mathcal {F}\), the decoded message \(m'\) is either equal to the original message m, or the original message is essentially “destroyed” and \(m'\) is completely unrelated to m. In general, NMCs cannot exist for the set of all tampering functions \(\mathcal {F}_{\textsf {all}}\). To see this, observe that a tampering function that simply runs the decode algorithm to retrieve m and then encodes a message related to m trivially defeats the requirement above. In light of this observation, a rich line of work has dealt with constructing non-malleable codes for different classes of tampering attacks (see Sect. 1.2 for a discussion).

While non-malleable codes have the obvious advantage that one can obtain meaningful guarantees for a larger class of tampering functions (compared to error correcting codes), they have also found a number of interesting applications in cryptography. In particular, NMCs have been applied in tamper-resilient cryptography [36, 40, 41, 60] and have been useful in constructing non-malleable encryption [29]. Recently, non-malleable codes were also used to obtain a round optimal protocol for non-malleable commitments [53], as well as to build non-malleable secret sharing schemes [51, 52].

Interactive Non-malleable Codes. In this work, we seek to generalize the notion of non-malleable codes. Regular non-malleable codes can be seen as dealing with “passive” data in that data is encoded and, upon being tampered, the data either remains completely intact or is essentially destroyed. Now consider the following scenario. Two parties, each holding their own inputs are interested in running a protocol to perform some task involving their inputs, such as computing a joint function on them. Now, say an adversary is able to somehow get access to their communication channel and modify messages being sent in the protocol. We would like to have a similar guarantee: either the original transcript of the underlying protocol remains fully recoverable from the encoded communication, or, very informally, the original transcript is essentially “destroyed” and any transcript possibly recovered is “unrelated” to the interaction that was originally supposed to take place. Hence, we are concerned with encoding “active communication” rather than passive data.

An interesting special case of the above scenario could also occur in terms of computation being performed on a piece of hardware. Suppose several different chips on an integrated circuit board are communicating via interconnecting wires to perform some computation on the secrets stored within them. An adversary could tamper in some way with the communication going through those wires. We would like to require that either the computation remains intact, or that the original computation is “destroyed” and whatever computation takes place is completely unrelated.

Of course, this basic idea raises a number of questions: What does it actually mean for a computation to be “unrelated” to another computation? How much power can the tampering adversary reasonably be allowed to have? Are we concerned with the secrecy of inputs in this setting?

In the setting of non-interactive non-malleable codes, “unrelated” is easily defined as independent of the original message. However, in the interactive setting, things are a bit more complicated since there exists more than one input. Indeed, there are multiple notions of non-malleability that we can envision in the interactive setting. Below, we discuss possible notions of non-malleability for interactive non-malleable codes (INMCs).

Suppose Alice and Bob hold inputs x and y, respectively, and jointly execute a protocol that results in a transcript \(\tau \) when not tampered with. Now suppose an adversary tampers with the messages sent over the communication channel and Alice and Bob recover transcripts \(\tau _1\) and \(\tau _2\), respectively. Then, our first notion of non-malleability requires that either \(\tau _1 = \tau \) (i.e., the original transcript remains intact) or the distribution of \(\tau _1\) is completely independent of Bob’s input y (and symmetrically for \(\tau _2\) and Alice’s input x).

We note that this notion still allows an adversary to simply “cut off” Bob from the communication and essentially execute the protocol honestly, but with a different input \(y'\). Clearly, this is not an attack on the notion described above, since \(y'\) and thereby the resulting transcript \(\tau _1\) is distributed completely independently of y. Nevertheless, one might want to prevent this as well, since the output after tampering still depends on one of the inputs.

To this end, we consider a strengthening of the above basic definition where a party must receive either the correct transcript \(\tau \) or \(\bot \). This notion is achievable if the tampering function is not strong enough to cut off and impersonate one of the parties. It is easy to see that this notion is stronger than error detection: whether or not a party receives \(\bot \) must not depend on the inputs (x, y), i.e., input-dependent aborts must be prevented.

We do not explicitly model any secrecy requirements for the inputs (x, y). We view non-malleability of codes in the interactive setting as a separate property, and as such it should be studied independently. However, we define encodings using simulators relative to an underlying protocol. This formalization ensures that any security properties of the underlying protocol, such as secrecy of the inputs, are preserved under the encoding.

Relationship to Non-malleable Codes. Consider the message transfer functionality where the transcript is simply the transferred message x. An interactive non-malleable coding protocol for this functionality gives the following guarantee: Bob either receives x from Alice or a value \(x'\) unrelated to x. It is easy to see that a one-round interactive non-malleable coding protocol for this message transfer functionality is the same as a non-malleable code (encoding message x) for the same class of tampering functions. Indeed, the question that we consider in our work can be seen as generalizing non-malleable codes to more complex protocols potentially involving multiple rounds of interaction and both inputs x and y.

Our notion of INMCs is harder to achieve in one sense since more complex functionalities are involved, and yet is easier to achieve in another sense since one is allowed multiple rounds of interaction and the order of messages introduces a natural limit on the power of an adversary: she cannot tamper depending on “future” messages.

Similar to non-malleable codes, INMCs are impossible to achieve for arbitrary tampering functions. Very roughly, consider the first message of the protocol transcript that contains non-trivial information about Alice's input x. At this point, the adversary decodes and reconstructs this partial information about x, chooses a related input \(x'\) consistent with the partial information, and simply executes the protocol honestly with Bob from this point onwards (cutting Alice off completely). A similar argument can also be made for the other direction. In fact, we even rule out INMCs for a more restricted class of threshold tampering functions using a very similar argument in Sect. 4. This suggests that, as with non-malleable codes, we must focus on specific function classes when building INMCs.

One seemingly obvious approach to constructing INMCs, even for multi-round protocols, would be to directly use non-malleable codes, i.e., to encode each message of an underlying protocol independently. The hope would be that this results in an INMC that allows at least independent tampering of each message under the same class of tampering functions as the original NMC. However, this naïve approach fails to produce INMCs for any meaningful class of functions.

As a counterexample, consider the following protocol: Alice has inputs (x, y) and sends these to Bob in two separate messages. Bob receives the messages and outputs (x, y). With the above approach, x and y would be encoded separately as \(c_x = \mathsf {Enc}(x)\) and \(c_y = \mathsf {Enc}(y)\). Let f be any tampering function such that decoding \(f(c_x)\) yields some value \(z \ne x\). Such functions exist within the class of tampering functions against which the NMC is supposed to be secure, unless the NMC is in fact error correcting. A valid tampering function against the supposed INMC could then tamper with the first message using f and not tamper with the second message at all. This would result in Bob receiving \(z\ne x\) and y and outputting (z, y). Clearly (z, y) and (x, y) are related. Therefore, the protocol is not non-malleable. This counterexample applies even when more complex constructions, such as the NMC against streaming space-bounded tampering by Ball et al. [11], are used.

An interesting additional hurdle that needs to be overcome when constructing INMCs, compared to non-malleable codes, is inherent leakage. Because messages in the protocol are tampered with successively, a tampering function can use conditional aborts to communicate some information to future tampering functions. Let \(\mathcal {F}\) be some class of tampering functions. Say a tampering function \(f\in \mathcal {F}\) looks at message \(m_i\) sent in round i of the protocol and aborts unless \(m_i\) is “good” in some sense. In future rounds, even if the definition of \(\mathcal {F}\) precludes f from having any knowledge of \(m_i\), the tampering function still learns that \(m_i\) must have been “good”, since the protocol would have otherwise aborted. We deal with this inherent leakage by bounding it and using leakage-resilient tools.

Relationship to Interactive Coding. Our notion can be seen as inspired by the notion of interactive coding (IC) [64,65,66]. Essentially, INMCs are to non-malleable codes what IC is to error correcting codes. In interactive coding, we require that the original transcript remain preserved in the face of an adversary tampering with messages over the communication channel. INMCs only require something weaker, namely, that either the transcript remain preserved, or that the original transcript be destroyed and any possibly reconstructed transcript be independent of the inputs to the protocol.

An obvious advantage of such a weaker notion is that one could hope to achieve it for a larger class of tampering functions compared to ICs. Indeed, ICs are achievable only for threshold adversaries, namely, adversaries which only tamper with a fixed threshold number of bits of the communication (typically a constant fraction of the entire communication). All guarantees are lost in case an adversary tampers with more bits than allowed by this threshold. However, as we discuss later, INMCs are achievable for adversaries which could potentially tamper with every bit going over the communication channel. For the specific case of threshold tampering functions, however, we are able to show that lower bounds on the fraction of the communication that can be tampered with transfer from ICs to INMCs.

1.1 Our Results and Techniques

In this work we initiate the study of INMCs. We formalize the tampering model and put forward a notion of security for INMCs. Since achieving INMCs for general adversaries is impossible, we turn our attention to specific classes of tampering functions.

We show both positive and negative results. We first establish a negative result for threshold tampering functions by showing that INMCs for threshold tampering imply ICs for the same class of tampering functions, thereby transferring lower bounds from interactive coding to INMCs. We then provide several positive results for specific classes of tampering functions by constructing general (unconditional) compilers \(\varSigma \) that can encode an arbitrary underlying protocol \(\varPi \) in a non-malleable fashion (for the appropriate class of tampering functions).

Threshold Tampering Functions. A threshold tampering function is not restricted in its knowledge of the protocol transcript or in its computational power, but can only modify a fixed fraction (say 1/4) of the bits in the transcript. For this class, lower bounds are known for the case of interactive coding. Specifically, Braverman and Rao [18] showed that a non-adaptive IC can tolerate tampering with at most a 1/4 fraction of the transcript, and Ghaffari, Haeupler, and Sudan [50] showed that an adaptive IC can tolerate tampering with at most a 2/7 fraction of the transcript. When looking for stronger classes of tampering functions, the first natural question to ask is therefore whether the weaker notion of INMCs might allow us to circumvent these lower bounds. However, it turns out that this is not the case.

We show that any INMC for a class of threshold tampering functions that allows only a negligible non-malleability error in fact implies an IC for the same class of functions in the common reference string (CRS) model, with parties running in super-polynomial time. While the resulting IC is not efficient and requires a CRS, it turns out that the lower bounds of Braverman and Rao [18] and Ghaffari, Haeupler, and Sudan [50] also apply in this setting, therefore ruling out the existence of such INMCs. This result can be found in Sect. 4. In fact, this impossibility even holds if we apply the notion of INMC to a weaker notion of encodings which does not imply knowledge-preservation. Recall that we are using a strong notion of protocol encoding that ensures that security guarantees of the underlying protocol are preserved. On the flip side, positive results for IC only translate to positive results for this weaker notion of INMC. Getting meaningful positive results for our stronger INMC definition is an interesting open problem.

Interestingly (and fortunately), the above connection only holds for threshold tampering functions. Indeed, for the remaining families of tampering functions we consider in this paper, IC is naturally impossible and yet we are able to get positive results for INMC.

Bounded State Tampering Functions. For our first positive result we consider the class of tampering functions which can keep a bounded state. In more detail, the adversary is assumed to be arbitrarily computationally powerful, and we do not limit the size of the memory available for computing the tampering function. Instead, a limit is only placed on the size of the state that can be carried over from tampering with one message to tampering with the next. That is, an adversary in this model can iteratively tamper with each message depending on some function of all previous messages, but the size of this information is limited to some fixed number of bits s. It is easy to see that achieving the notion of error correction is impossible for such a tampering function family, since an adversary even with no storage can change every protocol message to an all-zero string.

Adversaries with limited storage capabilities constitute a very natural model and similar adversaries have been considered before in many settings, starting with the work by Cachin and Maurer [19] on encryption and key exchange secure against computationally unbounded adversaries. In a seemingly related recent work, Faust et al. [39] studied non-malleable codes against space-bounded tampering. However in their setting, a limit is placed on the size of memory available to compute the tampering function (indeed it is meaningless to consider the state carried over from one message to the next in the non-interactive setting).

We give an unconditional positive result for this family of tampering functions: Any underlying protocol \(\varPi \) can be simulated by a protocol \(\varSigma \) which is an INMC against bounded state tampering functions. A naïve way of trying to construct such a compiler would be to try and encode each message of \(\varPi \) using a suitable (non-interactive) non-malleable code. However, this is doomed to fail. In the single-message setting, our tampering adversary simply translates to an unbounded general adversary, for which designing non-malleable codes is known to be impossible. Hence, getting a positive result inherently relies on making use of additional interaction.

The key technical tool we rely on to construct our compiler is the notion of seedless 2-non-malleable extractors introduced by Cheraghchi and Guruswami [25] as a natural generalization of seeded non-malleable extractors [34]. However, finding an explicit construction of such extractors was left as an open problem by Cheraghchi and Guruswami even for the case when both the sources are uniform. Such a construction was first given by Chattopadhyay, Goyal, and Li [22]. The construction in [22] requires one of the sources to be (almost) uniform, while the other source could have smaller min-entropy. We crucially rely upon a construction of seedless 2-non-malleable extractors where at least one of the sources could have small min-entropy. Our construction can be found in Sect. 5.

Split-State Tampering Functions. The second class we consider are split-state tampering functions where, very roughly, the transcript is divided into two disjoint sets of messages and each set is tampered independently. In more detail, the adversary can decide for each message of the protocol to be either in the first set or the second one. To compute an outgoing message, the tampering function takes all messages (so far) in any one set of its choice as input.

We are able to achieve interactive non-malleability for a strong class of these tampering functions, namely c-unbalanced split-state tampering functions. A c-unbalanced split-state tampering function can split the transcript into two arbitrary sets, as long as each set contains at least a 1/c fraction of the messages (where c can be any polynomial parameter).

This notion is inspired by a corresponding notion in the non-interactive setting. Split-state tampering functions for non-interactive NMCs are one of the most interesting and well studied classes of tampering functions in that setting. The class was already introduced in the seminal work of Dziembowski, Pietrzak, and Wichs [36] and has since been studied in a large number of works [2, 3, 24,25,26, 35, 60].

We give an unconditional positive result for this family of tampering functions: Any underlying protocol \(\varPi \) can be simulated by a protocol \(\varSigma \) which is an INMC against split-state tampering functions. The key technical tool we rely on in this case is a new notion of tamper evident n-out-of-n secret sharing that we introduce in this work. Such a secret sharing scheme essentially guarantees that any tampering with the shares from a well-defined detectable class will be caught upon reconstructing the secret. Our construction can be found in Sect. 6.

Sliding Window Tampering Functions. In the sliding window model, the tampering function “remembers” only the last w messages. In other words, the tampering function gets as input the last w (untampered) messages of the protocol transcript to compute the tampered message. The sliding window model is very natural and has been considered in a variety of contexts, such as error correcting codes [48] including convolutional codes, streaming algorithms, and even data transmission protocols such as TCP [55].

Our results in fact extend to a stronger model in which we can handle what we call fragmented sliding window tampering functions. Functions in this class are allowed to remember any w of the previous protocol messages (rather than just the w most recent ones). Thus, in some sense, the window of messages being stored by the tampering function is not contiguous but “fragmented”.

Comparing this class of functions with bounded-state tampering functions, we see that here the tampering function can no longer retain partial information about all previous messages, but instead retains all of the information about some previous messages. Because the bound is not a hard bound on the size of the state in bits, but rather on the number of remembered messages (which may differ in length), the two models are incomparable.

Comparing this class with c-unbalanced split-state tampering functions, we notice that here the maximum size of the window is fixed and does not scale with the number of messages in the protocol. On the other hand, the different sets of messages which the tampering can depend on are not required to be disjoint. E.g., the tampering of every single protocol message could depend on the first message of the protocol, something that would not be possible in the case of split-state functions.

While this model has important conceptual differences from our split-state model, the techniques used to achieve both are almost identical. In particular, essentially the same protocol as in the case of c-unbalanced split-state tampering functions also works in this case, although the proof of security differs slightly. Our construction can be found in Sect. 7.

A Common Approach. A common theme in all of our constructions is the following: We only attempt to transfer a single message in a non-malleable way and then use this message to secure the rest of the protocol. In more detail, Alice and Bob essentially exchange a random key k, possibly using multiple rounds of interaction, such that the following holds. The two parties either agree on the correct key k, receive completely independent keys \(k_1\) and \(k_2\), or receive \(\bot \), which leads them to abort the protocol. Subsequently, all future protocol messages are encrypted with a one-time pad and authenticated with a one-time message authentication code using k (assuming k is long enough). This allows us to achieve non-malleability as long as we can ensure that the tampering function is not capable of predicting the exchanged key in any round. The reason is as follows: as long as the key remains (almost) uniformly distributed from the point of view of the tampering function f, the computation of f cannot depend on the encrypted messages, and any modification of the encrypted messages would be caught by the MAC and cause an abort independently of the inputs. The exact way in which we are able to prevent f from gaining any knowledge of k depends strongly upon the class of tampering functions. This leads to very different constructions of the key-exchange phase using different technical tools.
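To make the common approach concrete, here is a minimal sketch (our own illustration, not taken from the paper; all names are ours) of the second phase: protecting a single protocol message with a one-time pad and a one-time information-theoretic MAC, instantiated over a prime field for simplicity.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; messages, pads, and tags live in Z_P

def keygen():
    # Fresh one-time key material per message: MAC key (a, b) and a pad,
    # all of which would come from the key-exchange phase in the compilers.
    return (secrets.randbelow(P), secrets.randbelow(P), secrets.randbelow(P))

def protect(key, m):
    # One-time pad encryption followed by the one-time MAC tag = a*c + b;
    # the tag is information-theoretically unforgeable when (a, b)
    # authenticates a single ciphertext.
    a, b, pad = key
    c = (m + pad) % P
    return c, (a * c + b) % P

def recover(key, c, tag):
    # Verify the tag, then decrypt; None models outputting bot and aborting.
    a, b, pad = key
    if (a * c + b) % P != tag:
        return None
    return (c - pad) % P

key = keygen()
c, tag = protect(key, 42)
assert recover(key, c, tag) == 42
# Any modification is rejected except with probability 1/P:
assert recover(key, (c + 1) % P, tag) is None
```

Since the pad is uniform, the ciphertext reveals nothing about the message, and since the tampering function cannot predict k, it cannot produce a valid tag for any modified ciphertext; these are exactly the two points the argument above relies on.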

Given the common approach described above, it may be tempting to abstract a non-malleable key-exchange protocol as a new building block. Intuitively, this would allow us to easily extend our construction to new classes of tampering functions simply by designing a new key exchange protocol for said class. However, perhaps counter-intuitively, it turns out to be very unclear how this abstraction would work. The class of tampering functions \(\mathcal {F}_1\) allowed for the full INMC differs a lot from the class \(\mathcal {F}_2\) the key-exchange would need to tolerate. Even worse, it is not clear how \(\mathcal {F}_2\) can be generically identified from \(\mathcal {F}_1\). Conversely, given a key-exchange that is non-malleable relative to a class \(\mathcal {F}_2\), it is not clear against which class of functions the full protocol would then be non-malleable. In fact, our constructions for split-state and for sliding-window tampering show that \(\mathcal {F}_1\) can be the result of a complex interplay between the properties of \(\mathcal {F}_2\) and the round complexities of both the key-exchange and the original protocol itself.

1.2 Related Works

Non-malleable Codes. To the best of our knowledge, there has been no prior work studying non-malleable codes in the interactive setting. In the non-interactive setting, however, there exists a large body of work studying non-malleable codes for various classes of tampering functions as well as various variants of non-malleable codes. We provide a brief, but non-exhaustive, survey here.

The most well-studied class in the non-interactive setting are split-state tampering functions [2,3,4, 24,25,26, 35, 57,58,60]. But other classes of tampering functions have been studied as well, such as tampering by circuits of limited size or depth [8, 10, 11, 23, 42], tampering functions computable by decision trees [12], memory-bounded tampering functions [39] where the size of the available memory is a priori bounded, bounded polynomial time tampering functions [9], and streaming tampering functions [11]. Non-malleable codes were also generalized in several ways, such as continuously non-malleable codes [4, 29,30,31, 38, 40, 61] and locally decodable and updatable non-malleable codes [21, 32, 33].

While most work on non-malleable codes deals with the information theoretic setting, there has also been recent work [1, 5, 6, 11] in the computational setting. In the computational setting, the work of Chandran et al. [20] on block-wise non-malleable codes may seem most closely related to our setting; however, there are important differences. Firstly, Chandran et al. do not consider the setting where both parties may have inputs. Instead their notion is similar to the original notion of non-malleable codes where a single fixed message is encoded. Indeed, the entire communication is from the sender to the receiver (rather than an interactive bi-directional protocol between two parties). Further, their definitions are weaker, as they inherently allow selective aborts, whereas our definitions do not suffer from this problem.

Interactive Coding. Starting with the seminal work of Schulman [64,65,66], a large body of work has studied IC schemes for two-party protocols (see, e.g., [15, 17, 18, 37, 43,44,45, 47, 49, 50, 54]). Most recently, several works have also studied IC for multiparty protocols [7, 16, 46, 56, 62] in various models.

Secure Computation without Authentication. We also mention a related work of Barak et al. [13] on secure computation in a setting where the communication channel among the parties may be completely controlled by a polynomial-time adversary. The setting in their work is therefore inherently computational, and their techniques, which rely on bounded concurrent secure multi-party computation, are unrelated to ours. However, our setting can indeed be seen as being inspired by theirs.

2 Preliminaries

In this section we introduce our notation and recall some definitions needed for our constructions and proofs.

Notation. We denote by \(\lambda \) the security parameter. For a distribution D, we denote by \(x \leftarrow D\) the process of sampling a random variable x according to D. By \(U_\ell \) we denote the uniform distribution over \(\{0,1\}^{\ell }\). For a set S, \(x \leftarrow S\) denotes sampling from S uniformly at random. For a pair \(D_1,D_2\) of distributions over a domain X, we denote their statistical distance by

$$\begin{aligned} \mathsf {SD}(D_1,D_2) := \frac{1}{2}\sum _{x\in X}\bigl |\Pr [D_1=x]-\Pr [D_2=x]\bigr |. \end{aligned}$$

If \(\mathsf {SD}(D_1,D_2)\le \epsilon \), we say that \(D_1,D_2\) are \(\epsilon \)-close. We denote by \(\mathsf {replace}\) the function that behaves as follows: If the second input is a single value s, then it replaces any occurrence of \(\mathsf {same}\) in the first input with s. If the second input is a tuple \((s_1,\dots ,s_n)\), then it replaces any occurrence of \(\mathsf {same}_i\) in the first input with \(s_i\). We will write \(\mathsf {replace}(D,x)\) for some distribution D to denote the distribution defined by sampling \(d \leftarrow D\) and applying \(\mathsf {replace}(d,x)\).
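As a quick illustration (our own; the Python strings "same" and ("same", i) stand in for the symbols \(\mathsf {same}\) and \(\mathsf {same}_i\)), the \(\mathsf {replace}\) function can be rendered as follows.

```python
def replace(outcome, x):
    """Substitute the 'same' markers in outcome by the actual value(s) x."""
    if isinstance(x, tuple):
        # Second input is a tuple: ("same", i) is replaced by x[i].
        return tuple(x[v[1]] if isinstance(v, tuple) and v[0] == "same" else v
                     for v in outcome)
    # Second input is a single value: every "same" is replaced by x.
    return tuple(x if v == "same" else v for v in outcome)

# E.g., a sample from the D_f of Definition 8 might be ("same", "bot"):
print(replace(("same", "bot"), "Trans(x,y)"))  # ('Trans(x,y)', 'bot')
```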

Extractors. In our constructions we make use of two types of extractors. We first recall the standard notion of strong two-source extractors. Two-source extractors were first implicitly introduced by Chor and Goldreich [27]. An argument due to Barak [63] shows that any two-source extractor with a small enough error \(\epsilon \) is also a strong extractor. This means we can instantiate strong extractors, for example, with the two-source extractor due to Bourgain [14].

Definition 1

(Strong 2-source Extractor). A function \(\mathsf {Ext}: \{0,1\}^n \times \{0,1\}^n \rightarrow \{0,1\}^m\) is a strong 2-source extractor for sources with min-entropy k and with error \(\epsilon \) if it satisfies the following property: If X and Y are independent sources of length n with min-entropy k, then

$$\begin{aligned} \mathsf {SD}\bigl ((\mathsf {Ext}(X,Y),X),(U_m,X)\bigr )\le \epsilon \quad \text {and}\quad \mathsf {SD}\bigl ((\mathsf {Ext}(X,Y),Y),(U_m,Y)\bigr )\le \epsilon . \end{aligned}$$
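For intuition, a classical example (with one-bit output) is the inner-product extractor implicit in Chor and Goldreich [27]: it extracts whenever the min-entropies of the two sources satisfy roughly \(k_X + k_Y \ge n + 2\log (1/\epsilon )\). The toy sketch below is only illustrative; our constructions require longer outputs and, below, non-malleability.

```python
def inner_product_ext(x: int, y: int, n: int) -> int:
    # <x, y> over GF(2): bitwise AND of the two n-bit sources, then parity.
    return bin((x & y) & ((1 << n) - 1)).count("1") % 2

# Example: extract one bit from two 8-bit sources.
print(inner_product_ext(0b10110101, 0b01101110, 8))  # parity of 0b00100100 -> 0
```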

Seedless 2-non-malleable extractors were first defined by Cheraghchi and Guruswami [25], but their construction was left as an open problem. The definition was finally instantiated by Chattopadhyay et al. [22]. Such an extractor allows one to non-malleably extract an almost uniform random string from two sources with a given min-entropy that are being tampered with by a split-state tampering function.

We closely follow the definition from [22].

Definition 2

(2-non-malleable Extractor). A function \(\mathsf {nmExt}: \{0,1\}^n \times \{0,1\}^n \rightarrow \{0,1\}^m\) is a 2-non-malleable extractor for sources with min-entropy k and with error \(\epsilon \) if it satisfies the following property: If X and Y are independent sources of length n with min-entropy k and \(f=(f_0,f_1)\) is an arbitrary 2-split-state tampering function, then there exists a distribution \(D_f\) over \(\{0,1\}^m \cup \{\mathsf {same}\}\) which is independent of the sources X and Y, such that

$$\begin{aligned} \mathsf {SD}\Bigl (\bigl (\mathsf {nmExt}(X,Y),\mathsf {nmExt}(f_0(X),f_1(Y))\bigr ),\bigl (U_m,\mathsf {replace}(D_f,U_m)\bigr )\Bigr )\le \epsilon , \end{aligned}$$

where both \(U_m\) refer to the same uniform m-bit string.

Tamper Evident Secret Sharing. In the following we define a new notion of tamper evident secret sharing. Such tamper evident secret sharing schemes behave like regular secret sharing schemes, except that we are guaranteed that the reconstruction algorithm detects, and rejects, any tampering of the shares from a well-defined detectable class that would lead to a different reconstructed message.

Intuitively, a tampering is detectable if it meets two criteria: First, it must leave at least one of the shares unchanged, since otherwise the shares could simply be replaced by a completely independent sharing, which is trivially undetectable. Second, each tampered share must be independent of at least one of the untampered shares, except for some bounded leakage. This is formally defined in the following.

Definition 3

(n-out-of-n Secret Sharing). A pair of algorithms \((\mathsf {Share},\mathsf {Reconstruct})\) is a perfectly private, n-out-of-n secret sharing scheme with message space \(\{0,1\}^{\ell }\) and share length \(\ell '\) if all of the following hold.

  1.

    Correctness: Given all shares, the secret can be reconstructed. I.e., for any secret \(m \in \{0,1\}^{\ell }\), it holds that

    $$\begin{aligned} \Pr \bigl [\mathsf {Reconstruct}(\mathsf {Share}(m)) = m\bigr ] = 1. \end{aligned}$$

  2.

    Statistical Privacy: Given any strict subset of shares, the secret remains perfectly hidden. I.e., for any two secrets \(m_0,m_1 \in \{0,1\}^{\ell }\) and any set of indices \(\mathcal {I}\subsetneq \{1,\dots ,n\}\), it holds for any (computationally unbounded) distinguisher \(\mathcal {D}\) that

    $$\begin{aligned} \Pr _{\vec {s}\leftarrow \mathsf {Share}(m_0)}\bigl [\mathcal {D}\bigl ((s_i)_{i\in \mathcal {I}}\bigr )=1\bigr ] = \Pr _{\vec {s}\leftarrow \mathsf {Share}(m_1)}\bigl [\mathcal {D}\bigl ((s_i)_{i\in \mathcal {I}}\bigr )=1\bigr ]. \end{aligned}$$

Definition 4

(Detectable Tampering for Secret Sharing). Let \((\mathsf {Share},\mathsf {Reconstruct})\) be an n-out-of-n secret sharing scheme and let \(m \in \{0,1\}^{\ell }\) be a message. A tampering function f for a secret sharing \((s_1,\dots ,s_n)\) of m with \(\nu \) bits of leakage is described by functions \((f_1,\dots ,f_n)\), sets of indices \(\mathcal {I}^{\mathsf {in}}_1,\dots ,\mathcal {I}^{\mathsf {in}}_n\), and leakage functions \((\mathsf {leak}_1,\dots ,\mathsf {leak}_n)\), each with output length at most \(\nu \) bits, such that

$$\begin{aligned} f(s_1,\dots ,s_n) = \Bigl (\!f_1\bigl (\!(s_j)_{j\in \mathcal {I}^{\mathsf {in}}_1},\mathsf {leak}_1\!\bigl (\!(s_j)_{j\not \in \mathcal {I}^{\mathsf {in}}_1}\bigr )\bigr ),\dots ,f_n\bigl (\!(s_j)_{j\in \mathcal {I}^{\mathsf {in}}_n},\mathsf {leak}_n\!\bigl (\!(s_j)_{j\not \in \mathcal {I}^{\mathsf {in}}_n}\bigr )\bigr )\!\Bigr ). \end{aligned}$$

For any fixed secret sharing \(\vec {s}\leftarrow \mathsf {Share}(m)\) let \(\mathcal {M}\) be the set of indices i, such that \(s'_i \ne s_i\) for \((s'_1,\dots ,s'_n) := f(s_1,\dots ,s_n)\). A tampering function f is called detectable for \(\vec {s}\) if it holds that for all \(i \in \mathcal {M}\) we have \(\mathcal {M}\cup \mathcal {I}^{\mathsf {in}}_i \subsetneq \{1,\dots ,n\}\). We define the predicate \(\mathsf {Dtct}(\vec {s},f)\) to be 1 iff f is detectable for \(\vec {s}\).

This now allows us to formally define tamper evident n-out-of-n secret sharing.

Definition 5

(Tamper Evident n-out-of-n Secret Sharing). A perfectly private secret sharing scheme \((\mathsf {Share},\mathsf {Reconstruct})\) is said to be \(\epsilon \)-tamper evident for up to \(\nu \) bits of leakage if the reconstruction algorithm rejects shares with overwhelming probability whenever they have been tampered with detectably with up to \(\nu \) bits of leakage. I.e., for all secrets \(m \in \{0,1\}^{\ell }\) and all detectable tampering functions f with \(\nu \) bits of leakage it holds that

$$\begin{aligned} \Pr _{\vec {s}\leftarrow \mathsf {Share}(m)}\bigl [\mathsf {Reconstruct}(f(\vec {s})) \in \{m,\bot \}\ \big |\ \mathsf {Dtct}(\vec {s},f)=1\bigr ] \ge 1-\epsilon . \end{aligned}$$

Please refer to the full version of this paper for an instantiation of this notion from XOR-based secret sharing and an information theoretic message authentication code. The concept of tamper evident secret sharing may seem superficially similar to non-malleable secret sharing [51] but the two concepts are in fact incomparable. The guarantee of tamper evident secret sharing is very strong, requiring that the secret cannot be changed except to \(\bot \), but only holds against a weak class of tamperings that must leave at least one share unchanged. In contrast, NM-secret sharing provides a weaker guarantee, namely that a tampered secret must be unrelated, but against a stronger class of tampering functions.
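To fix ideas, the following is one natural way such an instantiation could look (this rendering is our own, hedged sketch; the paper's actual construction and its leakage analysis are in the full version): XOR-based n-out-of-n sharing where every share carries one-time MAC keys for the other shares and tags of its own value under the other shares' keys.

```python
import secrets
from functools import reduce

P = 2**127 - 1  # prime modulus for the one-time MAC tag = a*x + b

def mac(key, x):
    a, b = key
    return (a * x + b) % P

def share(m, n):
    # XOR-sharing of an (up to) 126-bit message: x_1 ^ ... ^ x_n = m.
    xs = [secrets.randbits(126) for _ in range(n - 1)]
    xs.append(reduce(lambda u, v: u ^ v, xs, m))
    # keys[i][j] is the key that share i uses to check share j's value.
    keys = [[(secrets.randbelow(P), secrets.randbelow(P)) for _ in range(n)]
            for _ in range(n)]
    # Share j holds its value x_j, its own row of keys, and tags of x_j
    # under every other share's key, so an untouched share exposes a change.
    return [(xs[j], keys[j], [mac(keys[i][j], xs[j]) for i in range(n)])
            for j in range(n)]

def reconstruct(shares):
    n = len(shares)
    for i in range(n):
        _, keys_i, _ = shares[i]
        for j in range(n):
            x_j, _, tags_j = shares[j]
            if mac(keys_i[j], x_j) != tags_j[i]:
                return None  # tamper detected: reject
    return reduce(lambda u, v: u ^ v, (s[0] for s in shares), 0)

shares = share(0b101010, 3)
assert reconstruct(shares) == 0b101010
```

Intuitively, if a tampering leaves share i unchanged and each tampered share j is computed independently of share i (up to \(\nu \) bits of leakage), then changing \(x_j\) requires forging a tag under the key keys[i][j], which succeeds with probability at most about \(2^{\nu }/P\).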

3 Definitions

In this section we first formally define interactive protocols and encodings of interactive protocols. We then introduce our notions of non-malleability for encodings of interactive protocols.

3.1 Interactive Protocols

We consider protocols \(\varPi \) between a pair of parties \(P_0,P_1\) (also called Alice and Bob, respectively, for convenience) for evaluating functionalities \(g=(g_0,g_1)\) of the form \(g_b : X \times Y \rightarrow Z\), where X, Y, Z are finite domains. Alice holds an input \(x\in X\) and Bob holds an input \(y\in Y\), and the goal of the protocol is to interactively evaluate the functionality, such that at the end of the protocol Alice outputs \(g_0(x,y)\) and Bob outputs \(g_1(x,y)\). The interactive protocol consists of r rounds, in each of which a single message is sent. Without loss of generality we assume that the parties in \(\varPi \) alternate in sending their messages and that Alice always sends the first message. Formally, an interactive protocol \(\varPi \) between two parties is described by a pair of “next message” functions \(\pi _0,\pi _1\) (or \(\pi _A,\pi _B\)) and a pair of output functions \(\mathsf {out}_A\) and \(\mathsf {out}_B\). The next message function \(\pi _A\) (\(\pi _B\)) takes the input x (y), the round number i, and the sequence \({\text {trans}}_A\) (\({\text {trans}}_B\)) of messages sent and received by Alice (Bob) so far, and outputs the next message to be sent by Alice (Bob). For simplicity of notation, we assume \(\pi _A,\pi _B\) always output binary strings. Furthermore, we assume that each message output by \(\pi _A,\pi _B\) is always of the same length \(\ell \). The output function \(\mathsf {out}_A\) (\(\mathsf {out}_B\)) takes as input x (y) and the final message sequence \({\text {trans}}_A\) (\({\text {trans}}_B\)) sent and received by Alice (Bob) and outputs Alice’s (Bob’s) protocol output. We denote by \(\mathsf {Trans}(x,y)\) the function mapping inputs x, y to the transcript of an honest execution of \(\varPi \) between A(x) and B(y). Note that in this setting we do not explicitly consider probabilistic protocols. However, this is not a limitation, since any probabilistic protocol can be written as a deterministic protocol with additional random tapes given as input to the two parties A and B.
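As a minimal illustration of this formalism (our own sketch; all names are illustrative), a protocol is a pair of next-message functions, and \(\mathsf {Trans}(x,y)\) is the transcript of an honest alternating execution.

```python
from typing import Callable, List

Message = str
# A next-message function takes (input, round number, transcript so far).
NextMsg = Callable[[object, int, List[Message]], Message]

def trans(pi_A: NextMsg, pi_B: NextMsg, x, y, r: int) -> List[Message]:
    trans_A: List[Message] = []  # messages sent and received by Alice
    trans_B: List[Message] = []  # messages sent and received by Bob
    for i in range(1, r + 1):
        # Alice sends in odd rounds, Bob in even rounds.
        m = pi_A(x, i, trans_A) if i % 2 == 1 else pi_B(y, i, trans_B)
        trans_A.append(m)  # without tampering, both views agree
        trans_B.append(m)
    return trans_A

# Example: each party just echoes its input in every round.
print(trans(lambda x, i, t: f"A:{x}", lambda y, i, t: f"B:{y}", 1, 2, r=4))
```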

This now allows us to define both correctness of a protocol and encodings of interactive protocols.

Definition 6

(Correctness). A protocol \(\varPi \) is said to \(\epsilon \)-correctly evaluate a functionality \((g_0,g_1)\) if, without tampering, the output of each party satisfies \(\mathsf {out}_b(x_b,{\text {trans}}_b)=g_b(x_0,x_1)\) with probability \(\ge 1-\epsilon \).

Definition 7

(Encoding of an Interactive Protocol). An encoding \(\varPi '\) of a protocol \(\varPi =(A,B)\) is defined by two simulators \(S_0,S_1\) with black-box access to stateful oracles encapsulating the next message functions of A and B respectively. The protocol \(\varPi '= (S_0^A,S_1^B)\) is an \(\epsilon \)-correct encoding of protocol \(\varPi =(A,B)\) if for all inputs xy, \(\varPi '= (S_0^{A(x)},S_1^{B(y)})\) \(\epsilon \)-correctly evaluates the functionality \((\mathsf {Trans}(x,y),\mathsf {Trans}(x,y))\).

We note that, given a correct encoding \(\varPi '\) of a protocol \(\varPi \) evaluating functionality \((g_0,g_1)\), it is easy to also evaluate \((g_0,g_1)\). To do so, simply run \(\varPi '\) resulting in output \(\tau =\mathsf {Trans}(x,y)\) and then evaluate \(\mathsf {out}_A(x,\tau )\) and \(\mathsf {out}_B(y,\tau )\) respectively. Definition 7 slightly differs from the interactive coding literature [15, 65]. In most of the IC literature, encodings are not defined relative to a stateful oracle, but instead relative to a next-message function oracle. This difference is significant because, as observed by Chung et al. [28] in the context of IC, an encoding as defined in the IC literature can leak the parties’ inputs under adversarial errors. I.e., security guarantees of \(\varPi \) are not necessarily preserved under \(\varPi '\). In contrast, under Definition 7, any security guarantee of \(\varPi \) is preserved under \(\varPi '\). This follows from the fact that the encoding is defined using a pair of simulators with only black-box access to A and B, without the ability to know the inputs or rewind the participants of the underlying protocol. Therefore, access to this oracle is equivalent to communicating with an actual instance of A (or B respectively). Any attacker against \(\varPi \) – whether a man-in-the-middle attacker or an attacker acting as either A or B – always has at least black-box access to the two parties. This means she can easily simulate \(\varPi '\) simply by running \(S_0,S_1\) herself. Thus any attack against some arbitrary security property of \(\varPi '\) directly corresponds to an attack against the same property of \(\varPi \), implying that security guarantees of \(\varPi \) are preserved under \(\varPi '\).

Protocols Under Tampering. It may appear tempting to try and define non-malleability in the interactive setting in the same manner as regular non-malleability by, e.g., considering tampering on the full transcript of the protocol. Split-state tampering for an r-round protocol would then, for example, mean that an adversary could separately tamper with the first r/2 and the second r/2 of the protocol messages. However, at least in the synchronous tampering setting we are focusing on, such a definition would be very problematic. It would allow an adversary to tamper with the first message depending on future messages, which themselves could depend on the first message, therefore potentially causing an infinite causal loop, even if we allow such “time-travelling” adversaries. So instead we make the reasonable restriction that tampering on each message must happen separately and can only depend on past messages.

We formally describe the process of executing a protocol under tampering with a tampering function \(f\in \mathcal {F}\), from some family of tampering functions \(\mathcal {F}\). First, empty sequences of sent and received messages \({\text {trans}}_A = {\text {trans}}_B = \emptyset \) are initialized. Let us assume that it is Alice’s turn to send a message in round i. The next message function \(\pi _A\) is evaluated to compute the next message \(m_i := \pi _A(x,i,{\text {trans}}_A)\). Then \(m_i\) is added to Alice’s transcript \({\text {trans}}_A\). Next, the tampering function is applied to compute the tampered message \(m'_i := f(m_1,\dots ,m_i)\) and \(m'_i\) is added to Bob’s transcript \({\text {trans}}_B\). If it is Bob’s turn, the execution proceeds identically with reversed roles. Finally, the output functions of Alice and Bob are evaluated as \(\mathsf {out}_A(x,{\text {trans}}_A)\) and \(\mathsf {out}_B(y,{\text {trans}}_B)\), respectively. Note that due to tampering it does not necessarily hold for the sequences of messages \({\text {trans}}_A=m^A_1,\ldots ,m^A_r\) and \({\text {trans}}_B=m^B_1,\ldots ,m^B_r\) that \(m^A_i=m^B_i\).
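The execution under tampering just described can be sketched as follows (again our own illustration). Note that the sender records the untampered message \(m_i\) while the receiver records \(m'_i\), so the two transcripts may diverge.

```python
from typing import Callable, List

Message = str

def run_tampered(pi_A, pi_B, x, y, r: int,
                 f: Callable[[List[Message]], Message]):
    sent: List[Message] = []     # untampered messages m_1, ..., m_i
    trans_A: List[Message] = []  # Alice's view
    trans_B: List[Message] = []  # Bob's view
    for i in range(1, r + 1):
        if i % 2 == 1:                  # Alice's turn
            m = pi_A(x, i, trans_A)
            trans_A.append(m)           # the sender keeps m_i
            sent.append(m)
            trans_B.append(f(sent))     # the receiver gets m'_i = f(m_1..m_i)
        else:                           # Bob's turn
            m = pi_B(y, i, trans_B)
            trans_B.append(m)
            sent.append(m)
            trans_A.append(f(sent))
    return trans_A, trans_B

# Example: a tamperer that reverses every message it sees.
out = run_tampered(lambda x, i, t: f"A{x}{i}", lambda y, i, t: f"B{y}{i}",
                   1, 2, 2, lambda sent: sent[-1][::-1])
print(out)  # Alice's and Bob's views now disagree on the tampered messages
```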

We note that this only models “synchronous” tampering, meaning that the adversary cannot drop or delay messages or desynchronize the two parties by first running the protocol with one party and then the other. This choice is partially inspired by the literature on interactive coding and helps keep our definitions simple. However, cryptographic primitives such as non-malleable commitments have been studied in the setting where there is a non-synchronizing man-in-the-middle adversary. We remark that even in these settings, getting a construction for the synchronous case is often the hardest (for example, there exist general compilers for non-malleable commitments to go from synchronous security to non-synchronous security [67]). We leave the study of more general tampering models for INMCs as an interesting topic for future work.

3.2 Interactive Non-malleable Codes

In the non-interactive setting, non-malleability intuitively means that after tampering the result should be either the original input, or the original input should be completely destroyed, i.e., the output should be independent of the original input. In the interactive setting, there are two different inputs and two different outputs, and the question is which output (or pair of outputs) should be independent of which input(s). This leads to an entire space of possible notions; however, we settle for the strongest possible – and arguably most natural – notion. In this notion, which we simply call protocol-non-malleability, we require that the outputs of Alice and Bob respectively are either the correct transcript \(\mathsf {Trans}(x,y)\) or \(\bot \) and that the product distribution over the two is (almost) completely independent of the two parties’ respective inputs x and y. It is very important that the decisions whether to output \(\bot \) or not are made independently of x and y, since otherwise an adversary could potentially force selective aborts and thus learn at least one bit of information about the combined input. This means that protocol-non-malleability not only implies error detection, but is even stronger, since in error detection the output distribution over the real output and \(\bot \) is not required to be independent of the inputs.

We note that weaker definitions may still be meaningful and are not necessarily trivial. In Sect. 4 we will show that even for a much weaker notion of protocol-non-malleability, strong lower bounds exist in the case of threshold tampering functions. We formally define protocol-non-malleability in the following.

Definition 8

(Protocol Non-malleability). An encoding \(\varPi ' = (S_0^A,S_1^B)\) of protocol \(\varPi =(A,B)\) is \(\epsilon \)-protocol-non-malleable for a family \(\mathcal {F}\) of tampering functions if the following holds: For each tampering function \(f\in \mathcal {F}\) there exists a distribution \(D_f\) over \(\{\bot ,\mathsf {same}\}^2\) such that for all x, y, the product distribution of \(S_0^{A(x)}\)’s and \(S_1^{B(y)}\)’s outputs is \(\epsilon \)-close to the distribution \(\mathsf {replace}(D_f,\mathsf {Trans}(x,y))\).

4 Lower Bounds for Threshold Tampering Functions

Threshold tampering functions are classes of tampering functions which are limited only in the fraction of the messages they can tamper with. For these classes of tampering functions, lower bounds are known in the case of interactive codes. Specifically, Braverman and Rao [18] showed that non-adaptive interactive codes can tolerate tampering with at most a 1/4 fraction of the transcript, and Ghaffari, Haeupler, and Sudan [50] showed that adaptive interactive codes can tolerate tampering with at most a 2/7 fraction of the transcript. A natural question to ask is whether one can bypass these lower bounds in the case of non-malleable interactive codes. Unfortunately, we show in the following that the known lower bounds for interactive coding translate to identical lower bounds for protocol-non-malleable interactive coding. In fact, we show that the lower bounds even apply to a much weaker form of protocol-non-malleability, where each party’s output by itself (rather than the product distribution of both outputs) only needs to be independent of the other party’s input.

The basic idea of this lower bound is essentially to show that a non-malleable interactive code is also a regular interactive code. In any encoded protocol, if the output of one party in the underlying protocol depends non-trivially on the other party’s input (which should always be the case, since otherwise the communication is completely unnecessary), then, information theoretically, the transcript must leak this information. If the encoding were not error correcting, then there would be a way for a threshold tampering function to cause at least one of the parties to abort. Since the tampering function is unlimited in its knowledge of the transcript, it can extract the information about one of the parties’ inputs and, depending on the function of the input thus revealed, either cause the abort or not. This would be an input dependent abort, which clearly means that the encoding is not non-malleable.

However, this straightforward approach does not work. The reason is that the information about the input might only be revealed in, say, the ith message of the protocol, while the threshold tampering function requires tampering with earlier messages to cause the abort. But there is a way around this problem. If we can cleanly define which message in the protocol is the first message that reveals information about the input, then we can construct another INMC in the CRS model, where all previous messages are pushed into the CRS. This is possible since those messages are “almost” independent of the actual input and it is possible for the INMC to (inefficiently) sample a consistent internal state once it gets the input. This means that now the information about the input is revealed in the very first protocol message, and thus the approach described above works.

For the lower bound to translate to INMC, we therefore need that the lower bounds for IC apply also to inefficient interactive encodings in the CRS model. Luckily, this follows easily from the structure of the results in [18] and [50]. We discuss the application of the bounds to the CRS model in a bit more detail in the full version.

As mentioned above, we can in fact show this lower bound for a much weaker form of non-malleability we formally define in the following.

Definition 9

(Weak Protocol Non-malleability). An encoding \(\varPi ' = (S_0^A,S_1^B)\) of protocol \(\varPi =(A,B)\) is \(\epsilon \)-weakly-protocol-non-malleable for a family \(\mathcal {F}\) of tampering functions if the following holds: For each tampering function \(f\in \mathcal {F}\) and for each x (resp. y) there exists a distribution \(D^{A}_{f,x}\) (resp. \(D^{B}_{f,y}\)) over \(\{\bot ,\mathsf {same}\} \cup \{0,1\}^n\) such that for all y (resp. x), the output distribution of \(S_0^{A(x)}\) (resp. \(S_1^{B(y)}\)) is \(\epsilon \)-close to the distribution \(\mathsf {replace}(D^{A}_{f,x},\mathsf {Trans}(x,y))\) (resp. \(\mathsf {replace}(D^{B}_{f,y},\mathsf {Trans}(x,y))\)).

It is easy to see that this notion is strictly weaker than protocol-non-malleability as defined in Definition 8. If a distribution \(D_f\) as required by Definition 8 exists, then \(D^{A}_{f,x}\) and \(D^{B}_{f,y}\) can easily be sampled by sampling from \(D_f\) and throwing away half of the output. On the other hand, since \(D^{A}_{f,x}\) can depend on x, it does not help in sampling a distribution \(D_f\) that is required to be (almost) independent of x.

Theorem 1

Let \(\varPi =(A,B)\) be an r-round protocol with inputs such that there exists at least one triple of inputs \((x^*_1,x^*_2,y^*)\) or \((x^*,y^*_1,y^*_2)\) such that \(\mathsf {Trans}(x^*_1,y^*) \ne \mathsf {Trans}(x^*_2,y^*)\) or \(\mathsf {Trans}(x^*,y^*_1) \ne \mathsf {Trans}(x^*,y^*_2)\) respectively. Let \(\varPi '\) be a \(\delta (\ell )\)-correct, \(\mathsf {negl}(\ell )\)-weakly-protocol-non-malleable INMC for protocol \(\varPi \) for a family \(\mathcal {F}\) of threshold tampering functions. Then there also exists a (computationally unbounded) interactive code \(\overline{\varPi }\) in the CRS model for the same protocol \(\varPi \) and the same family of threshold tampering functions \(\mathcal {F}\).

Due to space constraints, the proof of Theorem 1 is deferred to the full version of this paper.

Applying the Lower Bound to Other Tampering Functions. It is natural to ask whether the lower bound stated above also applies to other classes of functions. This would be unfortunate, since it would trivially rule out INMCs for most classes of tampering functions. However, fortunately, this is not the case.

In the proof of Theorem 1, we explicitly use that the tampering function at any point has complete knowledge of the full transcript so far and is completely unbounded in the resources necessary to compute the tampering. It then follows that if the transcript information theoretically reveals anything about the inputs, then the tampering function can extract this information and cause a conditional abort, thus allowing the proof to go through. In each of the classes of tampering functions we consider in the following sections, however, the tampering functions are restricted in one way or another in their view of the full transcript. This means that the proof no longer applies, since even when the full transcript contains information about the inputs, the tampering function is no longer capable of extracting it.

In fact, we explicitly exploit this observation in each of our protocols. Our protocols consist of an initial input-independent phase, where key material is established. This phase is constructed in such a way that in any future round, the established key material will be almost uniform from the point of view of the tampering function. Using information theoretically secure encryption and authentication we can then execute the underlying protocol in such a way that the transcript of that execution remains independent of the input from the point of view of the tampering function.

5 Bounded State Tampering

The first class of tampering functions we consider are tampering functions with bounded state. This is a very natural model in which adversaries are assumed to be arbitrarily powerful, but there exists an a priori upper bound on the size of the state they can hold. Similar adversaries have been considered before in many settings, starting with the work by Cachin and Maurer [19] on encryption and key exchange secure against computationally unbounded adversaries. Recently, in related work, Faust et al. [39] studied non-malleable codes against space-bounded tampering. However, the notion of bounded state tampering we introduce in this section is stronger than one would expect from naïvely extending that notion to interactive non-malleable codes. In particular, we do not limit the size of the memory available for computing the tampering function. Instead, a limit is only placed on the size of the state that can be carried over from tampering with one message to tampering with the next. That is, an adversary in this model can iteratively tamper with each message depending on some function of all previous messages, but the size of this information is limited to some fixed number of bits s. We formally define this in terms of a tampering function in the following.

Definition 10

(Bounded State Tampering Functions). Functions of the class of s-bounded state tampering functions \(\mathcal {F}^{s}_{\text {bounded}}\) for an r-round interactive protocol are defined by an r-tuple of pairs of functions \(((g_1,h_1),\dots ,(g_r,h_r))\) where the range of the functions \(h_i\) is \(\{0,1\}^s\). Let \(m_1,\dots ,m_i\) be the messages sent by the participants of the protocol in a partial execution. The tampering function for the ith message is then defined as

$$ f_i(m_1,\dots ,m_i) := g_i\bigl (m_i,h_{i-1}\bigl (m_{i-1},h_{i-2}(m_{i-2},\dots )\bigr )\bigr ). $$
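A concrete reading of this definition is sketched below (our own illustration; in the actual execution the tampering of course happens online, message by message): the only memory carried across messages is the s-bit state threaded through the functions \(h_i\).

```python
from typing import Callable, List, Tuple

def apply_bounded_state(gh: List[Tuple[Callable, Callable]],
                        messages: List[str], s: int) -> List[str]:
    state = 0  # the s-bit state; h_1 may ignore it (the recursion bottoms out)
    tampered = []
    for (g_i, h_i), m in zip(gh, messages):
        tampered.append(g_i(m, state))    # m'_i = g_i(m_i, h_{i-1}(...))
        state = h_i(m, state) % (2 ** s)  # next state from the *untampered* m_i
    return tampered

# Example: an adversary remembering only a parity bit (s = 1) of all lengths.
gh = [(lambda m, st: m if st == 0 else m[::-1],
       lambda m, st: (st + len(m)) % 2)] * 4
print(apply_bounded_state(gh, ["ab", "cde", "fg", "hij"], 1))
```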

5.1 Interactive Non-malleable Code for Bounded State Tampering

We devise a generic protocol-non-malleable encoding \(\varPi \) for bounded state tampering for any two-party protocol \(\varPi _0\). The basic idea is to first run a key exchange phase in which Alice and Bob exchange enough key material that they can execute the original protocol encrypted under a one-time pad and authenticated with information theoretically secure MACs. The main challenge is to craft the key-exchange phase in such a way that the adversary’s limitation, i.e., having bounded state, precludes her both from learning any meaningful information about the exchanged key material and from influencing the key material in a meaningful way. For bounded state tampering functions, we achieve this using 2-non-malleable extractors. The idea is that each party chooses two random sources that are significantly longer than the size of the bounded state and sends them to the other party. Both parties then apply a 2-non-malleable extractor to each pair of sources and thus extract a key they can use to secure the following communication using information theoretic authenticated encryption. A tampering function with bounded state will not be able to “remember” enough information about the sources to predict the exchanged key with any significant probability and thus will not be able to change the authenticated ciphertexts without being caught. Formally this is stated in the following theorem.
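The following skeleton (our own rendering; the precise protocol, including how the two extracted keys are combined and used, is Algorithm 1) illustrates the key-exchange phase. The message order \(\alpha _1, \beta _1, \alpha _2, \beta _2\) matches the state expression \(\gamma \) appearing in the proof of Theorem 2.

```python
import secrets
from typing import Callable

def key_exchange(nm_ext: Callable[[bytes, bytes], bytes], n_bytes: int):
    # Alice samples alpha_1, alpha_2; Bob samples beta_1, beta_2. The four
    # sources cross the channel in the order alpha_1, beta_1, alpha_2, beta_2.
    alpha1, alpha2 = secrets.token_bytes(n_bytes), secrets.token_bytes(n_bytes)
    beta1, beta2 = secrets.token_bytes(n_bytes), secrets.token_bytes(n_bytes)
    # An s-bounded-state tamperer cannot remember enough of the long sources,
    # so the extracted keys remain close to uniform from its point of view.
    k1 = nm_ext(alpha1, alpha2)
    k2 = nm_ext(beta1, beta2)
    return k1, k2  # key material securing the emulation of Pi_0
```

Here nm_ext would be instantiated with the explicit 2-non-malleable extractor of Chattopadhyay et al. [22].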

Theorem 2

Let \(\varPi _0\) denote a correct, r-round protocol with length-\(\ell \) messages. We assume wlog that Alice sends both the first and the last message in \(\varPi _0\). Let s be any bound on the adversary’s state as defined in Definition 10, let \(\lambda \) be the target security parameter, and let the source length n be chosen sufficiently larger than s (as a function of \(\lambda \)). Let \(\mathsf {MAC}\) be an \(\epsilon _{\mathsf {mac}}\)-secure information theoretic one-time message authentication code. Let \(\mathsf {nmExt}\) be a 2-non-malleable extractor for sources of length n with min-entropy \(n-s-\lambda \) and with error \(\epsilon \). Then there exists an \((r+7)\)-round encoding \(\varPi \) of \(\varPi _0\) that is \(\epsilon '\)-protocol-non-malleable against \(\mathcal {F}^{s}_{\text {bounded}}\), for an error \(\epsilon '\) determined by \(\epsilon \), \(\epsilon _{\mathsf {mac}}\), and \(2^{-\lambda }\).

Note that the required extractor can be instantiated using the construction of Chattopadhyay et al. [22], while the MAC can be instantiated with a family of pair-wise independent hash functions.

Algorithm 1: The encoding \(\varPi \) of \(\varPi _0\) (figure omitted).

Proof of Theorem 2. The protocol \(\varPi \) is specified in Algorithm 1. We need to argue that the protocol is correct and protocol-non-malleable.

Correctness: The correctness of \(\varPi \) follows from the fact that the extractor is deterministic and the message authentication code is correct. Since the extractor is deterministic, both parties will extract the same string k. The correctness of the message authentication code then implies that neither party will ever abort during the protocol. Further, since the one-time pad is correct, it follows that messages of the underlying protocol will always be decrypted correctly and thus both parties are faithfully executing an honest instance of \(\varPi _0\). Thus at the end of the protocol the collected transcripts correspond to an honest execution of \(\varPi _0\).

Protocol-Non-malleability: Let f be an s-bounded state tampering function described by \(((g_1,h_1),\dots ,(g_r,h_r))\). To prove that the coding scheme is protocol-non-malleable, we need to prove that a distribution \(D_{f}\) as in Definition 8 exists.

The Distribution \(D_{f}\). When sampling from \(D_{f}\) we need to deal with the problem that, in addition to the s bits of state f can keep by design, it can learn additional information by making use of conditional aborts. I.e., in round i the function \(g_i\) can force an abort in the protocol unless the message sent in round i is “good”. In any future round \(j>i\), even if its s-bit state does not retain any information about \(m_i\), the function \(g_j\) therefore “remembers” that \(m_i\) must have been “good”, since otherwise the protocol would have aborted.

Technically, the tampering function can use conditional aborts to leak an arbitrary amount of information. However, this comes at the expense of having to abort with high probability. Let be the probability of f causing either party to abort before the last message in the protocol is sent. Then this allows the tampering function to leak at most additional bits to future rounds. Note that causing an abort by tampering with the very last message cannot add any additional leakage, since there are no more future rounds to consider. Further note that either party aborting before the last message is sent automatically causes both parties to output \(\bot \) in the synchronized setting.
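For orientation, the standard way to quantify this trade-off (the concrete quantities of the theorem are elided in this version, so the bound below is the generic fact rather than a quotation): surviving a conditional abort that is passed with probability at least \(1-p\) costs at most \(\log \frac{1}{1-p}\) bits of min-entropy,

$$ H_\infty (X \mid \text {no abort}) \;\ge \; H_\infty (X) - \log \frac{1}{1-p}. $$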

We use the above observation to sample from \(D_f\) by sampling differently depending on . If , the distribution \(D_f\) is sampled by simply outputting \((\bot ,\bot )\). Clearly, this distribution is close to the real distribution, since f causes both Alice and Bob to abort and output \(\bot \) with probability at least . If , the distribution \(D_{f}\) is sampled as shown in Algorithm 2. The difference between \(D_{f}\) and the real tampered transcript distribution is captured by the event in which the sampler aborts the execution in step 4b or 4c but the real execution continues. To see why \(D_{f}\) is close to the tampered transcript distribution, consider the following four cases.

[Algorithm 2: the sampler for \(D_f\)]

1. The tampering function did not change \((\alpha _1,\alpha _2)\) or \((\beta _1,\beta _2)\): This is the simplest case. Note that the tampering function may store a bounded function of the messages seen so far. That is, the tampering function stores \(\gamma = h_4(\beta _2, h_3(\alpha _2, h_2(\beta _1, h_1(\alpha _1))))\), where the \(h_i\) denote memory-bounded functions as described above. We claim that given \(\gamma \) and up to many bits of additional leakage due to conditional aborts, \((k_1, k_2)\), and hence k, is \(2\epsilon \)-close to uniform. This follows from the property of strong extractors: conditioned on \(\gamma \) and the leakage, the sources \((\alpha _1,\alpha _2)\) are still independent and have sufficient min-entropy. This may not be immediately apparent, since future tampering can depend on \(\gamma \), which technically constitutes joint leakage over \((\alpha _1,\alpha _2)\). However, we can see that this particular joint leakage is not an issue for a 2-non-malleable extractor by switching to a different but equivalent viewpoint. If we fix \(h_1(\alpha _1)\), then \(\alpha _1\) is no longer uniformly distributed, but it is still a source with at least \(n-s\) bits of min-entropy; this is ensured by the fixed upper bound on the size of the leakage. From this viewpoint, since \(h_1(\alpha _1)\) is fixed, \(\gamma \) is no longer joint leakage over \((\alpha _1,\alpha _2)\) but merely bounded leakage over \(\alpha _2\). The same applies to the additional potential leakage due to conditional aborts, leaving us with a source \(\alpha _1\) with at least bits of min-entropy. The same argument holds for the sources \((\beta _1,\beta _2)\).
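The entropy accounting used in this step is the standard chain rule for average min-entropy: leaking at most s bits costs at most s bits of entropy. In the notation of this case, for the uniform n-bit source \(\alpha _1\):

$$ \widetilde{H}_\infty \big (\alpha _1 \mid h_1(\alpha _1)\big ) \;\ge \; H_\infty (\alpha _1) - s \;=\; n - s. $$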

Now it follows that if the tampering function changes any message in the protocol execution phase, the MAC verification will fail (up to the error ), causing the receiving party to abort. Unless the tampered message was the one sent in round \(r+7\), this in turn automatically causes the other party to abort as well (corresponding to step 4b). If the tampered message was the one sent in round \(r+7\), then only Bob aborts (corresponding to step 4c). Furthermore, by the property of one-time pads, the probability of the tampering function changing any message is independent of the message itself.

2. The tampering function changed \((\alpha _1,\alpha _2)\) (i.e., changed at least one of them) but not \((\beta _1,\beta _2)\): We claim that is \(\epsilon \)-close to uniform given \(\gamma \), up to many bits of additional leakage due to conditional aborts, , and \((\beta _1,\beta _2)\). This follows from the fact that \(k_1\) is \(\epsilon \)-close to uniform given \(k'_1\), \(\gamma \), and bits of leakage (by the property of 2-non-malleable extractors), and that \((\beta _1,\beta _2)\) are independent of \((\alpha _1,\alpha _2)\). This also implies that \(k_1\) is \(\epsilon \)-close to uniform given \(\gamma , k'_1, (\beta _1,\beta _2)\), \(k_2\), and bits of leakage, since \(k_2\) is entirely determined by \((\beta _1,\beta _2)\). This in turn implies that \(k_1\) is \(\epsilon \)-close to uniform given \(\gamma , k'_1, (\beta _1,\beta _2), k_2\), \(k'_2\), and bits of leakage, since \(k'_2 = k_2\). This implies that is \(\epsilon \)-close to uniform conditioned on \(\gamma , k'_1, (\beta _1,\beta _2), k_2\), and leakage, which finally implies that k is \(\epsilon \)-close to uniform conditioned on and leakage. Thus, the MAC verification will fail for Alice in the key confirmation phase (up to the error ), causing both parties to output \(\bot \).

3. The tampering function changed \((\beta _1,\beta _2)\) but not \((\alpha _1,\alpha _2)\): This case is symmetric to the previous case.

4. The tampering function changed both \((\alpha _1,\alpha _2)\) and \((\beta _1,\beta _2)\): The only difference between this case and case 2 is that now \(k'_2\) may not be equal to \(k_2\). As in the previous case, \(k_1\) is almost uniform given \(\gamma , k'_1, (\beta _1,\beta _2), k_2\), and leakage. But note that \(k'_2\) is entirely determined by \((\beta _1,\beta _2)\), \(\gamma \), and the (fixed) tampering function. Hence, \(k_1\) is almost uniform given \(\gamma , k'_1, (\beta _1,\beta _2), k_2, k'_2\), and leakage, and, as in case 2, the MAC verification fails for Alice in the key confirmation phase, causing both parties to output \(\bot \).

Overall, using a union bound over the errors of the extractor and the MAC, we get an upper bound on the statistical distance between \(D_{f}\) and the outputs of a real execution of .    \(\square \)

6 Split-State Tampering

Split-state tampering functions are one of the most interesting and well-studied families of tampering functions for regular non-malleable codes and were already considered by Dziembowski, Pietrzak, and Wichs [36] in their seminal paper. A 2-split-state tampering function tampers independently with two fixed disjoint parts of a codeword. Transferring this idea to the interactive setting is straightforward: we can divide the transcript of a protocol into two disjoint sets of messages and allow the tampering function to tamper independently with those two sets.

However, we are actually able to achieve protocol-non-malleability for a stronger class, namely c-unbalanced split-state tampering functions. In the regular split-state setting, the encoding scheme determines the “split”. In contrast, a c-unbalanced split-state tampering function can split the transcript into two arbitrary sets, as long as each set contains at least a 1/c fraction of the messages.

Definition 11

(c-Unbalanced Split-State Tampering Functions). Functions of the class of c-unbalanced 2-split-state tampering functions \(\mathcal {F}^{c}_{\text {strong-split}}\) for an r-round interactive protocol are defined by an r-tuple of functions \((g_1,\dots ,g_r)\) and two disjoint sets \(\mathcal {I}_0,\mathcal {I}_1\) such that \(|\mathcal {I}_0|, |\mathcal {I}_1| \ge r/c\) and \(\mathcal {I}_0 \cup \mathcal {I}_1 = \{1,\dots ,r\}\). Let \(m_1,\dots ,m_i\) denote the messages sent by the participants of the protocol in a partial execution. The tampering function for message \(m_i\) is then

$$ f_i(m_1,\dots ,m_i) := {\left\{ \begin{array}{ll} g_i((m_j)_{j\in \mathcal {I}_0,j\le i}) &{} \text {if } i \in \mathcal {I}_0\\ g_i((m_j)_{j\in \mathcal {I}_1,j\le i}) &{} \text {if } i \in \mathcal {I}_1 \end{array}\right. } $$

As a special case, functions in \(\mathcal {F}^{2}_{\text {strong-split}}\) must split the messages into two sets of equal size. These functions are alternatively simply called split-state tampering functions, since their split is not unbalanced.
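A small executable sketch of Definition 11 may be helpful; the byte-string message type, the function names, and the validity checks are illustrative assumptions, not part of the definition.

```python
from typing import Callable

def make_split_state_tamperer(
    g: list[Callable[[list[bytes]], bytes]],  # the per-round functions g_1..g_r
    I0: set[int], I1: set[int], r: int, c: int,
) -> Callable[[int, list[bytes]], bytes]:
    """Sketch of Definition 11: f_i sees only the messages of its own state."""
    assert I0.isdisjoint(I1) and I0 | I1 == set(range(1, r + 1))
    assert min(len(I0), len(I1)) >= r / c  # each side holds >= a 1/c fraction

    def f(i: int, msgs: list[bytes]) -> bytes:
        side = I0 if i in I0 else I1
        visible = [msgs[j - 1] for j in sorted(side) if j <= i]  # (m_j)_{j in side, j <= i}
        return g[i - 1](visible)

    return f

# Example: the identity tamperer on a 4-round protocol with a balanced split.
g = [lambda vis: vis[-1]] * 4            # each g_i forwards m_i unchanged
f = make_split_state_tamperer(g, {1, 3}, {2, 4}, r=4, c=2)
assert f(3, [b"m1", b"m2", b"m3"]) == b"m3"
```

Note that for c = 2 the checks force \(|\mathcal {I}_0| = |\mathcal {I}_1| = r/2\), recovering the balanced split-state case described above.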

6.1 INMC for Split-State Tampering

We devise a generic protocol-non-malleable encoding \(\varPi \) for c-unbalanced split-state tampering functions for any two-party protocol \(\varPi _0\). The basic idea of the encoding is similar to the protocol for bounded state tampering functions; the instantiation, however, is quite different. We again first run a key exchange phase in which enough key material is exchanged to execute the original protocol encrypted under a one-time pad, with all messages authenticated by information theoretically secure MACs. The main difference lies in the implementation of the key exchange phase. Unlike before, where we relied on non-malleable extractors, we here use a notion of tamper-evident n-out-of-n secret sharing. The idea is that both parties contribute to the key material and split their part of the key material into many shares that are sent in separate messages. If we can enforce that the tampering function must jointly tamper with almost all of the messages in the key exchange phase to be able to predict the key with any significant probability, then we can scale the key exchange phase to make sure that such a function would not be c-unbalanced. The tamper evidence of the secret sharing scheme ensures that either party's shares must be tampered with jointly to learn anything about the reconstructed secret. However, this is not enough: we must also ensure that the other party's messages are tampered with jointly. We achieve this using MACs with “successively revealed keys”, i.e., each message must be authenticated using a key that is only revealed to someone who knows all of the other party's previous messages. In this way, each message is “chained” to the other party's previous messages, and any successful tampering must necessarily tamper with the full key exchange phase in a joint manner, as the sketch below illustrates.
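The chaining idea can be pictured with the following sketch. It uses a hash-based key schedule purely as a computational stand-in; the paper's construction is information theoretic, and the actual mechanism is part of Algorithm 3. The sketch only illustrates the dependency structure: the key for round i is available exactly to someone who knows all of the peer's previous messages.

```python
import hashlib

def chained_key(seed: bytes, peer_msgs: list[bytes]) -> bytes:
    """Hypothetical key schedule: the round key depends on ALL of the other
    party's previous messages, so tampering with any single one of them,
    while leaving the rest alone, yields a key under which no tag verifies."""
    h = hashlib.sha256(seed)
    for m in peer_msgs:
        h.update(hashlib.sha256(m).digest())
    return h.digest()

# A share sent in round i would be authenticated under
# chained_key(seed, peer_msgs_up_to_round_i); changing any earlier peer
# message changes the key and hence invalidates the tag.
```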

Theorem 3

Let \(\varPi _0\) denote a correct, r-round protocol with length-\(\ell \) messages. Let \((\mathsf {Share},\mathsf {Reconstruct})\) be a \(\lceil ((c-1)(r+5)+1)/2\rceil \)-out-of-\(\lceil ((c-1)(r+5)+1)/2\rceil \) perfectly private, \(\epsilon '\)-tamper evident secret sharing scheme for up to bits of leakage with message length \(\ell ''\) and share length \(\ell '\). Let be the target security parameter; then we set . Let be a -secure information theoretic message authentication code. Let be a strong two-source extractor for sources with min-entropy with error \(\epsilon ''\). We assume without loss of generality that Alice sends both the first and the last message in \(\varPi _0\). Then for any c there exists a \(c(r+5)\)-round encoding \(\varPi \) of \(\varPi _0\) that is -protocol-non-malleable against \(\mathcal {F}^{c}_{\text {strong-split}}\).

The tamper evident secret sharing scheme can be instantiated using the construction described in the full version of this paper, the MAC can be instantiated with a family of pairwise-independent hash functions, and the strong 2-source extractor can be instantiated with the extractor due to Bourgain [14].

[Algorithm 3: the encoding \(\varPi \)]

Proof of Theorem 3. The protocol \(\varPi \) is specified in Algorithm 3. We need to argue that the protocol is correct and protocol-non-malleable.

Correctness: The correctness of \(\varPi \) follows from the correctness of the secret sharing scheme and the message authentication code. The correctness of the secret sharing scheme implies that when no tampering takes place, Bob and Alice reconstruct the correct strings \(k_1\) and \(k_2\), respectively. Thus, they will compute the same key k. Combined with the correctness of the message authentication code, this means that neither party will ever abort during the protocol. Further, since the one-time pad is correct, the messages of the underlying protocol are always decrypted correctly, and thus both parties faithfully execute an honest instance of \(\varPi _0\). Hence, at the end of the protocol the collected transcripts correspond to an honest execution of \(\varPi _0\).

Protocol Non-malleability: Let f be a c-unbalanced split-state tampering function described by \((g_1,\dots ,g_{c(r+5)})\) and \(\mathcal {I}_0,\mathcal {I}_1\) (refer to Definition 11). To prove that the coding scheme is protocol-non-malleable, we show that a distribution \(D_{f}\) as in Definition 8 exists.

The Distribution \(D_{f}\): When sampling from \(D_{f}\), we again need to deal with the problem that the tampering function can communicate information through conditional aborts. I.e., in round i with \(i\in \mathcal {I}_b\), the function \(g_i\) can force an abort in the protocol unless the message sent in round i is “good”. In any future round \(j>i\), even if \(j\in \mathcal {I}_{1-b}\), the function \(g_j\) therefore has the information that the message in round i must have been “good”. This implies leakage between the two split states. To deal with this problem, we sample differently depending on the probability of f causing an abort during a protocol execution. Let be the probability of f causing either party to abort before the last message in the protocol is sent. If , the distribution \(D_f\) is sampled by simply outputting \((\bot ,\bot )\). Clearly, this distribution is close to the real distribution, since f causes both parties to abort and output \(\bot \) with probability at least . If , the distribution \(D_{f}\) is sampled as shown in Algorithm 4.

[Algorithm 4: the sampler for \(D_f\)]

Analysis. It remains to show that \(D_{f}\) is close to the tampered transcript distribution. We first note that the protocol \(\varPi \) overall has \(((c-1)(r+5)+1)+r+4 = c(r+5)\) rounds, of which \((c-1)(r+5)+2\) form the key exchange phase, 3 the key confirmation phase, and r the protocol execution phase. We therefore have that . As noted above, we need to deal with leakage due to conditional aborts for every message being tampered with. I.e., the tampered message \(\bar{m}_i\) in round i with \(i\in \mathcal {I}_b\) can, in addition to all previous messages in \(\mathcal {I}_b\), also depend on some joint leakage over all previous messages in \(\mathcal {I}_{1-b}\) due to conditional aborts, simply by observing that the protocol has not aborted.
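Spelled out, the phase counts add up as claimed:

$$ \underbrace{(c-1)(r+5)+2}_{\text {key exchange}} \;+\; \underbrace{3}_{\text {key confirmation}} \;+\; \underbrace{r}_{\text {protocol execution}} \;=\; (c-1)(r+5) + (r+5) \;=\; c(r+5). $$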

Claim 4

The tampered message \(\bar{m}_i\) in round i with \(i\in \mathcal {I}_b\) can depend on at most bits of joint leakage over \(\{m_j| j\in \mathcal {I}_{1-b} \wedge j\le i\}\).

Proof

We know that f does not cause an abort with probability at least . Therefore, the tampering function \(g_i\) learns at most bits of joint leakage over previous messages in \(\mathcal {I}_{1-b}\).   \(\square \)

We will argue that, conditioned on the protocol not having aborted and on the complete view of any tampering function \(g_i\) in the key confirmation and protocol execution phases, the key computed by Alice in the key exchange phase remains \(\epsilon ''\)-close to uniform. For this, we first note that up to step 5 in Algorithm 4 the sampler acts identically to a real execution of the protocol.

Lemma 5

If Alice (or, respectively, \(D_f\)) does not abort during the key exchange phase, then \(\bar{k}_2 = k_2\), except with probability .

Due to space constraints, the proof of Lemma 5 is deferred to the full version. A completely symmetric argument can be made for \(\bar{k}_1=k_1\), where otherwise Bob aborts with probability , causing Alice to also abort. This means that if Alice does not abort, we have that with probability at least .

Now, consider how much information about \(k_1\) and \(k_2\) a tampering function \(g_i\) can learn. Let \(\mathcal {I}_b\) be the set of indices such that \(i \in \mathcal {I}_b\). Clearly, \(g_i\) has complete knowledge of all shares \(s^B_j\) with \(2j \in \mathcal {I}_b\) and all shares \(s^A_j\) with \(2j+1 \in \mathcal {I}_b\). Further, \(g_i\) receives joint leakage over shares in \(\mathcal {I}_{1-b}\) simply by observing the fact that the protocol has not yet aborted. By Claim 4, however, this leakage is bounded by bits. By the perfect privacy of the secret sharing scheme, it follows that bits of joint leakage over all shares can reveal at most bits of the secret.

Since a set of indices with would be too large for a c-unbalanced split-state tampering function, \(\mathcal {I}_b\) cannot possibly contain all the shares. Thus, the maximum amount of information the tampering function \(g_i\) can gain about \(k_1\) and \(k_2\) is exactly one of the two strings and bits of the other string. Since is a strong 2-source extractor for sources with min-entropy , this implies that in this case, with probability at least \(1-\epsilon ''\), the extracted key material remains \(\epsilon ''\)-close to uniform. Overall, this means that with probability at least , k remains \(\epsilon ''\)-close to uniform from the point of view of any tampering function \(g_i\).

To recap: if any of the key shares are tampered with in such a way that the original keys are not reconstructed, then the sampling algorithm will always output \((\bot ,\bot )\), while the parties in the real protocol will do so with probability at least . If the shares were not tampered with and thus , then, since k is distributed \(\epsilon ''\)-close to uniform, the random messages in the simulated protocol execution phase are distributed \(\epsilon ''\)-close to a real protocol execution. Now, if f tampers with any message of the key confirmation or protocol execution phase except for the very last one, then the sampling algorithm always outputs \((\bot ,\bot )\), whereas if only the very last message is tampered with, the sampling algorithm outputs \((\mathsf {same},\bot )\). In a real protocol execution, when tampering with any message, the information theoretic MAC must be computed almost independently of k, since k remains \(\epsilon ''\)-close to uniform. Therefore, if any message is tampered with in a real protocol execution, the receiving party will abort with probability , causing both parties to output \(\bot \); the exception is the very last message, for which only Bob will abort with probability and output \(\bot \), while Alice retains the correct transcript. On the other hand, if no message is tampered with, the sampling algorithm outputs \((\mathsf {same},\mathsf {same})\), and both Alice and Bob in a real protocol execution retain the correct transcript, since in this case Alice and Bob agree on a key. Overall, a union bound then gives us an upper bound on the statistical distance between \(D_{f}\) and the distribution of both parties' outputs in a real execution of . With \(d = \lceil ((c-1)(r+5)+1)/2\rceil \), this leads to the claimed bound of .    \(\square \)

7 Fragmented Sliding Window Tampering

The sliding window model is a very natural restriction of algorithms and is considered in a variety of contexts, in particular also for error correcting codes [48]. The idea of the sliding window is that an adversary can only watch a stream of data through a window of fixed size. In the context of interactive non-malleable codes, this means that the tampering function “remembers” only the last w messages. That is, the tampering function gets as input the last w (untampered) messages of the protocol transcript to compute the tampered message.

We in fact consider a stronger class of functions that we call fragmented sliding window tampering functions. Functions with a fragmented window of size w can depend on any w previous messages, not just the last w. In a sense, the adversary is still watching the transcript through a fixed-size window, but it can freely choose which fragments of the window remain transparent and which become opaque.

Comparing this class with c-unbalanced split-state tampering functions, we note that the size of the window is now fixed and does not scale with the number of messages. On the other hand, the different sets of messages that the tampering can depend on are no longer required to be disjoint. E.g., the tampering of every single message could depend on the first message of the protocol, something that would not be possible for split-state functions.

Definition 12

(Fragmented Sliding Window Tampering Functions). Functions of the class of w-size fragmented sliding window tampering functions \(\mathcal {F}^{w}_{\text {frag}}\) for an r-round interactive protocol are defined by an r-tuple of functions \((g_1,\dots ,g_r)\) and an r-tuple of sets \((S_1,\dots ,S_r)\) such that \(S_1 = \emptyset \), \(S_i \subseteq S_{i-1} \cup \{i-1\}\), and \(|S_i| \le w\) for \(1< i\le r\). Let \(m_1,\dots ,m_i\) be the messages sent by the participants of the protocol in a partial execution. The tampering function for message \(m_i\) is then defined as \( f_i(m_1,\dots ,m_i) := g_i\big (m_i,(m_j)_{j\in S_i}\big ). \)
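As with Definition 11, a small executable sketch of Definition 12 may be helpful; the names and types are illustrative assumptions, and the size check reflects the w-size window condition.

```python
from typing import Callable

def check_window_schedule(S: list[set[int]], w: int, r: int) -> None:
    """Validity checks for the window sets of Definition 12 (S[i-1] holds S_i)."""
    assert len(S) == r and S[0] == set()
    for i in range(2, r + 1):
        assert S[i - 1] <= S[i - 2] | {i - 1}  # a dropped message never returns
        assert len(S[i - 1]) <= w              # fragmented window of size w

def tamper(g: list[Callable[..., bytes]], S: list[set[int]],
           i: int, msgs: list[bytes]) -> bytes:
    # f_i(m_1,...,m_i) = g_i(m_i, (m_j)_{j in S_i})
    window = [msgs[j - 1] for j in sorted(S[i - 1])]
    return g[i - 1](msgs[i - 1], window)

# Example: a plain (non-fragmented) sliding window of size w = 2 over r = 4 rounds.
S = [set(), {1}, {1, 2}, {2, 3}]
check_window_schedule(S, w=2, r=4)
```

The plain sliding window described above is thus the special case \(S_i = \{i-w,\dots ,i-1\} \cap \{1,\dots ,i-1\}\).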

7.1 INMC for Fragmented Sliding Window Tampering

Even though there are important conceptual differences between fragmented sliding window tampering functions and c-unbalanced split-state tampering functions, an essentially identical protocol can be used to achieve protocol-non-malleability for fragmented sliding window tampering functions. The difference lies in how the key exchange phase scales. The window size is fixed and does not depend on the round complexity of the protocol. This means that d, the number of shares Alice and Bob split their keys into, must scale with w instead of with the underlying protocol's round complexity.

Theorem 6

Let \(\varPi _0\) denote a correct, r-round protocol with length-\(\ell \) messages. Let \((\mathsf {Share},\mathsf {Reconstruct})\) be a \((w+2)\)-out-of-\((w+2)\) perfectly private, \(\epsilon '\)-tamper evident secret sharing scheme for up to bits of leakage with message length \(\ell ''\) and share length \(\ell '\). Let be the target security parameter; then we set . Let be a -secure information theoretic message authentication code. Let be a strong two-source extractor for sources with min-entropy with error \(\epsilon ''\). We assume wlog that Alice sends both the first and the last message in \(\varPi _0\). Then for any w there exists an \((r+2w+8)\)-round encoding \(\varPi \) of \(\varPi _0\) that is -protocol-non-malleable against \(\mathcal {F}^{w}_{\text {frag}}\).
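For orientation, this round count is consistent with the phase structure from the proof of Theorem 3, under the plausible reading that the key exchange phase now takes \(2(w+2)+1\) rounds (each party sending \(w+2\) shares), followed by 3 rounds of key confirmation and r rounds of protocol execution:

$$ \big (2(w+2)+1\big ) + 3 + r \;=\; r + 2w + 8. $$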

Due to space constraints, the proof of Theorem 6 is deferred to the full version of this paper.