1 Introduction

Passwords provide the dominant mechanism for electronic authentication, protecting a plethora of sensitive information. However, passwords are vulnerable to both online and offline attacks. A network adversary can test password guesses in online interactions with the server while an attacker who compromises the authentication data stored by the server (i.e., a database of salted password hashes) can mount an offline dictionary attack by testing each user’s authentication information against a dictionary of likely password choices. Offline dictionary attacks are a major threat, routinely experienced by commercial vendors, and they lead to the compromise of billions of user accounts [6, 7, 12, 15, 17, 20]. Moreover, because users often re-use their passwords across multiple services, compromising one service typically also compromises user accounts at other services.

Two-factor password authentication (TFA), where user \(\mathsf {U}\) authenticates to server \(\mathsf {S}\) by “proving possession” of an auxiliary personal device \(\mathsf {D}\) (e.g. a smartphone or a USB token) in addition to knowing her password, forms a common defense against online password attacks as well as a second line of defense in case of password leakage. A TFA scheme which uses a device that is not directly connected to \(\mathsf {U}\)’s client terminal \(\mathsf {C}\) typically works as follows: \(\mathsf {D}\) displays a short one-time secret PIN, either received from \(\mathsf {S}\) (e.g. using an SMS message) or computed by \(\mathsf {D}\) based on a key shared with \(\mathsf {S}\), and the user manually types the PIN into client \(\mathsf {C}\) in addition to her password. Examples of systems that are based on such one-time PINs include SMS-based PINs, TOTP [10], HOTP [14], Google Authenticator [4], FIDO U2F [2], and schemes in the literature such as [47].

Vulnerabilities of traditional TFA schemes. Existing TFA schemes, both PIN-based and those that do not rely on PINs, e.g. [1, 8], combine password authentication and 2nd-factor authentication as separate authentication mechanisms, which leads to several limitations. Chief among these is that such TFA solutions remain vulnerable to offline dictionary attacks upon server compromise in the same way as non-TFA password authentication schemes (i.e. via exposure of users’ salted hashes), thus perpetuating the main source of password leakage. Moreover, existing TFAs have several vulnerabilities against online attacks: (1) The read-and-copy PIN-transfer is subject to a variety of eavesdropping attacks, including SMS hijacking (see footnote 1), shoulder-surfing, PIN recording, client-side or device-side attacks via keyloggers or screen scrapers, e.g. [43], and PIN phishing [16]. (2) The read-and-copy PIN-transfer allows only limited PIN entropy and while, say, a 6-digit PIN is hard to guess, PIN guessing can be used in a large-scale online attack against accounts whose passwords the attacker already collected, e.g. [12, 15, 17, 20]. For example, if the attacker obtains password information for a large set of accounts, PINs are 6 digits long, and the attacker can try 10 PIN guesses per account, one expects a successful impersonation per 100,000 users. (3) Current PIN-based TFAs perform sequential authentication using the password and the PIN, i.e. \(\mathsf {C}\) sends the password to \(\mathsf {S}\) (over TLS), \(\mathsf {S}\) confirms whether \(\mathsf {pwd}\) is correct, and only then \(\mathsf {C}\) sends to \(\mathsf {S}\) the PIN retrieved from \(\mathsf {D}\). This enables online password attacks without requiring PIN guessing or interaction with a device, thus voiding the effects of the PIN on password-guessing or password-confirmation online attacks.
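The arithmetic behind item (2) can be checked with a short calculation. The sketch below assumes, as a simplification not stated in the text, that PINs are uniformly random and that the attacker's guesses for an account are distinct; the function name is illustrative only.

```python
from fractions import Fraction


def expected_pin_successes(accounts: int, guesses_per_account: int, pin_digits: int) -> Fraction:
    """Expected number of accounts whose PIN is hit by one of the guesses.

    Assumes uniformly random decimal PINs and distinct guesses per account,
    so each account is broken with probability guesses / 10**pin_digits.
    """
    per_account = Fraction(guesses_per_account, 10 ** pin_digits)
    return accounts * per_account


# Figures from the text: 6-digit PINs, 10 guesses per account, 100,000 accounts.
successes = expected_pin_successes(accounts=100_000, guesses_per_account=10, pin_digits=6)
print(successes)
```

With these inputs the per-account success probability is 10/10^6 = 10^-5, which gives one expected impersonation per 100,000 accounts, matching the figure quoted above.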

Our Contributions. In this paper we aim to address the vulnerabilities of the currently deployed TFA schemes by (1) introducing a precise security model for TFA schemes capturing well-defined maximally-attainable security bounds, (2) exhibiting a practical TFA scheme which we prove to achieve the strong security guaranteed by our formal model, and (3) prototyping several methods for validating user’s possession of the secondary authentication factor. We expand on each of these aspects next.

TFA Security Model with End-to-End Security. We introduce a Two-Factor Authenticated Key Exchange (TFA-KE) model in which a user authenticates to server \(\mathsf {S}\) by (1) entering a password into client terminal \(\mathsf {C}\) and (2) proving possession of a personal device \(\mathsf {D}\) which forms the second authentication factor. In the TFA-KE model, possession of \(\mathsf {D}\) is proved by the user confirming on the device that a t-bit checksum displayed by \(\mathsf {D}\) equals the checksum displayed by \(\mathsf {C}\). Following [50] (see below), this implements a t-bit \(\mathsf {C}\)-to-\(\mathsf {D}\) user-authenticated channel, which confirms that the same person is in control of client \(\mathsf {C}\) and device \(\mathsf {D}\). This channel authentication requirement is weaker than the private channel required by current PIN-based TFAs and, as we show, it allows TFA schemes to be both more secure and easier to use.

The TFA-KE model, that we define as an extension of the standard Password-Authenticated Key Exchange (PAKE) [24] and the Device-Enhanced PAKE (DE-PAKE) [37] models, captures what we call end-to-end security by allowing the adversary to control all communication channels and compromise any protocol party. For each subset of compromised parties, the model specifies best-possible security bounds, leaving inevitable (but costly) exhaustive online guessing attacks as the only feasible attack option. In particular, in the common case that \(\mathsf {D}\) and \(\mathsf {S}\) are uncorrupted, the only feasible attack is an active simultaneous online attack against both \(\mathsf {S}\) and \(\mathsf {D}\) that also requires guessing the password and the t-bit checksum. Compromising server \(\mathsf {S}\) allows the attacker to impersonate \(\mathsf {S}\), but does not help in impersonating the user to \(\mathsf {S}\), and in particular does not enable an offline-dictionary attack against the user’s password. Compromising device \(\mathsf {D}\) makes the authentication effectively password-only, hence offering best possible bounds in the PAKE model (in particular, the offline dictionary attack is possible only if \(\mathsf {D}\) and \(\mathsf {S}\) are both compromised). Finally, compromising client \(\mathsf {C}\) leaks the password, but even then impersonating the user to the server requires an active attack on \(\mathsf {D}\). We prove our protocols in this strong security model.

Practical TFA with End-to-End Security. Our main result is a TFA scheme, \(\mathsf {GenTFA}\) that achieves end-to-end security as formalized in our TFA-KE model and is based on two general tools. The first is a Device-Enhanced Password Authenticated Key Exchange (DE-PAKE) scheme as introduced by Jarecki et al. [37]. Such a scheme assumes the availability of a user’s auxiliary device, as in our setting, and utilizes the device to protect against offline dictionary attacks in case of server compromise. However, DE-PAKE schemes provide no protection in case that the client machine \(\mathsf {C}\) is compromised and, moreover, security completely breaks down if the user’s password is leaked. Thus, our approach for achieving TFA-KE security is to start with a DE-PAKE scheme and armor it against client compromise (and password leakage) using our second tool, namely, a SAS-MA (Short-Authentication-String Message Authentication) as defined by Vaudenay [50]. In our application, a SAS-MA scheme utilizes a t-bit user-authenticated channel, called a SAS channel, to authenticate data sent from \(\mathsf {C}\) to \(\mathsf {D}\). More specifically, the SAS channel is implemented by having the user verify and confirm the equality of two t-bit strings, called checksums, displayed by both \(\mathsf {C}\) and \(\mathsf {D}\). It follows from [50] that if the displayed checksums coincide then the information received by \(\mathsf {D}\) from \(\mathsf {C}\) is correct except for a \(2^{-t}\) probability of authentication error. We then show how to combine a DE-PAKE scheme with such a SAS channel to obtain a scheme, \(\mathsf {GenTFA}\), for which we can prove TFA-KE security, hence provably avoiding the shortcomings of PIN-based schemes. 
Moreover, the use of the SAS channel relaxes the required user’s actions from a read-and-copy action in traditional schemes to a simpler compare-and-confirm which also serves as a proof of physical possession of the device by the user (see more below).

We show a concrete practical instantiation of our general scheme \(\mathsf {GenTFA}\), named \(\mathsf {OpTFA}\), that inherits from \(\mathsf {GenTFA}\) its TFA-KE security. Protocol \(\mathsf {OpTFA}\) is modular with respect to the (asymmetric) password protocol run between client and server, thus it can utilize protocols that assume PKI as the traditional password-over-TLS, or those that do not require any form of secure channels, as in the (PKI-free) asymmetric PAKE schemes [25, 32]. In the PKI case, \(\mathsf {OpTFA}\) can run over TLS, offering a ready replacement of current TFA schemes in the PKI setting. In the PKI-free case one gets the advantages of the TFA-KE setting without relying on PKI, thus obtaining a strict strengthening of (password-only) PAKE security [24, 44] as defined by the TFA-KE model.

The cost of \(\mathsf {OpTFA}\) is two communication rounds between \(\mathsf {D}\) and \(\mathsf {C}\), with 4 exponentiations by \(\mathsf {C}\) and 3 by \(\mathsf {D}\), plus the cost of a password authentication protocol between \(\mathsf {C}\) and \(\mathsf {S}\). In the PKI setting the latter is the cost of establishing a server-authenticated TLS channel, while in the PKI-free case one can use an asymmetric PAKE (e.g., [27, 36]) with cost (some of it computable offline) of 3 exponentiations for \(\mathsf {C}\), 2 for \(\mathsf {S}\), and one multi-exponentiation for each.

Implementation and SAS Channel Designs. We prototyped protocol \(\mathsf {OpTFA}\), in both the PKI and PKI-free versions, with the client implemented as a Chrome browser extension, the device as an Android app, and \(\mathsf {D}\)-\(\mathsf {C}\) communication implemented using Google Cloud Messaging. We also designed and implemented several instantiations of the human-assisted \(\mathsf {C}\)-to-\(\mathsf {D}\) SAS channel required by our TFA-KE solution and model. Recall that a SAS channel replaces the user’s read-and-copy action of a PIN-based TFA with the compare-and-confirm action used to validate the checksums displayed by \(\mathsf {C}\) and \(\mathsf {D}\). The security of a SAS-model TFA-KE depends on the checksum entropy t, called the SAS channel capacity, hence the two important characteristics of a physical design of a SAS channel are its capacity t and the ease of the compare-and-confirm action required of the user. In Sect. 6 we show several SAS designs that present different options in terms of channel capacity and user-friendliness.

Our baseline implementation of a SAS channel encodes 20-bit checksums as 6-digit decimal PINs, which the user compares when displayed by \(\mathsf {C}\) and \(\mathsf {D}\) (no copying involved). However, we also propose two novel and higher-capacity SAS channels. In the first design, the device \(\mathsf {D}\) is assumed to have a camera and the checksum calculated by the client is encoded as a QR code and displayed by \(\mathsf {C}\). The user prompts \(\mathsf {D}\) to capture this QR code, which \(\mathsf {D}\) decodes and compares against its own computed checksum. The second design is based on an audio channel implemented using speech transcription software. If device \(\mathsf {D}\) is a smartphone then the user can read out an alphanumeric checksum displayed by \(\mathsf {C}\) into \(\mathsf {D}\)’s microphone (see footnote 2), and \(\mathsf {D}\) decodes the audio using the transcriber tool and compares it to its checksum.

Related Works. We discuss related works in greater detail in Sect. 7. The main observations are: First, multiple methods have been proposed in the crypto literature for strengthening password authentication against offline dictionary attacks in case of server compromise by introducing an additional party in the protocol (e.g., password-hardened or device-enhanced authentication [23, 27, 31, 37] and Threshold-PAKE or 2-PAKE, e.g. [28, 40, 44]), but these schemes offer no security against an active attacker in case of password leakage or client compromise, hence they are not TFAs. Second, many TFA schemes offer alternatives to PIN-based TFAs, but none of them offer protection against offline attacks upon server compromise except for the scheme of [47] (see Sect. 7). Moreover, if these schemes consider \(\mathsf {D}\) as an independent entity (rather than a local component of client \(\mathsf {C}\)) then they either have online security vulnerabilities or they require a pre-set secure full-bandwidth \(\mathsf {C}\)-\(\mathsf {D}\) channel. In our case, we make do with just a SAS channel which, as we show in Sect. 6, has several practical implementations. Third, we are not aware of any attempt to model security of TFA schemes where \(\mathsf {D}\) and \(\mathsf {C}\) are not co-located, nor do we know of any PKI-free TFA schemes proposed for this setting.

Road-Map. In Sect. 2 we present the TFA-KE security model. In Sect. 3 we describe our protocol building blocks. In Sect. 4 we present a practical TFA-KE protocol \(\mathsf {OpTFA}\), and we provide an informal rationale for its design choices. In Sect. 5 we show a more general TFA-KE protocol \(\mathsf {GenTFA}\), of which \(\mathsf {OpTFA}\) is an instance, together with its formal security proof. In Sect. 6 we report on the implementation and testing of protocol \(\mathsf {OpTFA}\), and we describe several SAS channel designs. In Sect. 7 we include a more detailed discussion of related works.

2 TFA-KE Security

We introduce the Two-Factor Authenticated Key Exchange (TFA-KE) security model that defines the assumed environment and participants in our protocols as well as the attacker’s capabilities and the model’s security guarantees. Our starting point is the Device-Enhanced PAKE (DE-PAKE) model, introduced in [37], which extends the well-known two-party Password-Authenticated Key Exchange (PAKE) model [24] to a multi-party setting that includes users \(\mathsf {U}\), communicating from client machines \(\mathsf {C}\), servers \(\mathsf {S}\) to which users log in, and auxiliary devices \(\mathsf {D}\), e.g. a smartphone. A DE-PAKE scheme has the security properties of a two-server PAKE (2-PAKE) [28, 40] where \(\mathsf {D}\) plays the role of the 2nd server. Namely, a compromise of either \(\mathsf {S}\) or \(\mathsf {D}\) (but not both) essentially does not help the attacker, and in particular leaks no information about the user’s password. However, whereas 2-PAKE might be insecure in case of a compromise of both \(\mathsf {S}\) and \(\mathsf {D}\), in a DE-PAKE the adversary who compromises \(\mathsf {S}\) and \(\mathsf {D}\) must stage an offline dictionary attack to learn anything about the password.

The TFA-KE model considers the same set of parties as in the DE-PAKE model (which we recall in Appendix A) and all the same adversarial capabilities, including controlling all communication links, the ability to mount online active attacks and offline dictionary attacks, and the ability to compromise devices and servers. However, the DE-PAKE model does not consider client corruption or password leakage. Indeed, in case of password leakage an active adversary can authenticate to \(\mathsf {S}\) by impersonating the legitimate user in a single DE-PAKE session with \(\mathsf {D}\) and \(\mathsf {S}\). Since a TFA scheme is supposed to protect against client corruption and password leakage attacks, our TFA-KE model enhances the DE-PAKE model by adding these capabilities to the adversary while preserving all the other strict security requirements of DE-PAKE. In general, DE-PAKE requirements were such that the only allowable attacks on the system, under a given set of corrupted parties, are the unavoidable exhaustive online guessing attacks for that setting; the same holds for TFA-KE, with the additional requirement of best-possible resilience to client compromise and password leakage.

Note, however, that if \(\mathsf {C},\mathsf {D},\mathsf {S}\) communicate only over insecure links then an attacker who learns the user’s password will always be able to authenticate to \(\mathsf {S}\) as in the case of DE-PAKE, by impersonating the user to \(\mathsf {D}\) and \(\mathsf {S}\). Consequently, to allow device \(\mathsf {D}\) to become a true second factor and maintain security in case the password leaks, one has to assume some form of authentication in the \(\mathsf {C}\) to \(\mathsf {D}\) communication which would allow the user to validate that \(\mathsf {D}\) communicates with the user’s own client terminal \(\mathsf {C}\) and not with the attacker who performs a man-in-the-middle attack and impersonates this user to \(\mathsf {D}\).

To that end our TFA-KE model augments the communication model by an authentication abstraction on the client-to-device channel, but it does so without requiring the client to store any long-term keys (other than the user’s password). Namely, we assume a uni-directional \(\mathsf {C}\)-to-\(\mathsf {D}\) “Short Authenticated String” (SAS) channel, introduced by Vaudenay [50], which allows \(\mathsf {C}\) to communicate t bits to \(\mathsf {D}\) that cannot be changed by the attacker. The t-bit \(\mathsf {C}\)-to-\(\mathsf {D}\) SAS channel abstraction comes down to a requirement that the user compares a t-bit checksum displayed by both \(\mathsf {C}\) and \(\mathsf {D}\), and approves (or denies) their equality by choosing the corresponding option on device \(\mathsf {D}\).

As is standard, we quantify security by attacker’s resources that include the computation time and the number of instances of each protocol party the adversary interacts with. We denote these as \(q_D,q_S,q_C,q_C'\), where the first two count the number of active sessions between the attacker and \(\mathsf {D}\) and \(\mathsf {S}\), resp., while \(q_C\) (resp. \(q_C'\)) counts the number of sessions where the attacker poses to \(\mathsf {C}\) as \(\mathsf {S}\) (resp. as \(\mathsf {D}\)). Security is further quantified by the password entropy d (we assume the password is chosen from a dictionary of size \(2^d\) known to the attacker), and parameter t, which is called the SAS channel capacity. As we explain in Sect. 3, a \(\mathsf {C}\)-to-\(\mathsf {D}\) SAS channel allows for establishing a \(\mathsf {D}\)-authenticated secure channel between \(\mathsf {D}\) and \(\mathsf {C}\), except for the \(2^{-t}\) probability of error [50], which explains \(2^{-t}\) factors in the TFA-KE security bounds stated below.

TFA Security Definition. We consider a communication model of open channels plus the t-bit SAS-channel between \(\mathsf {C}\) and \(\mathsf {D}\), and a man-in-the-middle adversary that interacts with \(q_D,q_S,q_C,q_C'\) sessions of \(\mathsf {D},\mathsf {S},\mathsf {C}\), as described above. The adversary can also corrupt any party, \(\mathsf {S}\), \(\mathsf {D}\), or \(\mathsf {C}\), learning its stored secrets and the internal state as that party executes its protocol, which in the case of \(\mathsf {C}\) implies learning the user’s password. All other adversarial capabilities as well as the test session experiment defining the adversary’s goal are as in DE-PAKE (and PAKE) models – see Appendix A. In particular, the adversary’s advantage is, as in DE-PAKE and PAKE, an advantage in distinguishing between a random string and a key computed by \(\mathsf {S}\) or \(\mathsf {C}\) on a test session.

The security requirements set by Definition 1 below are the strictest one can hope for given the communication and party corruption model. That is, wherever we require the attacker’s advantage to be no more than a given bound with a set of corrupted parties, then there is an (unavoidable) attack - in the form of exhaustive guessing attack - that achieves this bound under the given compromised parties. Importantly, and in contrast to typical two-factor authentication solutions, the TFA-KE model requires that the second authentication factor \(\mathsf {D}\) not only provides security in case of client and/or password compromise, but that it also strengthens online and offline security (by \(2^t\) factors) even when the password has not been learned by the attacker.

Definition 1

A TFA-KE protocol \(\mathsf{TFA}\) is \((T,\epsilon )\)-secure if for any password dictionary \(\mathsf {Dict}\) of size \(2^d\), any t-bit SAS channel, and any attacker \(\mathsf {A}\) bounded by time T, \(\mathsf {A}\)’s advantage \({\mathsf {Adv}_\mathsf {A}^\mathsf{{TFA}}}\) in distinguishing the tested session key from random is bounded as follows, for \(q_S,q_C,q_C',q_D\) as defined above:

  1. If \(\mathsf {S}\), \(\mathsf {D}\), and \(\mathsf {C}\) are all uncorrupted:

    $${\mathsf {Adv}_\mathsf {A}^\mathsf{{TFA}}}\le \min \{q_C+q_S/2^t,q_C'+q_D/2^t\}/2^d+\epsilon $$

  2. If only \(\mathsf {D}\) is corrupted: \({\mathsf {Adv}_\mathsf {A}^\mathsf{{TFA}}}\le (q_C+q_S)/2^d+\epsilon \)

  3. If only \(\mathsf {S}\) is corrupted: \({\mathsf {Adv}_\mathsf {A}^\mathsf{{TFA}}}\le (q_C'+q_D/2^t)/2^d+\epsilon \)

  4. If only \(\mathsf {C}\) is corrupted (or the user’s password leaks by any other means): \({\mathsf {Adv}_\mathsf {A}^\mathsf{{TFA}}}\le \min (q_S,q_D)/2^t+\epsilon \)

  5. If both \(\mathsf {D}\) and \(\mathsf {S}\) are corrupted (but not \(\mathsf {C}\)), and \(\overline{q}_S\) and \(\overline{q}_D\) count \(\mathsf {A}\)’s offline operations performed based on resp. \(\mathsf {S}\)’s and \(\mathsf {D}\)’s state: \({\mathsf {Adv}_\mathsf {A}^\mathsf{{TFA}}}\le \min \{\overline{q}_S,\overline{q}_D\}/{2^d}\)
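The five cases of Definition 1 can be evaluated numerically to get a feel for the bounds. The following is a minimal sketch, not part of the formal model: the function name, the string labels for parties, and the parameter names (`q_C_prime` for \(q_C'\), `q_S_bar` and `q_D_bar` for the offline counts) are illustrative assumptions, and the formulas are transcribed from the definition as stated.

```python
from fractions import Fraction


def tfa_ke_bound(corrupted, d, t, q_S=0, q_D=0, q_C=0, q_C_prime=0,
                 q_S_bar=0, q_D_bar=0, eps=Fraction(0)):
    """Advantage bound of Definition 1 for a given set of corrupted parties.

    corrupted: subset of {"S", "D", "C"}; d: password entropy in bits;
    t: SAS channel capacity in bits; the q_* values are the query counts.
    """
    corrupted = frozenset(corrupted)
    dict_size = 2 ** d   # size of the password dictionary
    sas_size = 2 ** t    # number of possible t-bit checksums
    if not corrupted:                      # case 1: no corruption
        online = min(q_C + Fraction(q_S, sas_size),
                     q_C_prime + Fraction(q_D, sas_size))
        return online / dict_size + eps
    if corrupted == {"D"}:                 # case 2: device corrupted
        return Fraction(q_C + q_S, dict_size) + eps
    if corrupted == {"S"}:                 # case 3: server corrupted
        return (q_C_prime + Fraction(q_D, sas_size)) / dict_size + eps
    if corrupted == {"C"}:                 # case 4: client corrupted or password leaked
        return Fraction(min(q_S, q_D), sas_size) + eps
    if corrupted == {"D", "S"}:            # case 5: offline attack, as stated in item 5
        return Fraction(min(q_S_bar, q_D_bar), dict_size)
    raise ValueError("corruption pattern not covered by Definition 1")


# Example: 20-bit password entropy, 20-bit SAS channel, 1024 online sessions each.
print(tfa_ke_bound(set(), d=20, t=20, q_S=1024, q_D=1024))
print(tfa_ke_bound({"C"}, d=20, t=20, q_S=1024, q_D=1024))
```

Exact rational arithmetic is used so that very small bounds are not rounded away; the two printed values show how much the password entropy contributes when the client is uncorrupted compared with the password-leakage case.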

Explaining the bounds. The security of the TFA scheme relative to the DE-PAKE model can be seen by comparing the above bounds to those in Definition 2 in Appendix A. Here we explain the meaning of some of these bounds. In the default case of no corruptions, the adversary’s probability of attack is at most \(\min (q_C{+}q_S/2^t,q_C'{+}q_D/2^t)/2^d\) improving on DE-PAKE bound \(\min (q_C{+}q_S,q_C'{+}q_D)/2^d\) and on the PAKE bound \((q_C{+}q_S)/2^d\). For simplicity, assume that \(q_C=q_C'=0\) (e.g., in the PKI setting where \(\mathsf {C}\) talks to \(\mathsf {S}\) over TLS and the communication from \(\mathsf {D}\) to \(\mathsf {C}\) is authenticated), in which case the bound reduces to \(\min (q_S,q_D)/2^{t+d}\). The interpretation of this bound, and similarly for the other bounds in this model, is that in order to have a probability \(q/2^{t+d}\) to impersonate the user, the attacker needs to run q online sessions with S and also q online sessions with D. (In each such session the attacker can test one password out of a dictionary of \(2^d\) passwords, and can do so successfully only if its communication with \(\mathsf {D}\) is accepted over the SAS channel, which happens with probability \(2^{-t}\).) This is the optimal security bound in the TFA-KE setting since an adversary who guesses both the user’s password and the t-bit SAS-channel checksum can successfully authenticate as the user to the server.

In case of client corruption (and password leakage), the adversary’s probability of impersonating the user to the server is at most \(\min (q_S,q_D)/2^t\), which is the best possible bound when the attacker holds the user’s password. In case of device corruption, the adversary’s advantage is at most \((q_C{+}q_S)/2^d\), which matches the optimal PAKE probability, namely, the bound that holds when no device is available. Finally, upon server corruption, the adversary’s probability of success in impersonating the user to any uncorrupted server session is (assuming \(q_C'=0\) for simplicity) at most \(q_D/2^{t+d}\). In other words, learning the server’s private information necessarily allows the adversary to authenticate as the server to the client, but it does not help in impersonating the client to the server. In contrast, widely deployed PIN-based TFA schemes that transmit passwords and PINs over a TLS channel are subject to an offline dictionary attack in this case.

Extension: The Case of \(\mathsf {C}\) and \(\mathsf {S}\) Corruption. Note that when \(\mathsf {C}\) and \(\mathsf {D}\) are corrupted, there is no security to be offered because the attacker has possession of all authentication factors: the password and the auxiliary device. However, in the case that both \(\mathsf {C}\) and \(\mathsf {S}\) are corrupted one can hope that the attacker could not authenticate to sessions in \(\mathsf {S}\) that the attacker does not actively control. Indeed, the above model can be extended to include this case with a bound of \(q_D/2^t\). Our protocols as described in Figs. 3 and 4 do not achieve this bound, but it can be easily achieved, for example, by the following small modification (refer to the figures): \(\mathsf {S}\) is initialized with a public key of \(\mathsf {D}\) and before sending the value \(zid\) to \(\mathsf {D}\) (via \(\mathsf {C}\)), \(\mathsf {S}\) encrypts it under \(\mathsf {D}\)’s public key.

3 Building Blocks

We recall several of the building blocks used in our TFA-KE protocol.

SAS-MA Scheme of Vaudenay [50]. The Short Authentication String Message Authentication (SAS-MA) scheme allows the transmission of a message from a sender to a receiver so that the receiver can check the integrity of the received message. A SAS-MA scheme considers two communication channels. One that allows the transmission of messages of arbitrary length and is controlled by an active man-in-the-middle, and another that allows sending up to t bits that cannot be changed by the attacker (neither channel is assumed to provide secrecy). We refer to these as the open channel and the SAS channel, respectively, and call the parameter t the SAS channel capacity. A SAS-MA scheme is called secure if the probability that the receiver accepts a message modified by a (computationally bounded) attacker on the open channel is no more than \(2^{-t}\) (plus a negligible fraction). In Fig. 1 we show a secure SAS-MA implementation of [50] for a sender \(\mathsf {C}\) and a receiver \(\mathsf {D}\). The SAS channel is abstracted as a comparison of two t-bit strings \(\mathsf {checksum}_C\) and \(\mathsf {checksum}_D\) computed by sender and receiver, respectively. As shown in [50], the probability that an active man-in-the-middle attacker between \(\mathsf {D}\) and \(\mathsf {C}\) succeeds in changing message \(\mathsf {M_C}\) while \(\mathsf {D}\) and \(\mathsf {C}\) compute the same checksum is at most \(2^{-t}\). Note that this level of security is achieved without any keying material (secret or public) pre-shared between the parties. Also, importantly, there is no requirement for checksums to be secret. (In Sect. 5 we present a formal SAS-MA security definition.)

Thus, the SAS-MA protocol reduces integrity verification of a received message \(\mathsf {M_C}\) to verifying the equality of two strings (checksums) assumed to be transmitted “out-of-band”, namely, away from adversarial control. In our application, the checksums will be values displayed by device \(\mathsf {D}\) and client \(\mathsf {C}\) whose equality the user verifies and confirms via a physical action, e.g. a click, a QR snapshot, or an audio read-out (see Sect. 6). In the TFA-KE application this user-confirmation of checksum equality serves as evidence for the physical control of the terminal \(\mathsf {C}\) and device \(\mathsf {D}\) by the same user, and a confirmation of user’s possession of the 2nd authentication factor implemented as device \(\mathsf {D}\).

Fig. 1. SAS Message Authentication (SAS-MA) [50]
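Since the figure itself is not reproduced here, the message flow of a commitment-based SAS-MA can be sketched in code. This is a minimal illustration under stated assumptions, not the exact protocol of [50]: the hash-based commitment (SHA-256 over a random nonce, the t-bit value, and the message), the value `T_BITS = 20`, and all function names are choices made for this sketch, and both parties are simulated in a single function.

```python
import hashlib
import hmac
import secrets

T_BITS = 20  # assumed SAS channel capacity t


def commit(message: bytes, r_c: int):
    """Commit to (message, r_c). Hash-based commitment is an assumption of this sketch."""
    nonce = secrets.token_bytes(32)
    com = hashlib.sha256(nonce + r_c.to_bytes(4, "big") + message).digest()
    return com, (r_c, nonce)


def open_commitment(message: bytes, com: bytes, decom) -> int:
    """Recompute the commitment for the received message and return r_c if it matches."""
    r_c, nonce = decom
    expected = hashlib.sha256(nonce + r_c.to_bytes(4, "big") + message).digest()
    if not hmac.compare_digest(com, expected):
        raise ValueError("commitment does not open to the received message")
    return r_c


def sas_ma_run(sent_message: bytes, received_message=None):
    """Simulate one run; returns (checksum_C, checksum_D) for the user to compare."""
    if received_message is None:
        received_message = sent_message      # honest open channel
    # Sender C: pick a random t-bit value and commit to it with the message.
    r_c = secrets.randbits(T_BITS)
    com, decom = commit(sent_message, r_c)
    # Receiver D: after seeing (message, com), reply with its own random t-bit value.
    r_d = secrets.randbits(T_BITS)
    # C reveals the decommitment; D opens it against the message it received.
    r_c_seen = open_commitment(received_message, com, decom)
    # Both sides derive the checksum that is compared over the SAS channel.
    return r_c ^ r_d, r_c_seen ^ r_d


checksum_c, checksum_d = sas_ma_run(b"example public key")
print(checksum_c == checksum_d)
```

The ordering matters in this sketch: C is bound to its random value before it sees D's value, and D picks its value before the commitment is opened, so neither side of a man-in-the-middle can steer the final checksum after the fact.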

SAS-SMT. One can use a SAS-MA mechanism from \(\mathsf {C}\) to \(\mathsf {D}\) to bootstrap a confidential channel from \(\mathsf {D}\) to \(\mathsf {C}\). The transformation is standard: To send a message m securely from \(\mathsf {D}\) to \(\mathsf {C}\) (in our application m is a one-time key and \(\mathsf {D}\)’s PTR response, see below), \(\mathsf {C}\) picks a CCA-secure public key encryption key pair \(({\mathsf {sk}},{\mathsf {pk}})\) (e.g., pair \((x,g^x)\)) for an encryption scheme \(({\mathsf {KG}},{\mathsf {Enc}},{\mathsf {Dec}})\), sends \({\mathsf {pk}}\) to \(\mathsf {D}\), and then \(\mathsf {C}\) and \(\mathsf {D}\) execute the SAS-MA protocol on \(\mathsf {M_C}={\mathsf {pk}}\). If \(\mathsf {D}\) accepts, it sends m encrypted under \({\mathsf {pk}}\) to \(\mathsf {C}\), who decrypts it using \({\mathsf {sk}}\). The security of SAS-MA and the public-key encryption imply that an attacker can intercept m (or modify it to some related message) only by supplying its own key \({\mathsf {pk}}'\) instead of \(\mathsf {C}\)’s key, and causing \(\mathsf {D}\) to accept in the SAS-MA authentication of \({\mathsf {pk}}'\) which by SAS-MA security can happen with probability at most \(2^{-t}\). The resulting protocol has 4 messages, and the cost of a plain Diffie-Hellman exchange if implemented using ECIES [22] encryption. We refer to this scheme as SAS-SMT (SMT for “secure message transmission”).

aPAKE. Informally, an aPAKE (for asymmetric or augmented PAKE) is a password protocol secure against server compromise [25, 32], namely, one where the server stores a one-way function of the user’s password so that an attacker who breaks into the server can only learn information on the password through an exhaustive offline dictionary attack. While the aPAKE terminology is typically used in the context of password-only protocols that do not rely on public keys, we extend it here (following [37]) to the standard PKI-based password-over-TLS protocol. This enables the use of our techniques in the context of TLS, a major benefit of our TFA schemes. Note that this standard protocol, while secure against server compromise is not strictly an aPAKE as it allows an attacker to learn plaintext passwords (decrypted by TLS) for users that authenticate while the attacker is in control of the server. As shown in [37], dealing with this property requires a tweak in the DE-PAKE protocol (\(\mathsf {C}\) needs to authenticate the value b sent by \(\mathsf {D}\) in the PTR protocol described below - see also Sect. 6).

DE-PAKE. A Device-Enhanced PAKE (DE-PAKE) [37] is an extension of the asymmetric PAKE model by an auxiliary device, which strengthens aPAKE protocols by eliminating offline dictionary attacks upon server compromise. We discuss DE-PAKE in more detail in Sect. 2 and recall its formal model in Appendix A. We use DE-PAKE protocols as a main module in our general construction of TFA-KE, and our practical instantiation of this construction, protocol \(\mathsf {OpTFA}\), uses the DE-PAKE scheme of [37] which combines an asymmetric aPAKE with a password hardening procedure \(\mathsf{PTR}\) described next.

Password-to-Random Scheme \(\mathsf{PTR}\) . A PTR is a password hardening procedure that allows client \(\mathsf {C}\) to translate with the help of device \(\mathsf {D}\) (which stores a key k) a user’s master password \(\mathsf {pwd}\) into independent pseudorandom passwords (denoted \(\mathsf {rwd}\)) for each user account. The PTR instantiation from [37] is based on the Ford-Kaliski’s Blind Hashed Diffie-Hellman technique [31]: Let G be a group of prime order q, let \({H'}\) and \(H\) be hash functions which map onto, respectively, elements of G and \(\kappa \)-bit strings, where \(\kappa \) is a security parameter. Define \(F_k(x)=H(x,({H'}(x))^k)\), where the key k is chosen at random in \(\mathbb {Z}_q\). In \(\mathsf{PTR}\) this function is computed jointly between \(\mathsf {C}\) and \(\mathsf {D}\) where \(\mathsf {D}\) inputs key k and \(\mathsf {C}\) inputs \(x=\mathsf {pwd}\) as the argument, and the output, denoted \(\mathsf {rwd}=F_k(\mathsf {pwd})\), is learned by \(\mathsf {C}\) only. The protocol is simple: \(\mathsf {C}\) sends \(a=(H'(\mathsf {pwd}))^r\) for r random in \(\mathbb {Z}_q\), \(\mathsf {D}\) responds with \(b=a^k\), and \(\mathsf {C}\) computes \(\mathsf {rwd}=H(x,b^{1/r})\). Under the One-More (Gap) Diffie-Hellman (OM-DH) assumption in the Random Oracle Model (ROM), this scheme realizes a universally composable oblivious PRF (OPRF) [36], which in particular implies that \(x=\mathsf {pwd}\) is hidden from all observers and function \(F_k(\cdot )\) remains pseudorandom on all inputs which are not queried to \(\mathsf {D}\).
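The three-message PTR flow described above can be traced in a few lines of code. The sketch below is illustrative only and rests on assumptions not made in the text: it uses a toy group (the order-1019 subgroup of squares modulo the safe prime 2039, far too small for any real use), SHA-256 for both \(H'\) and \(H\), and a squaring-based hash-to-group map; all function names are hypothetical. It requires Python 3.8+ for the modular inverse via `pow`.

```python
import hashlib
import secrets

# Toy parameters (assumption, insecure): P = 2Q + 1 with P and Q prime.
P = 2039
Q = 1019


def hash_to_group(x: bytes) -> int:
    """H': map x to an element of the order-Q subgroup by squaring a hash value."""
    counter = 0
    while True:
        digest = hashlib.sha256(b"H'" + counter.to_bytes(4, "big") + x).digest()
        g = pow(int.from_bytes(digest, "big") % P, 2, P)
        if g not in (0, 1):          # skip degenerate elements
            return g
        counter += 1


def hash_to_key(x: bytes, element: int) -> bytes:
    """H: map (x, group element) to a kappa-bit string (here kappa = 256)."""
    return hashlib.sha256(b"H" + len(x).to_bytes(4, "big") + x
                          + element.to_bytes(2, "big")).digest()


def F(k: int, x: bytes) -> bytes:
    """Direct evaluation F_k(x) = H(x, H'(x)^k), used only to check the protocol."""
    return hash_to_key(x, pow(hash_to_group(x), k, P))


def client_blind(pwd: bytes):
    """C: pick random r and send a = H'(pwd)^r."""
    r = secrets.randbelow(Q - 1) + 1
    return r, pow(hash_to_group(pwd), r, P)


def device_evaluate(k: int, a: int) -> int:
    """D: respond with b = a^k, learning nothing about pwd."""
    return pow(a, k, P)


def client_unblind(pwd: bytes, r: int, b: int) -> bytes:
    """C: compute rwd = H(pwd, b^(1/r)), where 1/r is the inverse of r mod Q."""
    return hash_to_key(pwd, pow(b, pow(r, -1, Q), P))


k = secrets.randbelow(Q - 1) + 1            # device key
r, a = client_blind(b"master password")
rwd = client_unblind(b"master password", r, device_evaluate(k, a))
print(rwd == F(k, b"master password"))
```

Because the blinded value is raised to a fresh random exponent in each run, the device sees a different group element every time, while the unblinded output is always the same \(F_k(\mathsf {pwd})\).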

Fig. 2. Schematic representation of protocol \(\mathsf {OpTFA}\) of Fig. 3

4 \(\mathsf {OpTFA}\): A Practical Secure TFA-KE Protocol

In Sect. 5 we present and prove a general design, \(\mathsf {GenTFA}\), of a TFA-KE protocol based on two generic components, namely, SAS-MA and DE-PAKE protocols. But first, in this section, we show a practical instantiation of \(\mathsf {GenTFA}\) using the specific building blocks presented in Sect. 3, namely, the SAS-MA scheme from Fig. 1 and the DE-PAKE scheme from [37] (that uses the DH-based PTR scheme described in that section composed with any asymmetric PAKE). This concrete instantiation serves as the basis of our implementation work (Sect. 6) and helps explain the rationale of our general construction. \(\mathsf {OpTFA}\) is presented in Fig. 3. A schematic representation is shown in Fig. 2.

Fig. 3. \(\mathsf {OpTFA}\): efficient TFA-KE protocol with optimal security bounds

Enhanced TFA via SAS. Before going into the specifics of \(\mathsf {OpTFA}\), we describe a general technique for designing TFA schemes using a SAS channel. In traditional TFA schemes, a PIN is displayed to the user who copies it into a login screen to prove access to that PIN. As discussed in the introduction, this mechanism suffers from significant weaknesses, mainly due to the low entropy of PINs (and the inconvenience of copying them). We suggest automating the transmission of the PIN over a confidential channel from device \(\mathsf {D}\) to client \(\mathsf {C}\). To implement such a channel, we use the SAS-SMT scheme from Sect. 3 where security boils down to having \(\mathsf {D}\) and \(\mathsf {C}\) display t-bit strings (checksums) that the user checks for equality. In this way, low-entropy PINs can be replaced with full-entropy values (we refer to them as one-time keys (OTK)) that are immune to eavesdropping and limit active attacks to a success probability of \(2^{-t}\). These active attacks are impractical even for \(t=20\) (more a denial-of-service than an impersonation threat) and with larger values of t, as illustrated in Sect. 6, they are simply infeasible. Note that this approach works with any form of generation of OTK’s, e.g., time-based mechanisms, challenge-response between device and server, etc.
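To get a feeling for the \(2^{-t}\) figure, the short sketch below computes the chance that at least one of several active attacks on a t-bit checksum succeeds. It assumes, purely for illustration, that attempts are independent and that each succeeds with probability exactly \(2^{-t}\); the function names are hypothetical.

```python
def active_attack_success(t: int, attempts: int) -> float:
    """Probability that at least one of `attempts` independent active attacks
    on a t-bit SAS checksum succeeds, each with probability 2^-t."""
    p = 2.0 ** -t
    return 1.0 - (1.0 - p) ** attempts

def union_bound(t: int, attempts: int) -> float:
    """Simple upper bound attempts / 2^t on the same probability."""
    return min(1.0, attempts * 2.0 ** -t)
```

Under these assumptions, an attacker needs on the order of \(2^t\) user-visible sessions before expecting a single success, which is why such attacks look more like denial of service than impersonation.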

4.1 \(\mathsf {OpTFA}\) Explained

Protocol \(\mathsf {OpTFA}\) (Fig. 3) requires several mechanisms that are necessary to obtain the strong security bounds of the TFA-KE model. To provide rationale for the need of these mechanisms we show how the protocol is built bottom-up to deliver the required security properties. We stress that while the design is involved, the resultant protocol is efficient and practical. The presentation and discussion of security properties here is informal but the intuition can be formalized as we do via the TFA-KE model (Sect. 2), the generic protocol \(\mathsf {GenTFA}\) in the next section and the proof of Theorem 1.

In general terms, \(\mathsf {OpTFA}\) can be seen as a DE-PAKE protocol using the PTR scheme from Sect. 3 and enhanced with fresh OTKs transmitted from \(\mathsf {D}\) to \(\mathsf {C}\) via the above SAS-SMT mechanism. The OTK is generated by the device and server for each session and then included in the aPAKE interaction between \(\mathsf {C}\) and \(\mathsf {S}\). We note that \(\mathsf {OpTFA}\) treats aPAKE generically, so any such scheme can be used. In particular, we start by illustrating how \(\mathsf {OpTFA}\) works with the standard password-over-TLS aPAKE, and then generalize to the use of any aPAKE, including PKI-free ones.

The starting point is standard password-over-TLS, where the user’s password is transmitted from \(\mathsf {C}\) to \(\mathsf {S}\) under the protection of TLS.

We enhance password-over-TLS with the OTK-over-SAS mechanism described above. First, \(\mathsf {C}\) transmits the user’s password to \(\mathsf {S}\) over TLS and if the password verifies at \(\mathsf {S}\), \(\mathsf {S}\) sends a nonce \(zid\) to \(\mathsf {C}\), which relays it to \(\mathsf {D}\). On the basis of \(zid\) (which also acts as session identifier in our analysis), \(\mathsf {D}\) computes an OTK \(z=\mathsf {R}_{K_z}(zid)\) where \(\mathsf {R}\) is a PRF and \(K_z\) a key shared between \(\mathsf {D}\) and \(\mathsf {S}\). \(\mathsf {D}\) transmits z to \(\mathsf {C}\) over the SAS-SMT channel and \(\mathsf {C}\) relays it to \(\mathsf {S}\) over TLS. The user is authenticated only if the received value z is the same as the one computed by \(\mathsf {S}\).
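The OTK computation in this step can be sketched as follows. HMAC-SHA256 is used here as a stand-in for the PRF \(\mathsf {R}\), and the 16-byte nonce length and the function names are assumptions made only for the example; the transport over TLS and the SAS-SMT channel is not modeled.

```python
import hashlib
import hmac
import secrets

def otk(k_z: bytes, zid: bytes) -> bytes:
    """z = R_{K_z}(zid), with HMAC-SHA256 standing in for the PRF R."""
    return hmac.new(k_z, zid, hashlib.sha256).digest()

def server_challenge() -> bytes:
    """S picks a fresh nonce zid (16 bytes is an illustrative choice)."""
    return secrets.token_bytes(16)

def server_check(k_z: bytes, zid: bytes, z_received: bytes) -> bool:
    """S recomputes z and compares it with the relayed value in constant time."""
    return hmac.compare_digest(otk(k_z, zid), z_received)
```

In a run of this sketch, \(\mathsf {D}\) would call `otk` on the relayed `zid` and \(\mathsf {S}\) would call `server_check` on whatever \(\mathsf {C}\) forwards.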

This scheme offers defense in case of password leakage. With a full-entropy OTK it ensures security against eavesdroppers on the \(\mathsf {D}\)-\(\mathsf {C}\) link and limits the advantage of an active attacker to a probability of \(2^{-t}\) for SAS checksums of length t. However, the scheme is open to online password attacks (as in current commonly deployed schemes) because the attacker can try online guesses without having to deal with the transmission of OTK z. In addition, it offers no security against offline dictionary attacks upon server compromise.

We change \(\mathsf {OpTFA}\) 0.1 so that the user’s password \(\mathsf {pwd}\) is only transmitted to \(\mathsf {S}\) at the end of the protocol together with the OTK z (it is important that, if z does not verify as the correct OTK, the server does not reveal whether \(\mathsf {pwd}\) is correct or not). This change protects the protocol against online guessing attacks and reduces the probability of the successful testing of a candidate password to \(2^{-(d+t)}\) rather than \(2^{-d}\) in version 0.1.
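One way to picture the requirement that the server not reveal which of the two checks failed is a verifier that always evaluates both and returns a single bit. The sketch below is hypothetical: PBKDF2 stands in for whatever salted password hash the server uses, and the iteration count and names are illustrative assumptions.

```python
import hashlib
import hmac

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def hash_password(pwd: bytes, salt: bytes) -> bytes:
    """Salted password hash as stored by the server (PBKDF2 as a stand-in)."""
    return hashlib.pbkdf2_hmac("sha256", pwd, salt, ITERATIONS)

def verify_login(stored_hash: bytes, salt: bytes, expected_z: bytes,
                 pwd: bytes, z: bytes) -> bool:
    """Accept only if both pwd and z are correct, without saying which failed."""
    pwd_ok = hmac.compare_digest(hash_password(pwd, salt), stored_hash)
    z_ok = hmac.compare_digest(expected_z, z)
    # Both checks are always evaluated and folded into one bit, so the caller
    # learns nothing about pwd from a failure caused by a wrong z.
    return pwd_ok & z_ok
```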

We add defense against offline dictionary attacks upon server compromise by resorting to the DE-PAKE construction of [37] and, in particular, to the password-to-random hardening procedure \(\mathsf{PTR}\) from Sect. 3. For this, we now assume that the user has a master password \(\mathsf {pwd}\) that \(\mathsf{PTR}\) converts into randomized passwords \(\mathsf {rwd}\) for each user account. By registering \(\mathsf {rwd}\) with server \(\mathsf {S}\) and using \(\mathsf{PTR}\) for the conversion, DE-PAKE security ensures that offline dictionary attacks are infeasible even if the server is compromised (case (3) in Definition 1). Note that the \(\mathsf{PTR}\) procedure runs between \(\mathsf {D}\) and \(\mathsf {C}\) following the establishment of the SAS-SMT channel.

We change the run of \(\mathsf{PTR}\) between \(\mathsf {D}\) and \(\mathsf {C}\) so that the value a computed by \(\mathsf {C}\) as part of \(\mathsf{PTR}\) is transmitted over the SAS-authenticated channel from \(\mathsf {C}\) to \(\mathsf {D}\). Without this authentication the strict bound of case (3) in Definition 1 (simplified for \(q_C'=0\)), namely, \({\mathsf {Adv}_\mathsf {A}^\mathsf{{TFA}}}\le q_D/2^{d+t}+\epsilon \) upon server compromise, would not be met. Indeed, when the attacker compromises server \(\mathsf {S}\), it learns the key \(K_z\) used to compute the OTK z so the defense provided by OTK is lost. So, how can we still ensure the \(2^t\) denominator in the above bound expression? The answer is that by authenticating the \(\mathsf{PTR}\) value a under SAS-MA, the attacker is forced to run (expected) \(2^t\) sessions to be able to inject its own value a over that channel. Such injection is necessary for testing a password guess even when \(K_z\) is known. When considering a password dictionary of size \(2^d\) this ensures the denominator \(2^{d+t}\) in the security bound.

We add the following mechanism to \(\mathsf {OpTFA}\): Upon initialization of an authentication session (for a given user), \(\mathsf {C}\) and \(\mathsf {S}\) run an unauthenticated (a.k.a. anonymous) key exchange \(\mathsf {uKE}\) (e.g., a plain Diffie-Hellman protocol) to establish a shared key \(K_{CS}\) that they use as a MAC key applied to all subsequent \(\mathsf {OpTFA}\) messages. To see the need for \(\mathsf {uKE}\), assume it is omitted. For simplicity, consider the case where attacker \(\mathsf {A}\) knows the user’s password. In this case, all \(\mathsf {A}\) needs for impersonating the user is to learn one value of z, which it can attempt by acting as a man-in-the-middle on the \(\mathsf {C}\)-\(\mathsf {D}\) channel. After \(q_D\) such attempts, \(\mathsf {A}\) has probability \(q_D/2^t\) of learning z, which together with the user’s password allows \(\mathsf {A}\) to authenticate to \(\mathsf {S}\). In contrast, the bound required by Definition 1 in this case is the stricter \(\min \{q_S,q_D\}/2^{t}\). This requires that for each attempt at learning z in the \(\mathsf {C}\)-\(\mathsf {D}\) channel, \(\mathsf {A}\) must not only try to break SAS-MA authentication but also establish a new session with \(\mathsf {S}\). For this we resort to the \(\mathsf {uKE}\) channel. It ensures that a response z to a value \(zid\) sent by \(\mathsf {S}\) over a \(\mathsf {uKE}\) session will only be accepted by \(\mathsf {S}\) if this response comes back on the same \(\mathsf {uKE}\) session (i.e., authenticated with the same keys used by \(\mathsf {S}\) to send the challenge \(zid\)). It means that both \(zid\) and z are exchanged with the same party. If \(zid\) was sent to the legitimate user then the attacker, even if it learns the corresponding z, cannot use it to authenticate back to \(\mathsf {S}\). We note that \(\mathsf {uKE}\) is also needed in the case that the attacker does not know the password. Without it, the success probability for this case is about a factor \(2^d/q_S\) higher than acceptable by Definition 1.
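The session-binding effect of \(\mathsf {uKE}\) can be illustrated with a small model. In this sketch a fresh random key stands in for the \(\mathsf {uKE}\) output \(K_{CS}\) (no actual key exchange is run), HMAC-SHA256 plays both the MAC and the PRF \(\mathsf {R}\), and the class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets

def mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

class ServerSession:
    """One S-side session: its own K_CS, its own challenge zid, its own z."""

    def __init__(self, k_z: bytes):
        self.k_cs = secrets.token_bytes(32)  # stand-in for the uKE output K_CS
        self.zid = secrets.token_bytes(16)   # per-session challenge
        self.z = mac(k_z, self.zid)          # z = R_{K_z}(zid)

    def challenge(self) -> tuple[bytes, bytes]:
        """Send zid authenticated under this session's K_CS."""
        return self.zid, mac(self.k_cs, b"zid" + self.zid)

    def accept(self, z: bytes, tag: bytes) -> bool:
        """Accept z only if it is correct AND tagged under this session's K_CS."""
        tag_ok = hmac.compare_digest(tag, mac(self.k_cs, b"z" + z))
        z_ok = hmac.compare_digest(z, self.z)
        return tag_ok & z_ok
```

In this model, an attacker who learns the z belonging to the legitimate user's session but only holds the \(K_{CS}\) of its own session with \(\mathsf {S}\) cannot get either session to accept: the tag fails on the user's session and the z is wrong on its own.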

Note. When all communication between \(\mathsf {C}\) and \(\mathsf {S}\) goes over TLS, there is no need to establish a dedicated \(\mathsf {uKE}\) channel; TLS serves as such.

We stipulate that \(\mathsf {D}\) never responds twice to the same \(zid\) value (for this, \(\mathsf {D}\) keeps a stash of recently seen \(zid\)’s; older values become useless to the attacker once they time out at the server). Without this mechanism the attacker gets multiple attempts at learning z for a single challenge \(zid\). This would violate bound (1) (for the case \(q_C=q_C'=0\)) \(\min \{q_S,q_D\}/2^{d+t}\) which requires that each guess attempt at z be bound to the establishment of a new session of the attacker with \(\mathsf {S}\).
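A minimal sketch of this device-side rule is given below. It keeps every \(zid\) it has seen in an unbounded set; expiring old entries, as suggested by the time-out remark above, is deliberately left out. HMAC-SHA256 again stands in for \(\mathsf {R}\), and the class name is hypothetical.

```python
import hashlib
import hmac
from typing import Optional

class Device:
    """D-side handling of challenges: at most one response per zid."""

    def __init__(self, k_z: bytes):
        self._k_z = k_z
        self._seen: set[bytes] = set()  # stash of zid values already answered

    def respond(self, zid: bytes) -> Optional[bytes]:
        """Return z = R_{K_z}(zid) the first time zid is seen, else refuse."""
        if zid in self._seen:
            return None  # repeated challenge: abort instead of answering again
        self._seen.add(zid)
        return hmac.new(self._k_z, zid, hashlib.sha256).digest()
```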

Finally, we generalize \(\mathsf {OpTFA}\) so that the password protocol run as the last stage of \(\mathsf {OpTFA}\) (after \(\mathsf{PTR}\) generates \(\mathsf {rwd}\)) can be implemented with any asymmetric aPAKE protocol, with or without assuming PKI, using the server-specific user’s password \(\mathsf {rwd}\). As shown in [37], running any aPAKE protocol on a password \(\mathsf {rwd}\) produced by \(\mathsf{PTR}\) results in a DE-PAKE scheme, a property that we use in an essential way in our analysis.

We need one last mechanism for \(\mathsf {C}\) to prove knowledge of z to \(\mathsf {S}\), namely, we specify that both \(\mathsf {C}\) and \(\mathsf {S}\) use z as a MAC key to authenticate the messages sent by protocol aPAKE (this is in addition to the authentication of these messages with key \(K_{CS}\)). Without this, an attack is possible where, in case \(\mathsf {OpTFA}\) fails, the attacker learns whether the reason was an aPAKE failure or a wrong z. This allows the attacker to mount an online attack on the password without having to learn the OTK. (When the aPAKE is password-over-TLS the above MAC mechanism is not needed; the same authentication effect is achieved by encrypting \(\mathsf {rwd}\) and z under the same CCA-secure ciphertext [33].)

Version 0.7 constitutes the full specification of the \(\mathsf {OpTFA}\) protocol, described in Fig. 3, with generic aPAKE.

Performance: The number of exponentiations in \(\mathsf {OpTFA}\) is reported in the introduction; implementation and performance information is presented in Sect. 6.

Security of \(\mathsf {OpTFA}\) follows from that of protocol \(\mathsf {GenTFA}\) because \(\mathsf {OpTFA}\) is its instantiation. See Theorem 1 in Sect. 5 and Corollary 1.

5 The Generic \(\mathsf {GenTFA}\) Protocol

In Fig. 4 we show protocol \(\mathsf {GenTFA}\) which is a generalization of protocol \(\mathsf {OpTFA}\) shown in Fig. 3 in Sect. 4. Protocol \(\mathsf {GenTFA}\) is a compiler which converts any secure DE-PAKE and SAS-MA schemes into a secure TFA-KE. It uses the same uKE and CCA-PKE tools as protocol \(\mathsf {OpTFA}\), but it also generalizes two other mechanisms used in \(\mathsf {OpTFA}\) as, respectively, a generic symmetric Key Encapsulation Mechanism (KEM) scheme and an Authenticated Channel (AC) scheme.

A Key Encapsulation Mechanism, denoted \(({\mathsf {KemE}},{\mathsf {KemD}})\) (see e.g. [48]), allows for encrypting a random session key given a (long-term) symmetric key \(K_z\), i.e., if \((zid,z)\leftarrow {\mathsf {KemE}}(K_z)\) then \(z\leftarrow {\mathsf {KemD}}(K_z,zid)\). A KEM is secure if key z corresponding to \(zid\not \in \{zid_1,...,zid_q\}\) is pseudorandom even given the keys \(z_i\) corresponding to all \(zid_i\)’s. In protocol \(\mathsf {OpTFA}\) of Fig. 3, KEM is implemented using PRF R: \(zid\) is a random \(\kappa \)-bit string and \(z=R(K_z,zid)\). We also generalize the usage of the MAC function in \(\mathsf {OpTFA}\) as an Authenticated Channel, defined by a pair \(\mathsf {ACSend},\mathsf {ACRec}\), which implements bi-directional authenticated communication between two parties sharing a symmetric key K [29, 34]. Algorithm \(\mathsf {ACSend}\) takes inputs key K and message m and outputs m with authentication tag computed with key K, while the receiver procedure, \(\mathsf {ACRec}(K,\cdot )\), outputs either a message or the rejection symbol \(\perp \). We assume that the AC scheme is stateful and provides authenticity and protection against replay.
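The two interfaces can be sketched in Python as follows. The KEM follows the PRF-based instantiation described above, with HMAC-SHA256 as an assumed stand-in for R. The channel shown covers one direction only and uses a message counter for replay protection; a bi-directional channel would additionally need a direction label or separate keys. All names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional

KAPPA_BYTES = 16  # illustrative length of zid

def kem_encapsulate(k_z: bytes) -> tuple[bytes, bytes]:
    """KemE(K_z): pick a random zid and output (zid, z) with z = R(K_z, zid)."""
    zid = secrets.token_bytes(KAPPA_BYTES)
    return zid, hmac.new(k_z, zid, hashlib.sha256).digest()

def kem_decapsulate(k_z: bytes, zid: bytes) -> bytes:
    """KemD(K_z, zid): recompute z = R(K_z, zid)."""
    return hmac.new(k_z, zid, hashlib.sha256).digest()

class AuthenticatedChannel:
    """One direction of a stateful AC keyed by K (sender and receiver each
    hold an instance with the same key)."""

    def __init__(self, key: bytes):
        self._key = key
        self._send_ctr = 0
        self._recv_ctr = 0

    def _tag(self, ctr: int, msg: bytes) -> bytes:
        data = ctr.to_bytes(8, "big") + msg
        return hmac.new(self._key, data, hashlib.sha256).digest()

    def ac_send(self, msg: bytes) -> tuple[int, bytes, bytes]:
        """ACSend(K, m): output m together with a counter and a tag."""
        packet = (self._send_ctr, msg, self._tag(self._send_ctr, msg))
        self._send_ctr += 1
        return packet

    def ac_rec(self, packet: tuple[int, bytes, bytes]) -> Optional[bytes]:
        """ACRec(K, .): output the message, or None as the rejection symbol."""
        ctr, msg, tag = packet
        if ctr != self._recv_ctr:
            return None  # replayed or out-of-order packet
        if not hmac.compare_digest(tag, self._tag(ctr, msg)):
            return None  # forged or modified packet
        self._recv_ctr += 1
        return msg
```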

Fig. 4. Generic TFA-KE scheme: protocol \(\mathsf {GenTFA}\)

The security of \(\mathsf {GenTFA}\) is stated in the following theorem:

Theorem 1

Assuming security of the building blocks DE-PAKE, SAS, uKE, PKE, KEM, and AC, protocol \(\mathsf {GenTFA}\) is a \((T,\epsilon )\)-secure TFA-KE scheme for \(\epsilon \) upper bounded by

$$ {\epsilon ^\mathsf{{DEPAKE}}}+n\cdot (\epsilon ^\mathsf{{SAS}}+{\epsilon ^\mathsf{{uKE}}}+{\epsilon ^\mathsf{{PKE}}}+{\epsilon ^\mathsf{{KEM}}}+6{\epsilon ^\mathsf{{AC}}})+n^2/2^\kappa $$

for \(n=q_{HbC}+\max (q_S,q_D,q_C,q_C')\) where \(q_{HbC}\) denotes the number of \(\mathsf {GenTFA}\) protocol sessions in which the adversary is only eavesdropping, and each quantity of the form \({\epsilon ^{\mathsf{P}}}\) is a bound on the advantage of an attacker that works in time \(\approx T\) against the protocol building block \(\mathsf{P}\).
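As a rough numerical illustration of the last term only, suppose (hypothetically) that \(\kappa =128\) and that the adversary sees \(n=2^{32}\) sessions in total. Then

$$ \frac{n^2}{2^{\kappa }}=\frac{(2^{32})^2}{2^{128}}=2^{-64}, $$

so under these assumed parameters the collision term is negligible compared with the terms that scale linearly in n.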

As a corollary we obtain a proof of TFA-KE security for protocol \(\mathsf {OpTFA}\) from Fig. 3 which uses specific secure instantiations of \(\mathsf {GenTFA}\) components. The corollary follows by applying the result of Vaudenay [50], which implies in particular that the SAS-MA scheme used in \(\mathsf {OpTFA}\) is secure in ROM, and the result of [37], which implies that the DE-PAKE used in \(\mathsf {OpTFA}\) is secure under the OM-DH assumption if the underlying \(\mathsf{aPAKE}\) is a secure asymmetric PAKE.

We note that protocol \(\mathsf {OpTFA}\) optimizes \(\mathsf {GenTFA}\) instantiated with the DE-PAKE of [37] by piggybacking the \(\mathsf {C}\)-\(\mathsf {D}\) round of communication in that protocol, \(a=H'(\mathsf {pwd})^r\) and \(b=a^k\), onto, respectively, \(\mathsf {C}\)’s message \(\mathsf {M_C}\) and the plaintext in \(\mathsf {D}\)’s ciphertext \(e_D\). The security proof extends to this round-optimized case because SAS-MA authentication of \(\mathsf {M_C}\) and CCA-security of PKE bind DE-PAKE messages \(a,b\) to this session just as the \(\mathsf {ACSend}(K_{CD},\cdot )\) mechanism does in (non-optimized) protocol \(\mathsf {GenTFA}\).

Corollary 1

Assuming that \(\mathsf{aPAKE}\) is a secure asymmetric PAKE, \(\mathsf {uKE}\) is secure Key Exchange, \(({\mathsf {KG}},{\mathsf {Enc}},{\mathsf {Dec}})\) is a CCA-secure PKE, \(\mathsf {R}\) is a secure PRF, and MAC is a secure message authentication code, protocol \(\mathsf {OpTFA}\) is a secure TFA-KE scheme under the OM-DH assumption in ROM.

Security definition of SAS authentication. For the purpose of the proof below we state the security property assumed of a SAS-MA scheme which was informally described in Sect. 3. While [50] defines the security of SAS-MA using a game-based formulation, here we do it via the following (universally composable) functionality \(\mathsf {F}_\mathsf{{SAS}[t]}\): On input a message \([\mathsf {SAS.SEND}, sid ,P',m]\) from an honest party P, functionality \(\mathsf {F}_\mathsf{{SAS}[t]}\) sends \([\mathsf {SAS.SEND}, sid ,P,P',m]\) to \(\mathsf {A}\), and then, if \(\mathsf {A}\)’s response is \([\mathsf {SAS.CONNECT}, sid ]\), then \(\mathsf {F}_\mathsf{{SAS}[t]}\) sends \([\mathsf {SAS.SEND}, sid ,P,m]\) to \(P'\), if \(\mathsf {A}\)’s response is \([\mathsf {SAS.ABORT}, sid ]\), then \(\mathsf {F}_\mathsf{{SAS}[t]}\) sends \([\mathsf {SAS.SEND}, sid ,P,\perp ]\) to \(P'\), and if \(\mathsf {A}\)’s response is \([\mathsf {SAS.ATTACK}, sid ,m']\) then \(\mathsf {F}_\mathsf{{SAS}[t]}\) throws a coin \(\rho \) which comes out 1 with probability \(2^{-t}\) and 0 with probability \(1\,{-}\,2^{-t}\), and if \(\rho =1\) then \(\mathsf {F}_\mathsf{{SAS}[t]}\) sends \(\mathsf {succ}\) to \(\mathsf {A}\) and \([\mathsf {SAS.SEND}, sid ,P,m']\) to \(P'\), and if \(\rho =0\) then \(\mathsf {F}_\mathsf{{SAS}[t]}\) sends \(\mathsf {fail}\) to \(\mathsf {A}\) and \([\mathsf {SAS.SEND}, sid ,P,\perp ]\) to \(P'\).
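The behaviour of \(\mathsf {F}_\mathsf{{SAS}[t]}\) can be mirrored by a small Python model. This is an informal sketch and not a UC formalization: the adversary is assumed to be a callable that receives the notification tuple and returns its response tuple, `None` plays the role of \(\perp \), and the random source is injectable so that the coin \(\rho \) can be fixed in experiments. All names are hypothetical.

```python
import random

class FSas:
    """Toy model of the functionality F_SAS[t]."""

    def __init__(self, t: int, rng=None):
        self.t = t
        self.rng = rng if rng is not None else random.Random()

    def send(self, sid, sender, receiver, m, adversary):
        """Process [SAS.SEND, sid, receiver, m] from an honest `sender`.

        Returns (message delivered to receiver, message given to adversary)."""
        response = adversary(("SAS.SEND", sid, sender, receiver, m))
        kind = response[0]
        if kind == "SAS.CONNECT":
            return ("SAS.SEND", sid, sender, m), None
        if kind == "SAS.ABORT":
            return ("SAS.SEND", sid, sender, None), None
        if kind == "SAS.ATTACK":
            m_star = response[2]
            # Coin rho comes out 1 with probability 2^-t.
            rho = 1 if self.rng.random() < 2.0 ** -self.t else 0
            if rho == 1:
                return ("SAS.SEND", sid, sender, m_star), "succ"
            return ("SAS.SEND", sid, sender, None), "fail"
        raise ValueError("unknown adversary response")
```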

In our main instantiation of the generic protocol \(\mathsf {GenTFA}\) of Fig. 4, i.e. in protocol \(\mathsf {OpTFA}\) of Fig. 3, we instantiate SAS-MA with the scheme of [50], but even though the original security argument given for it in [50] used the game-based security notion, it is straightforward to adapt this argument to see that this scheme securely realizes the above (universally composable) functionality.

Proof of Theorem 1. Let \(\mathsf {A}\) be an adversary limited by time T playing the TFA-KE security game, which we will denote \(\mathsf {G}_0\), instantiated with the TFA-KE scheme \(\mathsf {GenTFA}\). Let the security advantage defined in Definition 1 for adversary \(\mathsf {A}\) satisfy \({\mathsf {Adv}_\mathsf {A}^\mathsf{{TFA}}}=\epsilon \). Let \(\varPi ^{\mathsf {S}}_{i}\), \(\varPi ^{\mathsf {C}}_{j}\), \(\varPi ^{\mathsf {D}}_{l}\) refer, respectively, to the i-th, j-th, and l-th instances of \(\mathsf {S}\), \(\mathsf {C}\), and \(\mathsf {D}\) entities which \(\mathsf {A}\) starts up. Let t be the SAS channel capacity, \(\kappa \) the security parameter, \(q_S,q_D,q_C,q_C'\) the limits on the numbers of rogue sessions of \(\mathsf {S}\), \(\mathsf {D}\), \(\mathsf {C}\) when communicating with \(\mathsf {S}\), and \(\mathsf {C}\) when communicating with \(\mathsf {D}\), and let \(q_{HbC}\) be the number of \(\mathsf {GenTFA}\) protocol sessions in which \(\mathsf {A}\) plays only a passive eavesdropper role except that we allow \(\mathsf {A}\) to abort any of these protocol executions at any step. Let \(n_S=q_S+q_{HbC}\), \(n_D=q_D+q_{HbC}\), \(n_C=q_C+q_C'+q_{HbC}\), and note that these are the ranges of indexes \(i,j,l\) for instances \(\varPi ^{\mathsf {S}}_{i}\), \(\varPi ^{\mathsf {C}}_{j}\), and \(\varPi ^{\mathsf {D}}_{l}\). We will use [n] to denote the range \(\{1,...,n\}\).

The security proof goes by cases depending on the type of \(\mathsf{corrupt}\) queries \(\mathsf {A}\) makes. In all cases the proof starts from the security-experiment game \(\mathsf {G}_0\) and proceeds via a series of game changes, \(\mathsf {G}_1\), \(\mathsf {G}_2\), etc., until a modified game \(\mathsf {G}_i\) allows us to reduce an attack on the DE-PAKE with the same corruption pattern (except in the case of corrupt client \(\mathsf {C}\)) to the attack on \(\mathsf {G}_i\). In the case of the corrupt client the argument is different because it does not rely on the underlying DE-PAKE (note that DE-PAKE does not provide any security properties in the case of client corruption). In some game changes we will consider a modified adversary algorithm, for example an algorithm constructed from the original adversary \(\mathsf {A}\) interacting with a simulator of some higher-level procedure, e.g. the \(\mathrm{SAS{-}MA}\) simulator. Wlog, we use \(\mathsf {A}_i\) for an adversary algorithm in game \(\mathsf {G}_i\).

We will use \(p_i\) to denote the probability that \(\mathsf {A}_i\) interacting with game \(\mathsf {G}_i\) outputs \(b'\) s.t. \(b'=b\) where b is the bit chosen by the game on the test session. Recall that when \(\mathsf {A}\) makes the test session query \(\mathsf{test}(P,i)\), for \(P\in \{\mathsf {S},\mathsf {C}\}\), then, assuming that instance \(\varPi ^{P}_{i}\) produced a session key \({\mathsf {sk}}\), game \(\mathsf {G}_0\) outputs that session key if \(b=1\) or produces a random string of equal size if \(b=0\) (and if session \(\varPi ^{P}_{i}\) did not produce the key then \(\mathsf {G}_0\) outputs \(\perp \) regardless of bit b). Note that by assumption \({\mathsf {Adv}_\mathsf {A}^\mathsf{{TFA}}}=\epsilon \) we have that \(p_0=1/2+1/2\cdot {\mathsf {Adv}_\mathsf {A}^\mathsf{{TFA}}}=1/2+\epsilon /2\).

Case 1: No party is compromised. This is the case when \(\mathsf {A}\) makes no \(\mathsf{corrupt}\) queries, i.e. it’s the default “network adversary” case. For lack of space we describe below only the game changes in the proof, and we state what we claim about the effects of that game change and what assumption we use. The full details of the proof are included in the full version of the paper [38].

Game \(\mathsf {G}_{1}:\) Let Z be a random function which maps onto \(\kappa \)-bit strings. If \((zid_i,z_i)\) denotes the KEM (ciphertext,key) pair generated by \(\varPi ^{\mathsf {S}}_{i}\) then in \(\mathsf {G}_1\) we set \(z_i=Z(zid_i)\) instead of using \({\mathsf {KemE}}\), and we abort if there is ever a collision in \(z_i\) values. Security of KEM implies that \(p_1\le p_0+{\epsilon ^\mathsf{{KEM}}}(n_S)+n_S^2/2^{\kappa }\).
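The random function Z and the collision abort of \(\mathsf {G}_1\) can be pictured with the usual lazy-sampling idiom. The sketch below is an illustration of that idiom only, with hypothetical names; it is not part of the proof.

```python
import secrets

class GameAbort(Exception):
    """Raised when the game aborts."""

class RandomFunction:
    """Lazily sampled random function Z with kappa-bit outputs."""

    def __init__(self, kappa_bytes: int = 16):
        self._table: dict[bytes, bytes] = {}
        self._kappa_bytes = kappa_bytes

    def __call__(self, zid: bytes) -> bytes:
        if zid not in self._table:
            z = secrets.token_bytes(self._kappa_bytes)
            # G_1 aborts if two different zid's would receive the same z.
            if z in self._table.values():
                raise GameAbort("collision in z values")
            self._table[zid] = z
        return self._table[zid]
```

With n distinct queries the abort happens with probability at most about \(n^2/2^{\kappa }\), which is the source of the last term in the bound on \(p_1\).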

Game \(\mathsf {G}_{2}:\) Here we replace the SAS-MA procedure with the simulator \({\mathsf {SIM}}_\mathsf{{SAS}}\) implied by the UC security of the SAS-MA scheme of [50]. In other words, whenever \(\varPi ^{\mathsf {C}}_{j}\) and \(\varPi ^{\mathsf {D}}_{l}\) execute the \(\mathrm{SAS{-}MA}\) sub-protocol, we replace this execution with a simulator \({\mathsf {SIM}}_\mathsf{{SAS}}\) interacting with \(\mathsf {A}\) and the ideal \(\mathrm{SAS{-}MA}\) functionality \(\mathsf {F}_\mathsf{{SAS}[t]}\). For example, \(\varPi ^{\mathsf {C}}_{j}\), instead of sending \(\mathsf {M_C}=({\mathsf {pk}},zid)\) to \(\mathsf {A}_1\) and starting a \(\mathrm{SAS{-}MA}\) instance to authenticate \(\mathsf {M_C}\) to \(\mathsf {D}\), will send \([\mathsf {SAS.SEND}, sid ,\varPi ^{\mathsf {D}}_{l},\mathsf {M_C}]\) to \(\mathsf {F}_\mathsf{{SAS}[t]}\), which triggers \({\mathsf {SIM}}_\mathsf{{SAS}}\) to start simulating to \(\mathsf {A}\) the \(\mathrm{SAS{-}MA}\) protocol on input \(\mathsf {M_C}\) between \(\varPi ^{\mathsf {C}}_{j}\) and \(\varPi ^{\mathsf {D}}_{l}\). The rules of \(\mathsf {F}_\mathsf{{SAS}[t]}\) imply that \(\mathsf {A}\) can make this connection succeed or abort, or it can attack it, in which case \(\varPi ^{\mathsf {D}}_{l}\) will abort with probability \(1-2^{-t}\) and with probability \(2^{-t}\) will accept \(\mathsf {A}\)’s message \(\mathsf {M_C}^*\) instead of \(\mathsf {M_C}\). Security of \(\mathrm{SAS{-}MA}\) implies that \(p_2\le p_1+\min (n_C,n_D)\cdot \epsilon ^\mathsf{{SAS}}\).

Game \(\mathsf {G}_{3}:\) Here we re-name entities involved in game \(\mathsf {G}_2\). Note that adversary \(\mathsf {A}_2\) interacts with \(\mathsf {G}_2\) which internally runs algorithms \({\mathsf {SIM}}_\mathsf{{SAS}}\) and \(\mathsf {F}_\mathsf{{SAS}[t]}\), and that \({\mathsf {SIM}}_\mathsf{{SAS}}\) interacts only with \(\mathsf {F}_\mathsf{{SAS}[t]}\) on one end and \(\mathsf {A}_2\) on the other. We can therefore draw the boundaries between the adversarial algorithm and the security game slightly differently, by considering an adversary \(\mathsf {A}_3\) which executes the steps of \(\mathsf {A}_2\) and \({\mathsf {SIM}}_\mathsf{{SAS}}\), and a security game \(\mathsf {G}_3\) which executes the rest of game \(\mathsf {G}_2\), including the operation of functionality \(\mathsf {F}_\mathsf{{SAS}[t]}\). In other words, \(\mathsf {G}_3\) interacts with \(\mathsf {A}_3\) using the \(\mathsf {F}_\mathsf{{SAS}[t]}\) interface to \({\mathsf {SIM}}_\mathsf{{SAS}}\), i.e. \(\mathsf {G}_3\) sends to \(\mathsf {A}_3\) messages of the type \([\mathsf {SAS.SEND}, sid ,\varPi ^{\mathsf {C}}_{j},\varPi ^{\mathsf {D}}_{l},\mathsf {M_C}]\), and \(\mathsf {A}_3\)’s response must be one of \([\mathsf {SAS.CONNECT}, sid ]\), \([\mathsf {SAS.ABORT}, sid ]\), and \([\mathsf {SAS.ATTACK}, sid ,\mathsf {M_C}^*]\). Since we are only re-drawing the boundaries between the adversarial algorithm and the security game, we have that \(p_3=p_2\).

Game \(\mathsf {G}_{4}:\) Here we change game \(\mathsf {G}_3\) s.t. if \(\mathsf {A}\) sends \([\mathsf {SAS.CONNECT}, sid ]\) to let the SAS-MA instance go through between \(\varPi ^{\mathsf {C}}_{j}\) and \(\varPi ^{\mathsf {D}}_{l}\) with \(\mathsf {M_C}\) containing \(\varPi ^{\mathsf {C}}_{j}\)’s key \({\mathsf {pk}}\), then we replace the ciphertext \(e_D\) subsequently sent by \(\varPi ^{\mathsf {D}}_{l}\) by encrypting a constant string instead of \({\mathsf {Enc}}({\mathsf {pk}},(z,K_{CD}))\), and if \(\mathsf {A}\) passes this \(e_D\) to \(\varPi ^{\mathsf {C}}_{j}\) then it decrypts it as \((z,K_{CD})\) generated by \(\varPi ^{\mathsf {D}}_{l}\). In other words, we replace the encryption under SAS-authenticated key \({\mathsf {pk}}\) by a “magic” delivery of the encrypted plaintext. The CCA security of PKE implies that \(p_4\le p_3+\min (n_C,n_D)\cdot {\epsilon ^\mathsf{{PKE}}}\).

Game \(\mathsf {G}_{5}:\) Here we abort if, assuming that key \({\mathsf {pk}}\) and ciphertext \(e_D\) were exchanged between \(\varPi ^{\mathsf {C}}_{j}\) and \(\varPi ^{\mathsf {D}}_{l}\) correctly, any party accepts wrong messages in the subsequent DE-PAKE execution authenticated by \(K_{CD}\) created by \(\varPi ^{\mathsf {D}}_{l}\). The authenticated channel security implies that \(p_5\le p_4+\min (n_C,n_D)\cdot {\epsilon ^\mathsf{{AC}}}\).

Game \(\mathsf {G}_{6}:\) We perform some necessary cleaning-up, and abort if the SAS-MA instance between \(\varPi ^{\mathsf {C}}_{j}\) and \(\varPi ^{\mathsf {D}}_{l}\) sent \(\mathsf {M_C}\) correctly, but the adversary did not deliver \(\varPi ^{\mathsf {D}}_{l}\)’s response \(e_D\) back to \(\varPi ^{\mathsf {C}}_{j}\) and yet \(\varPi ^{\mathsf {D}}_{l}\) did not abort in the subsequent DE-PAKE. Since this way \(\varPi ^{\mathsf {C}}_{j}\) has no information about key \(K_{CD}\) we get \(p_6\le p_5 +q_D\cdot {\epsilon ^\mathsf{{AC}}}\).

Game \(\mathsf {G}_{7}:\) We replace the keys created by \(\mathsf {uKE}\) for every \(\varPi ^{\mathsf {S}}_{i}\)-\(\varPi ^{\mathsf {C}}_{j}\) session in step I.1 on which \(\mathsf {A}\) was only an eavesdropper, with random keys. Security of \(\mathsf {uKE}\) implies that \(p_7\le p_6+\min (n_C,n_S)\cdot {\epsilon ^\mathsf{{uKE}}}\).

At this point the game has the following properties: If \(\mathsf {A}\) is passive on the \(\mathsf {C}\)-\(\mathsf {S}\) key exchange in step I then \(\mathsf {A}\) is forced to be passive on the \(\mathsf {C}\)-\(\mathsf {S}\) link in the DE-PAKE in step III. Also, if \(\mathsf {A}\) does not attack the \(\mathrm{SAS{-}MA}\) and delivers \(\mathsf {D}\)’s response to \(\mathsf {C}\) then \(\mathsf {A}\) is forced to be passive on the \(\mathsf {C}\)-\(\mathsf {D}\) link in the DE-PAKE in step III (and if \(\mathsf {A}\) does not deliver \(\mathsf {D}\)’s response to \(\mathsf {C}\) then this \(\mathsf {D}\) instance will abort too). The remaining cases are either (1) active attacks on the key exchange in step I or (2) when \(\mathsf {A}\) attacks the \(\mathrm{SAS{-}MA}\) sub-protocol and gets \(\mathsf {D}\) to accept \(\mathsf {M_C}^*\ne \mathsf {M_C}\) or (3) \(\mathsf {A}\) sends \(e_D^*\ne e_D\) to \(\mathsf {C}\). In handling these cases the crucial issue is what \(\mathsf {A}\) does with the \(zid\) created by \(\mathsf {S}\). Consider any \(\mathsf {S}\) instance \(\varPi ^{\mathsf {S}}_{i}\) in which the adversary interferes with the key exchange protocol in step I.1. Without loss of generality assume that the adversary learns key \(K_{CS}\) output by \(\varPi ^{\mathsf {S}}_{i}\) in this step. Note that \(\mathsf {D}\) keeps a variable \(\mathsf {zid}\mathsf {Set}\) in which it stores all \(zid\) values it ever receives, and that \(\mathsf {D}\) aborts if it sees any \(zid\) more than once. Therefore each game execution defines a 1-1 function \(L:[n_S]\rightarrow [n_D]\cup \{\perp \}\) s.t. if \(L(i)\ne \perp \) then L(i) is the unique index in \([n_D]\) s.t. \(\varPi ^{\mathsf {D}}_{L(i)}\) receives \(\mathsf {M_C}=({\mathsf {pk}},zid_i)\) in step II.1 for some \({\mathsf {pk}}\), and \(L(i)=\perp \) if and only if no \(\mathsf {D}\) session receives \(zid_i\). If \(L(i)\ne \perp \) then we consider two cases: First, if \(\mathsf {M_C}=({\mathsf {pk}},zid_i)\) which contains \(zid_i\) originates with some session \(\varPi ^{\mathsf {C}}_{j}\), and second if \(\mathsf {M_C}=({\mathsf {pk}},zid_i)\) is created by the adversary.

Game \(\mathsf {G}_{9}:\) Let \(\varPi ^{\mathsf {S}}_{i}\) and \(\varPi ^{\mathsf {C}}_{j}\) be rogue sessions s.t. \(\mathsf {A}\) sends \(zid_i\) to \(\varPi ^{\mathsf {C}}_{j}\) in step I.2, but then stops \(\varPi ^{\mathsf {C}}_{j}\) from getting the corresponding \(z_i\) by either attacking SAS-MA or misdelivering \(\mathsf {D}\)’s response \(e_D\). In that case neither \(\varPi ^{\mathsf {C}}_{j}\) nor \(\mathsf {A}\) has any information about \(z_i\), and therefore \(\varPi ^{\mathsf {S}}_{i}\) should reject. Namely, if in \(\mathsf {G}_9\) we set \(\varPi ^{\mathsf {S}}_{i}\)’s output to \(\perp \) in such cases then \(p_9\le p_8+q_S\cdot {\epsilon ^\mathsf{{AC}}}\).

Game \(\mathsf {G}_{10}:\) Let \(\varPi ^{\mathsf {S}}_{i}\) and \(\varPi ^{\mathsf {C}}_{j}\) be rogue sessions and \(\mathsf {A}\) send \(zid_i\) to \(\varPi ^{\mathsf {C}}_{j}\) as above, but now consider the case that \(\mathsf {A}\) lets \(\varPi ^{\mathsf {C}}_{j}\) learn \(z_i\) but \(\mathsf {A}\) does not learn \(z_i\) itself, i.e. \(\mathsf {A}\) lets SAS-MA and \(e_D\) go through. In this case we will abort if in DE-PAKE communication in Step III between \(\varPi ^{\mathsf {S}}_{i}\) and \(\varPi ^{\mathsf {C}}_{j}\) either party accepts a message not sent by the other party. Since \(\mathsf {A}\) has no information about \(z_i\) the authenticated channel security implies that \(p_{10}\le p_9+\min (q_C,q_S)\cdot {\epsilon ^\mathsf{{AC}}}\).

Note that at this point if \(\mathsf {A}\) interferes with the KE in step I.1 with session \(\varPi ^{\mathsf {S}}_{i}\), sends \(zid_i\) to some \(\varPi ^{\mathsf {C}}_{j}\) and does not send it to some \(\varPi ^{\mathsf {D}}_{l}\) by sending \([\mathsf {SAS.ATTACK}, sid ,({\mathsf {pk}}^*,zid_i)]\) for any l then \(\mathsf {A}\) is forced to be a passive eavesdropper on the DE-PAKE protocol in step III. Note that this holds when \(L(i)=l\) s.t. the game issues \([\mathsf {SAS.SEND}, sid ,\varPi ^{\mathsf {C}}_{j},\varPi ^{\mathsf {D}}_{l},({\mathsf {pk}},zid_i)]\) for some \({\mathsf {pk}}\), i.e. if some \(\varPi ^{\mathsf {D}}_{l}\) receives value \(zid_i\), it receives it as part of a message \(\mathsf {M_C}\) sent by some \(\varPi ^{\mathsf {C}}_{j}\).

Game \(\mathsf {G}_{11}:\) Finally consider the case when \(\mathsf {A}\) itself sends \(zid_i\) to \(\mathsf {D}\), i.e. when \(L(i)=l\) s.t. \(\mathsf {A}\) sends \([\mathsf {SAS.ATTACK}, sid ,\mathsf {M_C}^*=({\mathsf {pk}}^*,zid_i)]\) in response to \([\mathsf {SAS.SEND}, sid ,\varPi ^{\mathsf {C}}_{j},\varPi ^{\mathsf {D}}_{l},\mathsf {M_C}]\), but the \(\mathsf {F}_\mathsf{{SAS}[t]}\) coin-toss comes out \(\rho _l=0\), i.e. \(\mathsf {A}\) fails in this SAS-MA attack. In that case we can let \(\varPi ^{\mathsf {S}}_{i}\) abort in step III because if \(\rho _l=0\) then \(\mathsf {A}\) has no information about \(z_i=Z(zid_i)\), hence \(p_{11}\le p_{10}+q_S\cdot {\epsilon ^\mathsf{{AC}}}\).

After these game changes, we finally reduce an attack on TFA-KE to an attack on the underlying DE-PAKE. Namely, we construct \(\mathsf {A}^*\) which achieves advantage \(\mathsf {Adv}_{\mathsf {A}^*}^\mathsf{{DEPAKE}}=2\cdot (p_{11}-1/2)\) against DE-PAKE, and makes \(q_S^*,q_D^*,q_C,q_C'\) rogue queries respectively to \(\mathsf {S}\), \(\mathsf {D}\), to \(\mathsf {C}\) on its connection to \(\mathsf {S}\), and to \(\mathsf {C}\) on its connection with \(\mathsf {D}\), where \(q_S^*=q_D^*=q^*\) and \(q^*\) is a random variable equal to the sum of \(q=\min (q_S,q_D)\) coin tosses, each of which comes out 1 with probability \(2^{-t}\) and 0 with probability \(1-2^{-t}\). Recall that \({\mathsf {Adv}_\mathsf {A}^\mathsf{{TFA}}}=2\cdot (p_0-1/2)\) and that by the game changes above \(|p_{11}-p_0|\) is a negligible quantity, and hence \(\mathsf {Adv}_{\mathsf {A}^*}^\mathsf{{DEPAKE}}\) is negligibly close to \({\mathsf {Adv}_\mathsf {A}^\mathsf{{TFA}}}\).

The reduction goes through because after the above game changes \(\mathsf {A}\) can either essentially let a DE-PAKE instance go through undisturbed, or it can attempt to actively attack the underlying DE-PAKE instance either via a rogue \(\mathsf {C}\) session or via rogue sessions with device \(\mathsf {D}\) and server \(\mathsf {S}\). However, each rogue \(\mathsf {D}\) session is bound to a unique rogue \(\mathsf {S}\) session, because of the uKE and \((zid,z)\) mechanism, and for each such \(\mathsf {D},\mathsf {S}\) session pair, the probability that an active attack is not aborted is only \(2^{-t}\). This implies that the \((q_S,q_D,q_C)\) parameters characterizing the TFA-KE attacker \(\mathsf {A}\) scale down to \((q_S/2^t,q_D/2^t,q_C)\) parameters for the resulting DE-PAKE attacker \(\mathsf {A}^*\), which leads to the claimed security bounds by the security of DE-PAKE. The details of the construction of \(\mathsf {A}^*\) and the above argument are included in the full version of this paper [38].
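As an illustration of the scale-down argument, the following Python sketch models \(q^*\) as the number of successes among \(q=\min (q_S,q_D)\) independent coin tosses, each succeeding with probability \(2^{-t}\). The function names and the use of a seeded pseudo-random generator are assumptions made for this example only; they are not part of the proof or of the prototype.

```python
import random


def expected_surviving_queries(q_s: int, q_d: int, t: int) -> float:
    """Expected value of q*: q = min(q_S, q_D) tosses, each 1 with prob. 2^-t."""
    q = min(q_s, q_d)
    return q / 2**t


def sample_surviving_queries(q_s: int, q_d: int, t: int, rng: random.Random) -> int:
    """Draw one sample of q* by tossing q biased coins (illustrative only)."""
    q = min(q_s, q_d)
    success_prob = 2.0 ** -t
    # rng.random() is uniform in [0, 1), so each toss succeeds w.p. 2^-t.
    return sum(1 for _ in range(q) if rng.random() < success_prob)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(expected_surviving_queries(2048, 4096, 10))
    print(sample_surviving_queries(2048, 4096, 10, rng))
```

With \(t=0\) (the device-corruption setting discussed in Case 2) every toss succeeds and \(q^*=q\), so no scale-down occurs.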

Case 2: Party corruptions. In the full version of the paper [38] we include the cases of client corruption and of device and/or server corruption, showing that our scheme achieves all the bounds from Definition 1. Here we just comment on how these bounds are derived. For the case of device corruption, the value z is learned by the attacker, hence it is equivalent to setting \(t=0\). Also, rogue queries to \(\mathsf {D}\) are free for the attacker, hence \(q_D\) is virtually unbounded (one can think of it as “infinity”). Setting these values in the bound of Case 1, one obtains the claimed bound \((q_C+q_S)/2^d\) for the case of device corruption. Similarly, in case of server corruption one sets \(q_S\) to “infinity”. In addition, and in spite of the attacker learning z in this case, one obtains a bound involving \(2^{-t}\) thanks to the fact that we run the PTR protocol over the SAS channel, hence reducing the probability of the attacker successfully testing a candidate password \(\mathsf {pwd}'\) by \(2^{-t}\). In the case of client compromise where the attacker learns the user’s password \(\mathsf {pwd}\), we set \(d=0\) (a dictionary of size 1) and set \(q_C=q_C'=0\) since \(\mathsf {C}\) is corrupted and the attacker cannot choose a test session at \(\mathsf {C}\). Finally, when both \(\mathsf {D}\) and \(\mathsf {S}\) (but not \(\mathsf {C}\)) are corrupted one gets the same security as plain DE-PAKE, namely, requiring a full offline dictionary attack to recover \(\mathsf {pwd}\).

6 System Development and Testing

Here we report on an experimental prototype of protocol \(\mathsf {OpTFA}\) from Fig. 3 on page 12 and present novel designs for the SAS channel implementation. We experiment with \(\mathsf {OpTFA}\) using two different instantiations of the password protocol between \(\mathsf {C}\) and \(\mathsf {S}\). One is PKI-based that runs \(\mathsf {OpTFA}\) over a server-authenticated TLS connection; in particular, it uses this connection in lieu of the \(\mathsf {uKE}\) in step I and implements step III by simply transmitting the concatenation of password \(\mathsf {rwd}\) and the value z under the TLS authenticated encryption. The second protocol we experimented with is a PKI-free asymmetric PAKE borrowed from [27, 36]. Roughly, it runs the same \(\mathsf{PTR}\) protocol as described in Sect. 3 but this time between \(\mathsf {C}\) and \(\mathsf {S}\). \(\mathsf {C}\)’s input is \(\mathsf {rwd}\) and the result \(F_k(\mathsf {rwd})\) serves as a user’s private key for the execution of an authenticated key-exchange between \(\mathsf {C}\) and \(\mathsf {S}\). We implement the latter with HMQV [41] (as an optimization, the DH exchange used to implement \(\mathsf {uKE}\) in step I of \(\mathsf {OpTFA}\) is “reused” in HMQV).

In Table 1 we provide execution times for the various protocol components, including times for the TLS-based protocol and the PKI-free one, with some elements borrowed from the implementation work of [37]. We build on the following platform. The web server \(\mathsf {S}\) is a Virtual Machine running Debian 8.0 with 2 Intel Xeon 3.20 GHz and 3.87 GB of memory. Client terminal \(\mathsf {C}\) is a MacBook Air with 1.3 GHz Intel Core i5 and 4 GB of memory. Device \(\mathsf {D}\) is a Samsung Galaxy S5 smartphone running Android 6.0.1. \(\mathsf {C}\) and \(\mathsf {D}\) are connected to the same WiFi network with a speed of 100 Mbps and \(\mathsf {S}\) has an Internet connection speed of 1 Gbps. The server-side code is implemented in HTML5, PHP and JavaScript. On the client terminal, the protocol is implemented in JavaScript as an extension for the Chrome browser, and the smartphone app is implemented in Java for Android phones.

All DH-based operations (\(\mathsf{PTR}\), key exchange and SAS-SMT encryption) use elliptic curve NIST P-256, and hashing and PRF use HMAC-SHA256. Hashing into the curve is implemented with simple iterated hashing till an abscissa x on the curve is found (it will be replaced with a secure mechanism such as [26]).
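The iterated hashing into the curve mentioned above can be sketched as follows. This is a minimal illustration, not the prototype's code: it assumes plain SHA-256 as the hash, re-hashes the previous digest on each failed attempt, and uses the standard NIST P-256 parameters (prime \(p\), coefficient \(a=-3\), coefficient \(b\)). The function name is hypothetical. The number of iterations depends on the input, so the running time leaks information about it, which is one reason such a mechanism is typically replaced by a constant-time encoding.

```python
import hashlib

# NIST P-256 domain parameters: y^2 = x^3 - 3x + B over GF(P).
P = 2**256 - 2**224 + 2**192 + 2**96 - 1
A = P - 3
B = 0x5AC635D8AA3A93E7B3EBBD55769886BC651D06B0CC53B0F63BCE3C3E27D2604B


def hash_to_curve_iterated(data: bytes) -> tuple[int, int]:
    """Hash repeatedly until the digest is the abscissa x of a curve point.

    Illustrative try-and-rehash sketch; NOT constant time.
    """
    digest = hashlib.sha256(data).digest()
    while True:
        x = int.from_bytes(digest, "big") % P
        rhs = (pow(x, 3, P) + A * x + B) % P
        # P is congruent to 3 mod 4, so a square root (if one exists)
        # is rhs^((P+1)/4) mod P.
        y = pow(rhs, (P + 1) // 4, P)
        if (y * y) % P == rhs:
            return x, y
        # x is not a valid abscissa: hash the previous digest and retry.
        digest = hashlib.sha256(digest).digest()


if __name__ == "__main__":
    x, y = hash_to_curve_iterated(b"example password")
    print(hex(x))
    print(hex(y))
```

Roughly half of all candidate abscissas correspond to curve points, so the loop is expected to finish after about two hash evaluations.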

Table 1. Average execution time of \(\mathsf {OpTFA}\) and its components (10,000 iterations)

Communication between \(\mathsf {C}\) and \(\mathsf {S}\) uses a regular internet connection between the browser \(\mathsf {C}\) and web server \(\mathsf {S}\). Communication between \(\mathsf {C}\) and \(\mathsf {D}\) (except for checksum comparison) goes over the internet using bidirectional Google Cloud Messaging (GCM) [5], in which \(\mathsf {D}\) acts as the GCM server and \(\mathsf {C}\) acts as the GCM client. GCM involves a registration phase during which the GCM client (here \(\mathsf {C}\)) registers its GCM-generated client ID with the GCM server (here \(\mathsf {D}\)), to ensure that \(\mathsf {D}\) only responds to registered clients. If the PAKE protocol in \(\mathsf {OpTFA}\) is implemented with password-over-TLS, [37] specifies the need for \(\mathsf {D}\) to authenticate the \(\mathsf{PTR}\) value b sent to \(\mathsf {C}\) (see Sect. 3). In this case, during the GCM registration we install at \(\mathsf {C}\) a signature public key of \(\mathsf {D}\).

6.1 Checksum Validation Design

An essential component in our approach and solutions (in particular in protocol \(\mathsf {OpTFA}\)) is the use of a SAS channel implemented via the user-assisted equality verification of checksums displayed by both \(\mathsf {C}\) and \(\mathsf {D}\) (denoted hereafter as \(\mathsf {checksum}_C\) and \(\mathsf {checksum}_D\), resp.). Here we discuss different implementations of such user-assisted verification which we have designed and experimented with.

Manual Checksum Validation. In the simplest approach, the user compares the checksums displayed on \(\mathsf {D}\) and \(\mathsf {C}\) and taps the Confirm button on \(\mathsf {D}\) if the two match [49]. Although this type of code comparison has recently been deployed in TFA systems, e.g., [8], it carries the danger of neglectful users pressing the Confirm button without comparing the checksum strings. Another common solution for checksum validation is “Copy-Confirm” [49], where the user types the checksum displayed on \(\mathsf {C}\) into \(\mathsf {D}\), and only if this matches \(\mathsf {D}\)’s checksum does \(\mathsf {D}\) proceed with the protocol. We implemented this scheme using a 6-digit number. We stress that in spite of the similarity between this mechanism and PIN copying in traditional TFA schemes, there is an essential security difference: stealing the PIN in traditional schemes suffices to authenticate in place of the user (for an attacker that holds the user’s password), while stealing the checksum value entered by the user in \(\mathsf {OpTFA}\) is worthless to the attacker (the checksum is a validation code, not the OTK value needed for authentication).

The above methods, which rely on human visual examination and/or copying, limit the SAS channel capacity (typically to 4–6 digits) and may degrade usability [46]. As an alternative we consider the following designs (however, one may fall back to the manual schemes when the more secure schemes below cannot be used, e.g., because of a missing camera or a noisy environment).

QR Code Checksum Validation. In this checksum validation model, we encode the full 256-bit checksum computed in protocol \(\mathsf {OpTFA}\) into a hex string and show it as a \(230 \times 230\) pixel QR code on the web page. We used the ZXing library to encode the QR code and display it on the web page, and to read and decode it on \(\mathsf {D}\). To send the checksum to \(\mathsf {D}\), the user opens the app on \(\mathsf {D}\) and captures the QR code. \(\mathsf {D}\) decodes the QR code, compares the checksums, and proceeds with the protocol if they match. In this setting, the user does not need to enter the checksum but only needs to hold her phone and capture a picture of the browser’s screen. With the larger checksum (\(t=256\)), active attacks on SAS-SMT become infeasible and the expressions \(2^{-t}\) in Definition 1 become negligible.
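The hex-string encoding and the comparison performed on \(\mathsf {D}\) can be sketched as below. The QR rendering and scanning themselves (done with ZXing in the prototype) are omitted; the function names are hypothetical, and the constant-time comparison is a design choice of this sketch rather than something the prototype is stated to do.

```python
import hmac


def qr_payload(full_checksum: bytes) -> str:
    """Encode the 256-bit checksum as the hex string placed in the QR code."""
    if len(full_checksum) != 32:
        raise ValueError("expected a 256-bit (32-byte) checksum")
    return full_checksum.hex()


def checksums_match(scanned_payload: str, checksum_d: bytes) -> bool:
    """Compare the decoded QR payload with the checksum computed by D."""
    try:
        scanned = bytes.fromhex(scanned_payload)
    except ValueError:
        # Payload is not a valid hex string: treat as a mismatch.
        return False
    return hmac.compare_digest(scanned, checksum_d)


if __name__ == "__main__":
    checksum_c = bytes(range(32))  # placeholder value for illustration
    payload = qr_payload(checksum_c)
    print(len(payload))
    print(checksums_match(payload, checksum_c))
```

A 32-byte checksum yields a 64-character payload, which comfortably fits in a single QR code.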

Voice-based Checksum Validation. We implement a voice-based checksum validation approach that assumes a microphone-equipped device (typically a smartphone) where the user speaks a numerical checksum displayed by the client into the device. The device \(\mathsf {D}\) receives this audio, recognizes and transcribes it using a speech recognition tool, and then compares the result with the checksum computed by \(\mathsf {D}\) itself. The client side uses a Chrome extension as in the manual checksum validation case, while on the device we developed a transcriber application using the Android.Speech API. The user clicks on a “Speak” button added to the app and speaks out loud the displayed number (6-digit in our implementation). The transcriber application in \(\mathsf {D}\) recognizes the speech and converts it to text that is then compared to \(\mathsf {D}\)’s checksum. To further improve the usability of this approach one can incorporate a text-to-speech tool that would speak the checksum automatically (i.e., replacing the user). The transcription approach would perhaps be easier for users to employ than the QR-based approach, but would only be suitable if the user is in an environment that is not noisy and allows her to speak out loud. We note that the QR-code and audio-based approaches do not require a browser plugin or add-on and can be deployed on any browser with HTML5 support.

Performance Evaluation. As preliminary information, we report on 30 checksum validation iterations performed by one experimenter. The time taken by manual checksum validation was 8.50 s on average (standard deviation 2.84 s). The time taken by QR-code validation was 4.87 s on average for capturing the code (standard deviation 1.32 s) and 0.02 s on average for decoding the code (standard deviation 0.00 s). The time taken by audio-based validation was 4.08 s on average for speaking the checksum (standard deviation 0.34 s) and 1.18 s on average for transcribing the spoken checksum (standard deviation 0.42 s). The average time for these tasks may vary between different users. The time taken by the device to perform the checksum comparison is negligible. Our preliminary testing of the QR-code and audio channels shows a virtually zero error rate.
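For a side-by-side comparison, the per-step averages reported above can be summed into an end-to-end average per method. The short Python sketch below does only this arithmetic; the dictionary layout and names are assumptions of the example, and the standard deviations are deliberately not combined since that would require assumptions about the independence of the steps.

```python
# Mean times in seconds, as reported in the preliminary evaluation.
MEAN_SECONDS = {
    "manual": {"validate": 8.50},
    "qr": {"capture": 4.87, "decode": 0.02},
    "voice": {"speak": 4.08, "transcribe": 1.18},
}


def total_mean_seconds(method: str) -> float:
    """Sum the mean step times of one validation method, rounded to 10 ms."""
    return round(sum(MEAN_SECONDS[method].values()), 2)


if __name__ == "__main__":
    for name in MEAN_SECONDS:
        print(name, total_mean_seconds(name))
```

Under these numbers the QR-code method is the fastest on average (4.89 s), followed by the voice-based method (5.26 s) and manual validation (8.50 s).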

7 Discussion of Related Work

Device-enhanced password-authentication with security against offline dictionary attacks (ODA). There are several proposals in the cryptographic literature for password authentication schemes that utilize an auxiliary computing component to protect against ODA in case of server compromise. This was the context of the Password Hardening proposal of Ford-Kaliski [31], which was generalized as Hidden Credential Retrieval by Boyen [27], and then formalized as (Cloud) Single Password Authentication (SPA) by Acar et al. [23] and as a Device-Enhanced PAKE (DE-PAKE) by Jarecki et al. [37]. These schemes are functionally similar to a TFA scheme if the role of the auxiliary component is played by the user’s device \(\mathsf {D}\), but they are insecure in case of password leakage, e.g. via client compromise. The threat of an ODA attack on compromise of an authentication server also motivated the notion of Threshold Password Authenticated Key Exchange (T-PAKE) [44], i.e. a PAKE in which the password-holding server is replaced by n servers so that a corruption of up to \(t<n\) of them leaks no information about the password. In addition to general T-PAKEs, several solutions were also given for the specific case of \(n\,{=}\,2\) servers tolerating \(t\,{=}\,1\) corruption, known as 2-PAKE [28, 40], and every 2-PAKE, with the user’s device \(\mathsf {D}\) playing the role of the second server, is a password authentication scheme that protects against ODA in case of server compromise. However, as in the case of [23, 27, 31, 37], if a password is leaked then 2-PAKE offers no security against an active attacker who engages with a single 2-PAKE session.

TFA with ODA security. Shirvanian et al. [47] proposed a TFA scheme which extends the security of traditional PIN-based TFAs against ODA in case of server compromise. However, \(\mathsf {OpTFA}\) offers several advantages compared to [47]: First, [47] relies on PKI (the client sends the password and the one-time key, OTK, to the PKI-authenticated server) while \(\mathsf {OpTFA}\) has both a PKI-model and a PKI-free instantiation. Second, [47] assumes full security of the t-bit \(\mathsf {D}\)-\(\mathsf {C}\) channel for OTK transmission while we reduce this assumption to a t-bit authenticated channel between \(\mathsf {C}\) and \(\mathsf {D}\). Consequently, we improve user experience by replacing the read-and-copy action with a simpler and easier compare-and-confirm action. On the other hand, [47] can use only the t-bit secure \(\mathsf {D}\)-\(\mathsf {C}\) link while \(\mathsf {OpTFA}\) requires transmission of full-entropy values between \(\mathsf {D}\) and \(\mathsf {C}\).

TFA with the 2nd factor as a local cryptographic component. Some Two-Factor Authentication schemes consider a scenario where the 2nd factor is a device \(\mathsf {D}\) capable of storing cryptographic keys and performing cryptographic algorithms, but unlike in our model, \(\mathsf {D}\) is connected directly to client \(\mathsf {C}\), i.e. it effectively communicates with \(\mathsf {C}\) over secure links. (However, security must hold assuming the adversary can stage a lunch-time attack on device \(\mathsf {D}\), so \(\mathsf {D}\) cannot simply hand off its private keys to \(\mathsf {C}\).) The primary example is a USB stick, like YubiKey [13], implementing e.g. the FIDO U2F authentication protocol [2, 42]. A generalized version of this problem, including biometric authentication, was formalized by Pointcheval and Zimmer as Multi-Factor Authentication [45], but the difference between that model and our TFA-KE notion is that we consider device \(\mathsf {D}\) which has no pre-set secure channel with client \(\mathsf {C}\). Moreover, to the best of our knowledge, all existing MFA/TFA schemes even in the secure-channel \(\mathsf {D}\)-\(\mathsf {C}\) model are still insecure against ODA on server compromise, except for the aforementioned TFA of Shirvanian et al. [47].

Alternatives to PIN-based TFA with remote auxiliary device. Many TFA schemes improve on PIN-based TFAs either by reducing user involvement, i.e. by not requiring the user to copy a PIN from \(\mathsf {D}\) to \(\mathsf {C}\), or by improving on its online security, but none of them protect against ODA in case of server compromise, and their usability and online security properties also have downsides.

PhoneAuth [30] and Authy [11] replace PINs with \(\mathsf {S}\)-to-\(\mathsf {D}\) challenge-response communication channeled by \(\mathsf {C}\), but they require a pre-paired Bluetooth connection to secure the \(\mathsf {C}\)-\(\mathsf {D}\) channel. A full-bandwidth secure \(\mathsf {C}\)-\(\mathsf {D}\) channel reduces the three-party TFA notion to a two-party setting, where device \(\mathsf {D}\) is a local component of client \(\mathsf {C}\), but requiring the establishment of such a secure connection between a browser \(\mathsf {C}\) and a cell phone \(\mathsf {D}\) makes a TFA scheme harder to use. TFA schemes like SlickLogin (acquired by Google) [3], Sound-Login [9], and Sound-Proof [39] in essence attempt to implement such a secure \(\mathsf {C}\)-to-\(\mathsf {D}\) channel using physical security assumptions on physical media, e.g. near-ultrasounds [3], audible sounds [9], or ambient sounds detecting proximity of \(\mathsf {D}\) to \(\mathsf {C}\) [39], but they are subject to eavesdropping attacks and co-located attackers.

Several TFA proposals, including Google Prompt [8] and Duo [1], follow a one-click approach to minimize the user’s involvement if \(\mathsf {D}\) is a data-connected device like a smartphone. In [1, 8] \(\mathsf {S}\) communicates directly over the data network with \(\mathsf {D}\), which prompts the user to approve (or deny) an authentication session, where the approve action prompts \(\mathsf {D}\) to respond in an entity authentication protocol with \(\mathsf {S}\), e.g. following the U2F standard [2]. This requires even less user involvement than the compare-and-confirm action of our TFA-KE, but it does not establish a strong binding between the \(\mathsf {C}\)-\(\mathsf {S}\) login session and the \(\mathsf {D}\)-\(\mathsf {S}\) interaction. For example, if the adversary knows the user’s password, and hence the TFA security depends entirely on the \(\mathsf {D}\)-\(\mathsf {S}\) interaction, a man-in-the-middle adversary who detects \(\mathsf {C}\)’s attempt to establish a session with \(\mathsf {S}\), and succeeds in establishing a session with \(\mathsf {S}\) before \(\mathsf {C}\) does, will authenticate as that user to \(\mathsf {S}\) because the honest user’s approval on \(\mathsf {D}\)’s prompt will result in \(\mathsf {S}\) authenticating the adversarial session.