
1 Introduction

In this paper, we deal with the problem of computing discrete logarithms when the radix-b representation of the exponent sought is known to have low weight (i.e., only a small number of nonzero digits). We propose several new baby-step giant-step algorithms for solving such discrete logarithms in time depending mostly on the radix-b weight (and length) of the exponent.

Briefly, the discrete logarithm (DL) problem in a multiplicative group \({\mathbb {G}}\) of order q is the following: Given as input a pair \((g,h)\in {\mathbb {G}}\times {\mathbb {G}}\), output an exponent \(x\in {\mathbb {Z}}_q\) such that \(h=g^x\), provided one exists. The exponent x is called a discrete logarithm of h with respect to the base g and is denoted, using an adaptation of the familiar notation for logarithms, by \(x\equiv \log _gh\bmod {q}\). A longstanding conjecture, commonly referred to as the DL assumption, posits that the DL problem is “generically hard”; that is, that there exist infinite families of groups in which no (non-uniform, probabilistic) polynomial-time (in \(\lg {q}\)) algorithm can solve uniform random instances of the DL problem with inverse polynomial (again, in \(\lg {q}\)) probability.

Our results do not refute (or even pose a serious challenge to) the DL assumption. Indeed, although our algorithms are generic, they do not apply to uniform random DL instances, nor do they generally run in polynomial time. Rather, we demonstrate that, for certain non-uniform instance distributions, one can solve the DL problem in time that depends mostly on a parameter strictly smaller than \(\lg {q}\). Specifically, to solve a DL problem instance in which the radix-b representation of the exponent has length m and weight t, our fastest deterministic algorithm evaluates \(O\bigl (t\,(b-1)^{t/2}\binom{m/2}{t/2}\bigr )\) group operations and stores \(O\bigl (t\,(b-1)^{t/2}\binom{m/2}{t/2}\bigr )\) group elements in the worst case; for the same problem, our randomized (Las Vegas) algorithm evaluates fewer group operations (an expected \(O\bigl (\sqrt{t}\,(b-1)^{t/2}\binom{m/2}{t/2}\bigr )\)) and stores \(O\bigl ((b-1)^{t/2}\binom{m/2}{t/2}\bigr )\) group elements, on average. For the special case of radix-2, our fastest deterministic algorithm improves on the previous result (due to Stinson [29, Sect. 4.1]) by a factor \(c\sqrt{t}\lg {m}\) for some constant c, reducing the number of group operations used from \(O\bigl (t^{3/2}(\lg {m})\binom{m/2}{t/2}\bigr )\) to \(O\bigl (t\binom{m/2}{t/2}\bigr )\). While a far cry from challenging established cryptographic best practices, we do observe that our new algorithms are not without practical ramifications. Specifically, we demonstrate a practical attack against several recent Verifier-based Password Authenticated Key Exchange (VPAKE) protocols from the literature [12,13,14,15, 34].

Organization. The remainder of this paper is organized as follows. In Sect. 2, we recall mathematical preliminaries necessary to frame our main results. In Sect. 3, we review and improve on several variants of an algorithm for solving the “low-Hamming-weight DL problem” and then, in Sect. 4, we present our new generalizations to arbitrary radixes \(b>1\). In Sect. 5, we review existing work that addresses some related “low-weight” DL variants. In Sect. 6, we showcase the cryptanalytic implications of the new algorithms by explaining how they can be used to attack several Verifier-based Password Authenticated Key Exchange (VPAKE) protocols from the literature.

2 Mathematical Preliminaries

Throughout this paper, \({\mathbb {G}}\) denotes a fixed cyclic group with order q, which we express using multiplicative notation, and g denotes a fixed generator of \({\mathbb {G}}\). We are interested in computing the DLs of elements \(h\in {\mathbb {G}}\) to the base g. We assume that the group order q is known, though our techniques work much the same when q is not known.

Radix-b representations. Let \(b>1\) be a positive integer (the “radix”). For every positive integer x, there exists a unique positive integer m and an m-tuple \((x_m,\ldots ,x_{1})\in \{0,1,\ldots ,b-1\}^m\) with \(x_{m}\ne 0\) such that

\(x=\sum _{i=1}^{m}x_i\,b^{i-1},\)   (1)

called the radix-b representation of x. Here the component \(x_i\) is called the ith radix-b digit, and \(m=\lfloor \log _bx\rfloor +1\) the radix-b length, of x. The number of nonzero digits in the radix-b representation of x is called its radix-b weight (or simply its weight when the radix is clear from context). When \(b=2\), the radix-b weight of x is its Hamming weight and the radix-b length of x is its bit length.
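To fix ideas, the following small Python helper (ours, not part of the paper) recovers the digits of Eq. (1) along with the radix-b length and weight; the number 20871, revisited in Sect. 4, serves as a sanity check.

```python
def radix_b_digits(x, b):
    """Return the digits x_1, ..., x_m (least significant first) of x in radix b."""
    assert x > 0 and b > 1
    digits = []
    while x > 0:
        x, d = divmod(x, b)
        digits.append(d)
    return digits

def radix_b_length(x, b):
    return len(radix_b_digits(x, b))

def radix_b_weight(x, b):
    return sum(1 for d in radix_b_digits(x, b) if d != 0)

# Example: 20871 = (1,1,0,1,2,0,1,3) in radix 4, so length 8 and weight 6.
assert radix_b_length(20871, 4) == 8 and radix_b_weight(20871, 4) == 6
```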

Decomposing radix-b representations. Let m and t be positive integers. We write [m] as shorthand for the set \(\{1,2,\ldots ,m\}\) of positive integers less than or equal to m and, given a finite set A, we define \(\binom{A}{t}\) as the set of all size-t subsets of A. We are especially interested in \(\binom{[m]}{t}\), a collection of subsets equipped with a natural bijective mapping to the set of all (at-most)-m-bit positive integers with Hamming weight t. The mapping from \(\binom{[m]}{t}\) to the set of such integers is given by the function \(\mathrm{val}_{m,t}\) that maps each size-t subset \(Y\in \binom{[m]}{t}\) to the integer \(\sum _{y\in Y}2^{y-1}\).

The above \(\mathrm{val}_{m,t}\) function naturally generalizes to a family of two-operand functions parametrized by a radix \(b>1\). Specifically, for any integer \(b>1\), the function \(\mathrm{val}_{b,m,t}\) maps each t-tuple \(X=(x_t,\ldots ,x_{1})\in [b-1]^t\) and size-t subset \(Y\in \binom{[m]}{t}\) to the integer \(\sum _{i=1}^{t}x_i\,b^{Y[i]-1}\). In the preceding notation, Y[i] denotes the ith smallest integer in the set Y. Note that the function \(\mathrm{val}_{b,m,t}\) is injective: the possible inputs to \(\mathrm{val}_{b,m,t}\) map to pairwise distinct positive integers, each having radix-b weight t and radix-b length at most m. Also note that when \(b=2\), the all-ones tuple \((1,1,\ldots ,1)\) is the only element in \([b-1]^t\); thus, \(\mathrm{val}_{2,m,t}\) is functionally equivalent to the \(\mathrm{val}_{m,t}\) function introduced in the preceding paragraph. Going forward, we omit the subscripts m, t from the preceding notations, noting that m and t can always be inferred from context when needed.
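The following minimal sketch (ours) implements the val functions just defined; for readability it indexes the digit tuple X in increasing position order, whereas the paper writes \(X=(x_t,\ldots ,x_1)\).

```python
from itertools import combinations

def val(Y, b=2, X=None):
    """Sketch of val_{b,m,t}(X, Y): place digit x_i at position Y[i],
    the ith smallest element of Y."""
    Y = sorted(Y)
    if X is None:              # radix 2: the all-ones tuple is the only choice
        X = [1] * len(Y)
    return sum(x * b ** (y - 1) for x, y in zip(X, Y))

# val_{m,t} with m = 4, t = 2: the six 2-subsets of [4] map bijectively onto
# the 4-bit positive integers of Hamming weight 2.
assert sorted(val(Y) for Y in combinations(range(1, 5), 2)) == [3, 5, 6, 9, 10, 12]
# Radix 3: digits (2, 1) at positions {1, 3} give 2*3^0 + 1*3^2 = 11.
assert val({1, 3}, b=3, X=(2, 1)) == 11
```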

Stinson [29] describes three algorithms to compute low-Hamming-weight DLs. Lemmas 1 and 2 generalize Lemmas 1.1 and 1.2 from Stinson’s paper to the above-generalized family of radix-b \(\mathrm{val}\) functions. Proofs of these simple lemmas are in our extended technical report [11, Sects. A.1 and A.2].

Lemma 1

Fix a radix \(b>1\), let m be a positive integer, and let t be an even integer in [m]. If \(g^{\mathrm{val}_b(X_1,Y_1)}=h\cdot \bigl (g^{\mathrm{val}_b(X_2,Y_2)}\bigr )^{-1}\) for \(X_1,X_2\in [b-1]^{t/2}\) and \(Y_1,Y_2\in \binom{[m]}{t/2}\), then \(\log _gh\equiv \bigl (\mathrm{val}_b(X_1,Y_1)+\mathrm{val}_b(X_2,Y_2)\bigr )\bmod {q}\).

Note that \(h\cdot \bigl (g^{\mathrm{val}_b(X_2,Y_2)}\bigr )^{-1}=h\cdot \bigl (g^{-1}\bigr )^{\mathrm{val}_b(X_2,Y_2)}\). The algorithms we present in the next two sections use the right-hand side of this expression instead of the left-hand side, as doing so allows us to invert g once and for all, rather than inverting \(g^{\mathrm{val}_b(X_2,Y_2)}\) once for each new choice of \((X_2,Y_2)\).

Lemma 2

Fix a radix \(b>1\), let m be an arbitrary positive integer, and let t be an even integer in [m]. If there is an \(x\equiv \log _gh\bmod {q}\) with radix-b weight t and radix-b length at most m, then there exist two disjoint subsets \(Y_1,Y_2\in \binom{[m]}{t/2}\) and corresponding \(X_1,X_2\in [b-1]^{t/2}\) such that \(g^{\mathrm{val}_b(X_1,Y_1)}=h\cdot \bigl (g^{\mathrm{val}_b(X_2,Y_2)}\bigr )^{-1}\).

Lemmas 1 and 2 assume that t is even so that \(t/2\) is an integer. We make this simplifying assumption purely for notational and expositional convenience; indeed, both lemmas still hold if, for example, we let \(Y_1\in \binom{[m]}{\lceil t/2\rceil }\) and \(Y_2\in \binom{[m]}{\lfloor t/2\rfloor }\). The algorithms that follow in Sects. 3 and 4 make the same simplifying assumption (in fact, the algorithms in Sect. 3.3 also assume that m is even); however, we stress that each algorithm is likewise trivial to adapt for t and m with arbitrary parities (as discussed by Stinson [29, Sect. 5]).

3 Computing DLs with Low Hamming Weight

In this section, we describe and improve upon two variants of the celebrated “baby-step giant-step” algorithm [28] for computing DLs. These algorithm variants have been specially adapted for cases in which the exponent is known to have low Hamming weight. The most basic form of each algorithm is described and analyzed in a paper by Stinson [29], who credits the first to Heiman [8] and Odlyzko [24] and the second to Coppersmith (by way of unpublished correspondence with Vanstone [4]). In both cases, our improvements yield modest-yet-notable performance improvements—both concretely and asymptotically—over the more basic forms of the algorithms; indeed, our improvements to the second algorithm yield a worst-case computation complexity superior to that of any known algorithm for the low-Hamming-weight DL problem. In Sect. 4, we propose and analyze a simple transformation that generalizes each low-Hamming-weight DL algorithm in this paper to a corresponding low-radix-b-weight DL algorithm, where the radix \(b>1\) can be arbitrary.

Algorithm 3.1 (pseudocode figure)

3.1 The Basic Algorithm

Algorithm 3.1 gives pseudocode for the most basic form of the algorithm, which is due to Heiman and Odlyzko [8, 24].
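Since the pseudocode figure does not reproduce well here, the following is a minimal Python sketch (ours, not the original pseudocode) of the same baby-step giant-step search, instantiated in the multiplicative group \(\mathbb {Z}_p^*\) for a prime p; the algorithm itself is generic, and the line numbers referenced in the surrounding text refer to the pseudocode figure, not to this sketch.

```python
from itertools import combinations

def basic_low_weight_dlog(g, h, q, m, t, p):
    """Sketch of Algorithm 3.1 (Heiman-Odlyzko): find x with Hamming weight t
    and bit length <= m such that g^x = h in Z_p^*, or return None."""
    val = lambda Y: sum(1 << (y - 1) for y in Y)
    g_inv = pow(g, -1, p)                  # invert g once and for all (cf. Lemma 1)
    H = {}                                 # giant steps: g^{val(Y1)} -> Y1
    for Y1 in combinations(range(1, m + 1), t // 2):
        H[pow(g, val(Y1), p)] = Y1
    for Y2 in combinations(range(1, m + 1), t // 2):   # baby steps
        y2 = h * pow(g_inv, val(Y2), p) % p            # h * (g^{-1})^{val(Y2)}
        if y2 in H:
            return (val(H[y2]) + val(Y2)) % q          # correct by Lemma 1
    return None

# Tiny demo: p = 101 is prime and g = 2 generates the full group of order q = 100.
x = 0b1000100                              # bit length 7, Hamming weight 2
assert basic_low_weight_dlog(2, pow(2, x, 101), 100, 7, 2, 101) == x
```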

Theorem 3

Algorithm 3.1 is correct: If there is an m-bit integer x with Hamming weight t such that \(g^{x\!}=h\), then the algorithm returns a DL of h to the base g.

Proof

(sketch). This follows directly from Lemmas 1 and 2. Specifically, Lemma 1 ensures that any value returned on Line 12 of Algorithm 3.1 satisfies \(g^{x\!}=h\), while Lemma 2 ensures that the baby-step loop (Lines 8–14) will indeed find the requisite pair \((Y_1,Y_2)\) if such a pair exists.   \(\square \)

Remark

When the order q is unknown, one can set m to be any upper bound on \(\lceil \lg {q}\rceil \), and then omit the modular reduction on Line 12 of Algorithm 3.1. Indeed, one may even set \(m>\lceil \lg {q}\rceil \) when q is known if, for example, the canonical representation of the desired DL has large Hamming weight but is known to be congruent (modulo q) to an m-bit integer with low Hamming weight.

The next theorem follows easily by inspection of Algorithm 3.1.

Theorem 4

The storage cost and (both average- and worst-case) computation cost of Algorithm 3.1, counted respectively in group elements and group exponentiations, each scale as \(O\bigl (\binom{m}{t/2}\bigr )\).

Remark

Each exponentiation counted in Algorithm 3.1 is to a power with Hamming weight \(t/2\). By pre-computing \(g^{\mathrm{val}(\{i\})}\) for \(i\in [m]\), one can evaluate these exponentiations using just \(t/2-1\) group operations apiece. The (both average- and worst-case) computation complexity becomes \(O\bigl (t\binom{m}{t/2}\bigr )\) group operations. Going a step further, one can pre-compute \(g^{\mathrm{val}(\{i\})-\mathrm{val}(\{j\})}\) for each \(i\ne j\), and then iterate through \(\binom{[m]}{t/2}\) following a “minimal-change ordering” [19, Sect. 2.3.3] wherein each successive pair of subsets differ by exactly two elements [30]. Then all but the first iteration of the baby-step (respectively, giant-step) loop uses a single group operation to “update” the \(y_1\) (respectively, \(y_2\)) from the previous iteration. The worst-case computation cost becomes \(O\bigl (\binom{m}{t/2}\bigr )\) group operations (plus one inversion and \(m^2\) group operations for pre-computation).
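To make the minimal-change idea concrete, here is one way (our sketch, under the same \(\mathbb {Z}_p^*\) assumption as above) to enumerate \(\binom{[m]}{t/2}\) in the classic revolving-door ordering and to walk the giant steps with a single group operation per update; the table D plays the role of the pre-computed \(g^{\mathrm{val}(\{i\})-\mathrm{val}(\{j\})}\).

```python
def revolving_door(n, k):
    """Yield the k-subsets of {1,...,n} so that consecutive subsets differ
    by exactly one element exchange (a minimal-change ordering)."""
    if k == 0:
        yield frozenset()
    elif k == n:
        yield frozenset(range(1, n + 1))
    else:
        yield from revolving_door(n - 1, k)
        for Y in reversed(list(revolving_door(n - 1, k - 1))):
            yield Y | {n}

p, q, g, m = 101, 100, 2, 7                 # toy group: g = 2 has order q = p - 1
val = lambda Y: sum(1 << (y - 1) for y in Y)
D = {(i, j): pow(g, ((1 << (i - 1)) - (1 << (j - 1))) % q, p)
     for i in range(1, m + 1) for j in range(1, m + 1) if i != j}

subsets = list(revolving_door(m, 2))
y = pow(g, val(subsets[0]), p)              # one full exponentiation to start...
for prev, cur in zip(subsets, subsets[1:]):
    (j,), (i,) = prev - cur, cur - prev     # element swapped out / swapped in
    y = y * D[(i, j)] % p                   # ...then one group operation each
    assert y == pow(g, val(cur), p)
```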

3.2 Improved Complexity via Interleaving

Next, we propose and analyze an alternative way to implement the basic algorithm (i.e., Algorithm 3.1), which interleaves the baby-step and giant-step calculations in a manner reminiscent of Pollard’s interleaved variant of the classic baby-step giant-step algorithm [27, Sect. 3]. Although such interleaving is a well-known technique for achieving constant-factor average-case speedups in baby-step giant-step algorithms, it had not previously been applied in the context of low-Hamming-weight DLs. Our analysis reveals that interleaving can, in fact, yield a surprisingly large (super-constant) speedup in this context.

The interleaved variant comprises a single loop and two lookup tables, \(H_1\) and \(H_2\). The loop iterates simultaneously over the subsets \(Y_1\in \binom{[m]}{t/2}\) and \(Y_2\in \binom{[m]}{t/2}\) in respectively increasing and decreasing order. (To keep the following analysis simple, we assume the order is lexicographic; however, we note that one can obtain a factor t speedup by utilizing some pre-computation and a minimal-change ordering, exactly as we suggested in the above remarks following the non-interleaved algorithm.) In each iteration, the algorithm computes both \(y_1=g^{\mathrm{val}(Y_1)}\) and \(y_2=h\cdot \bigl (g^{-1}\bigr )^{\mathrm{val}(Y_2)}\), storing \((y_1,Y_1)\) in \(H_1\) and \((y_2,Y_2)\) in \(H_2\), and also checking if \(y_1\) collides with a key in \(H_2\) or \(y_2\) with a key in \(H_1\). Upon discovering a collision, it computes and outputs \(x\equiv \log _gh\bmod {q}\) using Lemma 1 (cf. Line 12 of Algorithm 3.1) and then halts. A pseudocode description of our interleaved algorithm is included in our extended technical report [11, Sect. B.1].
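The following is a minimal sketch (ours) of the interleaved variant, again instantiated in \(\mathbb {Z}_p^*\) and using plain lexicographic orders; the full pseudocode is in the technical report [11, Sect. B.1].

```python
from itertools import combinations

def interleaved_low_weight_dlog(g, h, q, m, t, p):
    """Interleaved baby-step giant-step sketch: one loop, two tables,
    halting at the first collision in either direction."""
    val = lambda Y: sum(1 << (y - 1) for y in Y)
    g_inv = pow(g, -1, p)
    subsets = list(combinations(range(1, m + 1), t // 2))
    H1, H2 = {}, {}                       # giant-step and baby-step tables
    for Y1, Y2 in zip(subsets, reversed(subsets)):
        y1 = pow(g, val(Y1), p)
        y2 = h * pow(g_inv, val(Y2), p) % p
        H1[y1], H2[y2] = Y1, Y2
        if y1 in H2:
            return (val(Y1) + val(H2[y1])) % q
        if y2 in H1:
            return (val(H1[y2]) + val(Y2)) % q
    return None

# Demo: one bits near the middle of the exponent make the collision arrive early.
x = 0b0010100                              # m = 7, t = 2
assert interleaved_low_weight_dlog(2, pow(2, x, 101), 100, 7, 2, 101) == x
```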

Despite its simplicity, this modification appears to be novel and has a surprisingly large impact on the average-case complexity. Indeed, if we assume that the interleaved loop iterates through \(\binom{[m]}{t/2}\) in increasing and decreasing lexicographic order (for the giant-step and baby-step calculations, respectively), then the worst possible costs arise when the t one bits in the binary representation of x occur consecutively in either the t highest-order or the t lowest-order bit positions (i.e., when \(x=1^t0^{m-t}\) or \(x=0^{m-t}1^t\)). In this case, the algorithm produces a collision and halts only after a large fraction of the (at most \(\binom{m}{t/2}\)) iterations of the loop. For small t, interleaving thus gives a worst-case constant factor speedup compared to the non-interleaved algorithm; for larger t, the worst-case speedup is asymptotic (alas, we are unable to derive a precise characterization of the speedup in terms of m and t). The average-case speedup can be much more dramatic, depending on the distribution of the targeted \(x\equiv \log _gh\bmod {q}\). For a uniform distribution (among the set of all m-bit exponents with Hamming weight t) on x, we heuristically expect the one bits in x to be distributed evenly throughout its binary representation; that is, we expect to find the \((t/2)\)th and \((t/2+1)\)th one bits in x in or around bit positions \(m/2\) and \(m/2+m/t\), respectively. Therefore, we expect the interleaved algorithm to produce a collision and halt after at most around \(\binom{m/2}{t/2}\) loop iterations. (Contrast this with the original \(\varTheta \bigl (\binom{m}{t/2}\bigr )\) average-case complexity of the non-interleaved algorithm.) We summarize our above analysis in Theorem 5.

Theorem 5

The worst-case storage and computation costs of the interleaved algorithm described above, counted respectively in group elements and group operations, each scale as \(O\bigl (\binom{m}{t/2}\bigr )\). If x is uniform among m-bit exponents with Hamming weight t, then the average-case storage and computation complexities scale as \(O\bigl (\binom{m/2}{t/2}\bigr )\).

3.3 The Coppersmith Algorithms

Algorithm 3.1 and our interleaved variant are “direct” algorithmic instantiations of Lemmas 1 and 2 with a fixed radix \(b=2\). Such direct instantiations perform poorly in the worst case because Lemma 2 guarantees only existence—but not uniqueness—of the subsets \(Y_1\) and \(Y_2\) and, as a result, the collections of subsets over which these direct instantiations ultimately iterate are only guaranteed to be sufficient—but not necessary—to compute the desired logarithm. Indeed, given \(Y\in \binom{[m]}{t}\) such that \(\log _gh\equiv \mathrm{val}(Y)\bmod {q}\), there exist \(\binom{t}{t/2}\) distinct ways to partition Y into \(Y_1\in \binom{Y}{t/2}\) and \(Y_2=Y{\setminus }{Y_1}\) to satisfy the congruence \(\log _gh\equiv \bigl (\mathrm{val}(Y_1)+\mathrm{val}(Y_2)\bigr )\bmod {q}\) arising in Lemma 2. Stirling's approximation implies that \(\binom{t}{t/2}\) approaches \(2^t\sqrt{2/(\pi t)}\) as t grows large, so that the number of “redundant” values these basic algorithms may end up computing (and storing) grows exponentially with t. We now describe a more efficient variant of this algorithm, originally proposed by Coppersmith [4], that improves on the complexity of the basic algorithms by taking special care to iterate over significantly fewer redundant subsets. (Actually, Coppersmith proposed two related algorithms—one deterministic and the other randomized; however, due to space constraints, we discuss only the deterministic algorithm in this section, relegating our discussion of the randomized algorithm to our extended technical report [11, Sect. D].)

Coppersmith’s Deterministic Algorithm. The first variant of Algorithm 3.1 proposed by Coppersmith is based on the following observation.

Observation 6

(Coppersmith and Seroussi [5]). Let t and m be even positive integers with \(t\le m\) and, for each \(i\in [m/2]\), define \(B_i=\{i,i+1,\ldots ,i+\frac{m}{2}-1\}\) and \(\bar{B}_i=[m]{\setminus } B_i\). For any \(Y\in \binom{[m]}{t}\), there exists some \(i\in [m/2]\) and (disjoint) subsets \(Y_1\in \binom{B_i}{t/2}\) and \(Y_2\in \binom{\bar{B}_i}{t/2}\) such that \(Y=Y_1\cup Y_2\).

A proof of Observation 6 is included in our extended technical report [11, Sect. A.4]. The following analog of Lemma 2 is an immediate corollary to Observation 6.

Corollary 7

Let t and m be even positive integers with \(t\le m\) and, for each \(i\in [m/2]\), define \(B_i=\{i,i+1,\ldots ,i+\frac{m}{2}-1\}\) and \(\bar{B}_i=[m]{\setminus } B_i\). If there is an \(x\equiv \log _gh\bmod {q}\) with Hamming weight t and bit length at most m, then there exists some \(i\in [m/2]\) and (disjoint) subsets \(Y_1\in \binom{B_i}{t/2}\) and \(Y_2\in \binom{\bar{B}_i}{t/2}\) such that \(g^{\mathrm{val}(Y_1)}=h\cdot g^{-\mathrm{val}(Y_2)}\).

Using Corollary 7 to improve on the worst-case complexity of the basic algorithm is straightforward. The giant-step and baby-step loops (i.e., Lines 3–6 and 8–14) from Algorithm 3.1 are respectively modified to iterate over only the subsets \(Y_1\in \binom{B_i}{t/2}\) and \(Y_2\in \binom{\bar{B}_i}{t/2}\) for each \(i\in [m/2]\) in turn. In particular, the algorithm populates a lookup table H in the giant-step loop using only the \(Y_1\in \binom{B_i}{t/2}\), and then it searches for a collision within H in the baby-step loop using only the \(Y_2\in \binom{\bar{B}_i}{t/2}\); if the baby-step loop for \(i=1\) generates no collisions, then the algorithm clears the lookup table and repeats the process for \(i=2\), and so on up to \(i=m/2\). Observation 6 guarantees that the algorithm finds a collision and halts at some point prior to completing the baby-step loop for \(i=m/2\), provided a DL with the specified Hamming weight and bit length exists. Pseudocode for the above-described algorithm is included in our extended technical report [11, Sect. B.2].
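For concreteness, the following is a minimal sketch (ours) of the outer loop just described, again in \(\mathbb {Z}_p^*\) for a prime p; the blocks \(B_i\) are the Coppersmith–Seroussi blocks of Observation 6.

```python
from itertools import combinations

def coppersmith_low_weight_dlog(g, h, q, m, t, p):
    """Coppersmith's deterministic algorithm (sketch): for each block B_i of
    m/2 consecutive positions, giant-step over B_i and baby-step over its
    complement; by Observation 6, some i splits the one positions evenly."""
    val = lambda Y: sum(1 << (y - 1) for y in Y)
    g_inv = pow(g, -1, p)
    for i in range(1, m // 2 + 1):
        B = list(range(i, i + m // 2))                      # block B_i
        B_bar = [y for y in range(1, m + 1) if y not in B]  # complement
        H = {pow(g, val(Y1), p): Y1 for Y1 in combinations(B, t // 2)}
        for Y2 in combinations(B_bar, t // 2):
            y2 = h * pow(g_inv, val(Y2), p) % p
            if y2 in H:
                return (val(H[y2]) + val(Y2)) % q
    return None

assert coppersmith_low_weight_dlog(2, pow(2, 0b1000100, 101), 100, 8, 2, 101) == 0b1000100
```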

The next theorem follows easily from Corollary 7 and by inspection.

Theorem 8

Coppersmith’s deterministic algorithm is correct; moreover, its storage cost scales as \(O\bigl (\binom{m/2}{t/2}\bigr )\) group elements and its (worst-case) computation cost as \(O\bigl (m\binom{m/2}{t/2}\bigr )\) group exponentiations.

Remark

The average-case complexity requires a delicate analysis, owing to the fact that there may be several indices i for which \(|Y\cap B_i|=t/2\), and the algorithm will always halt upon encountering the first such index. Interested readers can find a detailed analysis of the average-case complexity in Stinson’s paper [29, Sect. 3]. Stinson’s paper also proposes a generalization of Coppersmith’s deterministic algorithm utilizing a family of combinatorial set systems called splitting systems [29, Sect. 2.1] (of which the Coppersmith–Seroussi set system defined in Observation 6 and Corollary 7 is an example). A discussion of splitting systems and Stinson’s improvements to the above algorithm is included in our extended technical report [11, Sect. C].

3.4 Improved Complexity via Pascal’s Lemma

A methodical analysis of the Coppersmith–Seroussi set system suggests an optimization to Coppersmith’s deterministic algorithm that yields an asymptotically lower computation complexity than that indicated by Theorem 8. Indeed, the resulting optimized algorithm has a worst-case computation complexity of just \(O\bigl (t\binom{m/2}{t/2}\bigr )\) group operations, which is asymptotically lower than that of any low-Hamming-weight DL algorithm in the literature. Moreover, the hidden constant in the optimized algorithm seems to be about as low as one could realistically hope for. Our improvements follow from Observation 9, an immediate consequence of Pascal’s Lemma for binomial coefficients, which states that \(\binom{n}{k}=\binom{n-1}{k}+\binom{n-1}{k-1}\).

Observation 9

Let \(\{(B_i,\bar{B}_i)\}_{i\in [m/2]}\) be the Coppersmith–Seroussi set system, as defined in Observation 6 and Corollary 7. For each \(i\in [m/2]{\setminus }\{1\}\), we have that \(\bigl |\binom{B_i}{t/2}{\setminus }\binom{B_{i-1}}{t/2}\bigr |=\binom{m/2-1}{t/2-1}\).

A simple corollary to Observation 9 is that the baby-step and giant-step loops for \(i\ge 2\) in a naïve implementation of Coppersmith’s deterministic algorithm each recompute \(\binom{m/2-1}{t/2}\) values that were also computed in the immediately preceding invocation, or, equivalently, that these loops each produce just \(\binom{m/2-1}{t/2-1}\) new values. Carefully avoiding these redundant computations can therefore reduce the per-iteration computation cost of all but the first iteration of the outer loop to \(O\bigl (\binom{m/2-1}{t/2-1}\bigr )\) group operations. The first (i.e., \(i=1\)) iteration of the outer loop must, of course, still produce \(2\binom{m/2}{t/2}\) values; thus, in the worst case, the algorithm must produce \(2\binom{m/2}{t/2}+(m-2)\binom{m/2-1}{t/2-1}\) distinct group elements. Note that in order to avoid all redundant computations in subsequent iterations, it is necessary to provide both the giant-step and baby-step loops with access to the \((y_1,Y_1)\) and \((y_2,Y_2)\) pairs, respectively, that arose in the immediately preceding invocation. Coppersmith’s deterministic algorithm already stores each \((y_1,Y_1)\) pair arising in the giant-step loop, but it does not store the \((y_2,Y_2)\) pairs arising in the baby-step loop; hence, fully exploiting Observation 9 doubles the storage cost of the algorithm (in a similar vein to interleaving the loops). The upshot of this increased storage cost is a notable asymptotic improvement to the worst-case computation cost, which we characterize in Lemma 10 and Corollary 11. A proof of Lemma 10 is located in Appendix A.1.

Lemma 10

Let \(\{(B_i,\bar{B}_i)\}_{i\in [m/2]}\) be the Coppersmith–Seroussi set system, as defined in Observation 6 and Corollary 7. We have

\(\Bigl |\bigcup _{i=1}^{m/2}\binom{B_i}{t/2}\Bigr |=\binom{m/2}{t/2}+\bigl (\tfrac{m}{2}-1\bigr )\binom{m/2-1}{t/2-1}.\)

To realize the speedup promised by Lemma 10, the optimized algorithm must do some additional bookkeeping; specifically, in each iteration \(i\ge 2\), it must have an efficient way to determine which of the \(Y_1\in \binom{B_i}{t/2}\) and \(Y_2\in \binom{\bar{B}_i}{t/2}\)—as well as the associated \(y_1=g^{\mathrm{val}(Y_1)}\) and \(y_2=h\cdot g^{-\mathrm{val}(Y_2)}\)—arose in the \((i-1)\)th iteration, and which of them will arise for the first time in the ith iteration. To this end, the algorithm keeps two sequences of hash tables, say \(H_1,\ldots ,H_{m}\) and \(I_1,\ldots ,I_m\), one for the giant-step pairs and another for the baby-step pairs. Into which hash table a given \((Y_1,y_1)\) pair gets stored is determined by the smallest integer in \(Y_1\): a \((Y_1,y_1)\) pair that arose in the \((i-1)\)th iteration of the outer loop will also arise in the ith iteration if and only if the smallest element in \(Y_1\) is not \(i-1\); thus, all values from the \((i-1)\)th iteration not in the hash table \(H_{i-1}\) can be reused in the next iteration. Moreover, each \((Y_1,y_1)\) pair that will arise for the first time in the ith iteration has a corresponding \((Y_1',y_1')\) pair that is guaranteed to reside in \(H_{i-1}\) at the end of the \((i-1)\)th iteration. Indeed, one can efficiently “update” each such \((Y_1',y_1')\) in \(H_{i-1}\) to a required \((Y_1,y_1)\) pair by setting \(Y_1=\bigl (Y_1'{\setminus }\{i-1\}\bigr )\cup \{i-1+\tfrac{m}{2}\}\) and \(y_1=y_1'\cdot g^{\mathrm{val}(\{i-1+m/2\})-\mathrm{val}(\{i-1\})}\). Note that because \(Y_1\) no longer contains \(i-1\), the hash table in which the updated \((Y_1,y_1)\) pair should be stored changes from \(H_{i-1}\) to \(H_j\) for some \(j\ge i\). An analogous method is used for keeping track of and “updating” the \((Y_2,y_2)\) pairs arising in the baby-step loop. Pseudocode for the above-described algorithm is included as Algorithm B.1 in Appendix B. The following corollary is an immediate consequence of Lemma 10.

Corollary 11

Algorithm B.1 is correct; moreover, its storage cost scales as \(O\bigl (t\binom{m/2}{t/2}\bigr )\) group elements and its worst-case computation cost as \(O\bigl (t\binom{m/2}{t/2}\bigr )\) group exponentiations.

Note that the worst-case complexity obtained in Corollary 11 improves on a naïve implementation of Coppersmith’s algorithm by a factor \(\tfrac{m}{t}\) (and it improves on the previously best-known bound, due to Stinson [29, Theorem 4.1], by a factor \(\sqrt{t}\lg {m}\)). As with the basic algorithm, one can leverage pre-computation and a minimal-change ordering to replace all but two of the exponentiations counted by Corollary 11 with a single group operation each; hence, the worst-case computation complexity is in fact just \(O\bigl (t\binom{m/2}{t/2}\bigr )\) group operations.
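Before moving on, we note that the “update” rule from the bookkeeping discussion above is easy to state in code. The following sketch (ours) assumes, for simplicity, that g generates all of \(\mathbb {Z}_p^*\) so that exponents can be reduced modulo \(p-1\); the helper name update_pair is hypothetical.

```python
def update_pair(Y1p, y1p, i, m, g, p):
    """Rewrite a pair (Y1', y1') that resides in H_{i-1} (i.e., min(Y1') = i-1)
    into the new pair for iteration i, swapping element i-1 for i-1+m/2."""
    assert min(Y1p) == i - 1
    Y1 = (Y1p - {i - 1}) | {i - 1 + m // 2}
    delta = ((1 << (i - 2 + m // 2)) - (1 << (i - 2))) % (p - 1)
    y1 = y1p * pow(g, delta, p) % p          # one exponentiation, or a single
    return Y1, y1                            # group op after pre-computation

# Demo: p = 101, g = 2, m = 8, i = 3; {2,5} becomes {5,6} and g^18 becomes g^48.
Y1, y1 = update_pair(frozenset({2, 5}), pow(2, 18, 101), 3, 8, 2, 101)
assert Y1 == {5, 6} and y1 == pow(2, 48, 101)
```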

4 From Low Hamming Weight to Low Radix-b Weight

In this section, we introduce and analyze a simple transformation that allows us to generalize each of the low-Hamming-weight DL algorithms from the preceding section to a low-radix-b-weight DL algorithm, where the radix \(b>1\) can be arbitrary. The transformation is deceptively simple: essentially, it entails modifying the low-Hamming-weight algorithm to iterate over all possible inputs to a \(\mathrm{val}_b\) function, rather than over all possible inputs to an “unqualified” \(\mathrm{val}\) function (or, equivalently, to a \(\mathrm{val}_2\) function). Algorithm 4.1 provides pseudocode for the simplest possible form of our radix-b algorithm; that is, for the transformation applied to Algorithm 3.1. We illustrate the transformation as it applies to this most basic form of the low-Hamming-weight DL algorithm purely for ease of exposition; indeed, we do not recommend implementing this particular variant in practice—rather, we recommend applying the transformation to Algorithm B.1 or to the randomized algorithm (see our extended technical report [11, Sect. D]) as outlined below.

Algorithm 4.1 (pseudocode figure)
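As with Algorithm 3.1, the pseudocode figure does not reproduce well here; the following is a minimal Python sketch (ours) of the transformation applied to the basic algorithm, i.e., of Algorithm 4.1, once more instantiated in \(\mathbb {Z}_p^*\).

```python
from itertools import combinations, product

def basic_radix_b_dlog(g, h, q, m, t, b, p):
    """Sketch of Algorithm 4.1: the basic search with inner loops over all
    digit tuples X in [b-1]^(t/2), per the radix-b val function."""
    val_b = lambda X, Y: sum(x * b ** (y - 1) for x, y in zip(X, sorted(Y)))
    g_inv = pow(g, -1, p)
    H = {}
    for Y1 in combinations(range(1, m + 1), t // 2):       # giant steps
        for X1 in product(range(1, b), repeat=t // 2):     # inner loop
            H[pow(g, val_b(X1, Y1), p)] = (X1, Y1)
    for Y2 in combinations(range(1, m + 1), t // 2):       # baby steps
        for X2 in product(range(1, b), repeat=t // 2):     # inner loop
            y2 = h * pow(g_inv, val_b(X2, Y2), p) % p
            if y2 in H:
                X1, Y1 = H[y2]
                return (val_b(X1, Y1) + val_b(X2, Y2)) % q
    return None

# Demo: x = 2*3^0 + 1*3^1 + 2*3^2 + 1*3^3 = 50 has radix-3 length 4 and weight 4.
assert basic_radix_b_dlog(2, pow(2, 50, 101), 100, 4, 4, 3, 101) == 50
```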

Theorem 12

Algorithm 4.1 is correct: If there exists an integer x with radix-b length m and radix-b weight t such that \(g^x=h\), then the algorithm returns a DL of h to the base g.

Remark

When the radix is \(b=2\), the inner giant-step and baby-step loops (i.e., Lines 4–7 and 11–18) execute only once and Algorithm 4.1 reduces to Algorithm 3.1, an observation which bears out in the following theorem. If the radix is \(b>2\) yet all digits are bounded above by some \(c<b\), then the inner loops need only iterate over the tuples in \([c]^{t/2}\), thus reducing the cost by a factor \(\bigl (\tfrac{b-1}{c}\bigr )^{t/2}\).

Theorem 13

The storage cost and (both average- and worst-case) computation cost of the above algorithm, counted respectively in group elements and group exponentiations, each scale as \(O\bigl ((b-1)^{t/2}\binom{m}{t/2}\bigr )\).

Remark 14

As with the low-Hamming-weight algorithms, it is possible to reduce each of the exponentiations counted by Theorem 13 to a single group operation, in this case by using a minimal-change ordering for the outer loop and a Gray code [19, Sect. 2.2.2] for the inner loop.

More efficient radix-b variants. Every one of the algorithm variants we described in Sect. 3 generalizes similarly to an algorithm for radix b, by simply including an inner loop over each \(X\in [b-1]^{t/2}\) within the giant-step and baby-step loops. In each case, the expressions for storage and worst-case computation complexity pick up an additional factor \((b-1)^{t/2}\); however, the reader should bear in mind that this newfound exponential factor is at least partially offset by a corresponding decrease in the radix-b length and (presumably) weight that appear in the binomial term. In particular, an exponent \(x\equiv \log _gh\bmod {q}\) with bit length \(m_2\) has a radix-b length of only about \(m_2/\lg {b}\). Specifically, applying the transformation to Algorithm B.1 yields a radix-b algorithm with worst-case running time of \(O\bigl (t\,(b-1)^{t/2}\binom{m/2}{t/2}\bigr )\) group operations, where m and t respectively denote the radix-b length and radix-b weight of the DL sought.

In Theorem 15, we (partially) characterize one condition under which it is beneficial to switch from a baby-step giant-step algorithm for radix b to the corresponding baby-step giant-step algorithm for some larger radix. In this theorem, the radix-b density of x refers to the ratio of its radix-b weight to its radix-b length. For example, if m and t respectively denote the radix-b length of x and the radix-b weight of x, then its radix-b density is \(d=t/m\).

Theorem 15

Fix a radix \(b>1\) and an exponent x with radix-b density d. There exists a constant \(k_0\) (with \(k_0>1\)) such that, for all \(k>k_0\), if the radix-\(b^k\) density of x is less than or equal to d, then a radix-\(b^k\) algorithm has lower cost than the corresponding radix-b algorithm.

Theorem 15 implies that, for a fixed algorithm variant, switching to a higher radix is beneficial (essentially) whenever the change to the radix does not increase the density of the DL being computed. We emphasize that the exponent k in the theorem need not be an integer; thus, the theorem addresses cases like that of switching from a radix-2 representation to a radix-3 representation (\(k=\lg 3\)) or from a radix-4 representation to a radix-8 representation (\(k=3/2\)). For example, the (decimal) number 20871 has radix-4 representation 11012013 and density 0.75, whereas it has radix-8 representation 50607 and density 0.6.
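One can check the claimed densities directly; a minimal helper (ours):

```python
def density(x, b):
    """Radix-b density of x: radix-b weight divided by radix-b length."""
    digits = []
    while x:
        x, d = divmod(x, b)
        digits.append(d)
    return sum(d != 0 for d in digits) / len(digits)

assert density(20871, 4) == 0.75 and density(20871, 8) == 0.6
```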

A proof sketch for Theorem 15 is included in Appendix A.2. We only sketch the main idea behind the proof, and only for the “basic” radix-b algorithm (i.e., for Algorithm 4.1). The reason for this is twofold: first, the details of the relevant calculations are both unenlightening and rather messy (owing to the rounding in the radix-\(b^k\) length of x, which can make our relevant inequalities go the wrong way for small values of k); and, second, nearly identical calculations illustrate why the theorem holds for the more efficient algorithm variants.

5 Related Work

The problem of solving “constrained” variants of the DL problem has received considerable attention in the cryptographic literature, with the most well-known and widely studied such variant being that of computing DLs that are “small” [32] or known to reside in a “short” interval [22, 31, 33]. Existing algorithms can compute such DLs using an expected \(O(\sqrt{\ell })\) group operations when the exponent is known to reside in an interval of width \(\ell \).

In addition to the basic Heiman–Odlyzko [8] and Coppersmith–Stinson [29] low-Hamming-weight algorithms discussed earlier, a handful of papers have considered the problem of computing DLs that have low Hamming weight [20, 23] or that are expressible as a product whose multiplicands each have low Hamming weight [3, 16, 17]. Efficient algorithms for these computations have applications in attacking encryption schemes that leverage such low-weight [1, 18, 21] and product-of-low-weight-multiplicand [7, 9] exponents as a means to reduce the cost of public-key operations. They have also been adapted to attack so-called secure human identification protocols [10], which leverage low-Hamming-weight secrets to improve memorability for unassisted humans.

Specifically, Cheon and Kim [3] proposed a baby-step giant-step algorithm to compute DLs expressible as a product of three low-Hamming-weight multiplicands in groups with known order. The use of such “product-of-three” exponents was proposed by Hoffstein and Silverman [9] to allow for low-cost exponentiations in groups that permit fast squaring via an endomorphism (which includes the Galois fields \({\mathbb {F}}_{2^n}\) and the so-called “Koblitz” elliptic curves) while seeking to resist meet-in-the-middle attacks. Subsequently, Kim and Cheon [16, 17] improved on those results using a “parametrized” variant of splitting systems, while Coron, Lefranc, and Poupard [6] proposed a related algorithm that works in groups with unknown composite order (e.g., in the multiplicative group of units modulo \(n=pq\), where p and q are large primes and the factorization of n into p and q is not provided to the algorithm).

Meanwhile, Muir and Stinson [23] studied generalizations of Coppersmith’s deterministic algorithm to compute DLs known to have a non-adjacent form (NAF) representation with low weight. (In the latter context, “low weight” means a small number of \(\pm 1\) digits in the NAF representation.) More recently, May and Ozerov [20] revisited the low-Hamming-weight DL problem in groups of composite order (where a factorization of the order is known), proposing an algorithm that combines aspects of the Silver–Pohlig–Hellman [26] algorithm with any of the basic low-Hamming-weight algorithms to obtain lower complexity than either approach in isolation.

The algorithms we have presented in this work (i) offer improved complexity relative to existing low-Hamming-weight algorithms, and (ii) generalize to the low-radix-b-weight case for arbitrary \(b\ge 2\). This is a (mathematically) natural generalization of the low-Hamming-weight DL problem that has not been explicitly considered in prior work. We suspect that our modifications will “play nice” with some or all of the above-mentioned low-weight DL algorithm variants, and we slate a formal investigation of this interplay for future work.

6 Cryptanalytic Applications

We now turn our attention to the cryptanalytic applications of our new algorithms. Specifically, we demonstrate how to use a low-radix-b-weight DL algorithm to attack any one of several verifier-based password-authenticated key exchange (VPAKE) protocols from the cryptographic literature. Briefly, a password-authenticated key exchange (PAKE) protocol is an interactive protocol enabling a client to simultaneously authenticate itself to, and establish a shared cryptographic key with, a remote server by demonstrating knowledge of a password. The security definitions for PAKE require that the interaction between the client and server reveals at most a negligible quantity of information about the client’s password (and the shared key): a man-in-the-middle who observes (and possibly interferes with) any polynomial number of PAKE sessions between a given client and server should gain at most a negligible advantage in either hijacking an authenticating session or impersonating the client (e.g., by guessing her password). VPAKE protocols extend PAKE with additional protections against the server, ensuring that an attacker who compromises the server cannot leverage its privileged position to infer the client’s password using less work than would be required to launch a brute-force attack against the password database (even after engaging in any polynomial number of PAKE sessions with the client).

In recent work [12], Kiefer and Manulis proposed a VPAKE protocol with the novel property of allowing the client to register its password without ever revealing that password to the server. Their idea, at a high level, is to have the client compute a “fingerprint” of the password and then prove in zero-knowledge that the fingerprint was computed correctly; subsequent authentications involve a proof of knowledge of the password encoded in a given fingerprint. To make the zero-knowledge proofs practical, the password fingerprints are computed using a structure-preserving map. Benhamouda and Pointcheval [2, Sect. 1.2] note that the Kiefer–Manulis VPAKE construction, as originally presented, falls easily to a short-interval variant of Pollard’s Kangaroo algorithm [22]. In response to this observation, Kiefer and Manulis released an updated version of their paper (as a technical report [13]) that attempts to thwart the sort of short-interval attacks pointed out by Benhamouda and Pointcheval. A handful of subsequent papers [14, 15, 34] have built on their construction, sharing the same basic framework (and, hence, the same susceptibility to the attack described below).

The Kiefer–Manulis Protocol

Before presenting our attack, we briefly summarize the relevant portions of Kiefer and Manulis’ VPAKE construction. Passwords in their construction consist of any number of printable ASCII characters (of which there are 94 distinct possibilities, each assigned a numeric label in \(\{0,1,\ldots ,93\}\)) up to some maximum length, which we will denote by m; thus, there is a natural mapping between valid passwords and the set of radix-94 integers with length at most m. This yields \(\sum _{i=1}^{m}94^{i}\) possible passwords (although the authors incorrectly give the number as just \(94^m\)).

The client maps her password pw to \({\mathbb {Z}}\) via a structure-preserving map of the form

\(\mathrm{pw}\mapsto \sum _{i=1}^{|\mathrm{pw}|}\pi (c_i)\,b^{i-1},\)

where \(\pi (c_i)\) is the numeric label assigned to the ith character \(c_i\) in pw. Here \(b\ge 94\) is an integer parameter, which the authors refer to as the “shift base”.

The client computes a fingerprint of her password pw by selecting two random values (the so-called “pre-hash” and “post-hash” salts, the latter of which is denoted by s) and using them to produce a Pedersen-like commitment C to the integer encoding of pw, and then outputting the tuple \((s,\tilde{g},C)\). As the post-hash salt s in this construction is output as part of the fingerprint, it does not serve a clear purpose; indeed, any party (including the attacker) can trivially compute and strip out the salt-dependent factor of C. Thus, recovering the client’s password (at least, modulo q) from a fingerprint is equivalent to solving a DL instance to the base \(\tilde{g}\).

The Benhamouda–Pointcheval attack. The original protocol used \(b=94\), which confines exponents to the interval \([0,94^{m})\); hence, as noted by Benhamouda and Pointcheval [2, Sect. 1.2], an attacker can recover the exponent (and, from it, the password) from \((s,\tilde{g},C)\) in around \(\sqrt{94^{m}}\lesssim 10^{m}\) steps using the Kangaroo algorithm. (Note that m here is a password length, and not a cryptographic security parameter.) This is a mere square root of the time required to launch a brute-force attack, which falls far short of satisfying the no-better-than-brute-force requirement for a VPAKE protocol.

Kiefer and Manulis’ defense. To protect against Kangaroo attacks, Kiefer and Manulis suggested increasing the shift base. Specifically, as exponents in their scheme have the form \(\sum _{i=1}^{|\mathrm{pw}|}\pi (c_i)\,b^{i-1}\) with each \(\pi (c_i)\le 93\), they solve for the smallest choice of b that causes the “largest” possible password of length |pw| to induce an exponent that satisfies the inequality \(94^{2m}<93\sum _{i=1}^{m} b^{i-1}\). Doing so means that exponents are distributed throughout a range of size exceeding \(94^{2m}\), which is (ignoring constants) necessary and sufficient to ensure that a straightforward application of Pollard’s Kangaroo algorithm will fail to solve the DL in fewer steps than are required to brute-force the password, on average. If one supposes that the Kangaroo algorithm is the best possible DL-based attack, the defense seems reasonable. Kiefer and Manulis suggest \(b=10^5\), which they state “should be a safe choice”.

Revised attack from the deterministic low-radix-\(10^5\)-weight DL algorithm. Using our optimized form of Coppersmith’s algorithm (together with the remarks following Theorem 12), one can solve for any password up to, say, \(m=12\) characters long, using fewer than

\(12\cdot 94^{6}\approx 2^{43}\)

group operations, as compared with

\(94^{12}\approx 2^{79}\)

guesses for a brute-force attack, thus rendering Kiefer and Manulis’ defense completely ineffective.
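The arithmetic behind this comparison is easy to check. The snippet below (ours) uses the worst-case cost expression for the optimized radix-b algorithm with digits bounded above by \(c=94\), as reconstructed in Sect. 4; treat the exact constants as our assumption rather than as reported benchmark figures.

```python
from math import comb, log2

m = t = 12                     # a 12-character password, all digits nonzero
brute_force = 94 ** m          # guesses to brute-force the password space
# Worst case for the optimized radix-10^5 algorithm with digits bounded by 94
# (cf. the remark following Theorem 12): ~ t * 94^(t/2) * C(m/2, t/2).
attack = t * 94 ** (t // 2) * comb(m // 2, t // 2)
print(f"attack ~= 2^{log2(attack):.0f} vs. brute force ~= 2^{log2(brute_force):.0f}")
# prints: attack ~= 2^43 vs. brute force ~= 2^79
```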

7 Conclusion

The DL problem is a cornerstone of modern cryptography. Several prior works have studied “constrained” variants of the DL problem in which the desired exponent is known either to have low Hamming weight or to be expressible as a product whose multiplicands each have low Hamming weight. In this work, we have focused on the related problem of computing DLs that have low radix-b weight for arbitrary \(b\ge 2\). This is a (mathematically) natural generalization of the low-Hamming-weight DL problem that has not been explicitly considered in prior work. We emphasize that a significant part of our contribution was to minimize the hidden constants in the low-Hamming-weight algorithms (improving the best-known complexity for the radix-2 case) and, by extension, in their radix-b generalizations. We expect that our modifications will “play nice” with prior efforts to solve other low-Hamming-weight and product-of-low-weight-multiplicand DL problem variants, and we slate a formal investigation of this interplay for future work. To showcase the cryptanalytic applications of our new algorithms, we demonstrated an attack against several Verifier-Based Password Authenticated Key Exchange (VPAKE) protocols from the cryptographic literature.