Abstract
We introduce a technique for component-based program synthesis that relies on searching for a target program and its proof of correctness simultaneously using a purely constraint-based approach, rather than exploring the space of possible programs in an enumerate-and-check loop. Our approach solves a synthesis problem by checking satisfiability of an \(\exists \exists \) constraint \(\phi \), whereas traditional program synthesis approaches are based on solving an \(\exists \forall \) constraint. This enables the use of SMT-solving technology to decide \(\phi \), resulting in a scalable practical approach. Moreover, our technique uniformly handles both functional and nonfunctional criteria for correctness. To illustrate these aspects, we use our technique to automatically synthesize several intricate and non-obvious cryptographic constructions.
A. Gascón—Part of this work was done while the author was at the University of Edinburgh, supported by the SOCIAM Project under EPSRC grant EP/J017728/2.
A. Tiwari—Supported in part by the National Science Foundation under grant CCF 1423296.
1 Introduction
Automated program synthesis has a rich history in computer science. This problem has been studied from several perspectives, and currently lies at the intersection of logic, artificial intelligence, and software engineering. The seminal work by Manna and Waldinger [23], commonly referred to as deductive synthesis, is based on the observation that a program with input x and output y, specified by a formula \(\phi (x,y)\), can be extracted from a constructive proof of \(\forall x\exists y:\phi (x,y)\), as this formula is equivalent to a second-order formula of the form \(\exists f~\forall x: \phi (x, f(x))\).
More recently, program synthesis has taken the form of inductive synthesis, where programs are not deduced, but synthesized iteratively by finding candidate programs that work correctly on an ever-increasing input space. In practice, this search is implemented using powerful constraint solvers, typically Boolean Satisfiability (SAT) solvers and Satisfiability Modulo Theory (SMT) solvers. The various choices in the synthesis approach, correctness specification, and restrictions on the program search space, have been explored extensively [2, 16, 19, 28, 29].
Manna and Waldinger [23] foresee the possibility that the user could suggest program segments, i.e. snippets, to the synthesizer, which it could use to construct a full solution. This idea was pursued in the work on component-based program synthesis [16, 32], where the target program is constructed from a set of predefined components (library calls). The component-based synthesis problem is also naturally encoded as an \(\exists \forall \) problem: there exists some placement of components on the different program lines such that for all inputs, the function computed by the resulting program satisfies the given property.
Although it comes in many flavors, a synthesis problem is essentially parameterized by a target language, i.e. the language of the target program to be synthesized, and a specification language, i.e. the formalism in which the functionality of the target program is expressed. Moreover, synthesis may be subject to nonfunctional constraints, such as optimizing for certain metrics like program size or power consumption, or enforcing security properties. Examples of target languages include MapReduce-style programs [27], bit-vector manipulations [16], recursive programs [20], high-level circuit descriptions [13], and domain-specific languages for cryptographic constructions [18, 22]. Examples of specification languages, on the other hand, include formulas in several modal temporal logics, input-output examples [19, 27], flattened Verilog circuits given as SMT/Boolean formulas [13], and assertions in imperative program sketches [29].
Generally speaking, the synthesis problem seeks to find a program P in the target language, such that P satisfies the specification \(\phi \) given in the specification language. As \(\phi \) is often a relation between input-output pairs, this naturally corresponds to an exists-forall check. In fact, inductive synthesis algorithms often consist of two procedures: a procedure to generate candidate programs from the target language, and a procedure for checking \(\phi \) on a given candidate. In that setting, synthesis consists of an enumerate-and-check feedback loop, similar to a Counterexample-Guided Abstraction Refinement (CEGAR) loop, that continues until a valid candidate, i.e. a candidate satisfying \(\phi \), is found, or no more candidates can be generated. Note that, to achieve scalability, the verification check is often conservative, i.e. \(\phi \) is replaced by a sufficient, but not necessary condition. This is often the case with security properties, as they are costly to check in a sound and complete way.
The “enumerate-and-check” paradigm for solving synthesis problems is widespread in the literature [13, 16, 22, 27]. In this context, the general idea of looking for a program and its proof simultaneously has also been considered in previous work on type-directed synthesis [12, 26], where program candidates that are not type correct are pruned early in the enumerate-and-check loop. The deductive phase of the Leon synthesizer [20] is likewise based on proof search.
In this paper, we pursue a framework that enables synthesis using a single search, and avoids quantifier alternation. Inspired by the challenge posed by Manna and Waldinger [23] pertaining to incorporation of user-defined proof systems in synthesis, and building on the framework of component-based synthesis [16], we present an approach for synthesis that allows users to define simple proof systems, in the form of constraint-generation rules for the components of the synthesized program. This framework, which we call decorated-component-based synthesis, provides a way for the user to not only easily encode nonfunctional properties, as in [32], but also replace the validity check in the synthesis process by a search for a proof in the provided proof system. This effectively removes a quantifier alternation and hence turns the enumerate-and-check approach into a search problem in which, intuitively, the space of target programs and their corresponding correctness proofs are explored simultaneously, resulting in a much more scalable approach.
Contributions. We make three key contributions in this work. First, we formulate the decorated-component-based program synthesis problem, which allows users to encode a bounded search for a proof (in a user-picked proof system) in the synthesis problem. Second, we show that the decorated-component-based synthesis problem reduces to an exists-forall constraint in general, just as the component-based synthesis problem [16]. However, the additional “decorations” enable users to either augment a synthesis constraint with additional existential parts (similar to the work in [32]), or entirely replace the forall by an existential in the synthesis constraint. This second application of decorations is appealing because solving a purely existential constraint is significantly faster than solving an exists-forall constraint. Third, we demonstrate that security is an ideal domain for application of automated program synthesis technology, thus solidifying preliminary evidence in this regard [3, 7, 18, 22, 32]. Decorated-component-based synthesis eases the task of specifying security requirements. Our synthesizer and all the examples mentioned in this paper, as well as instructions for running them, are available at the SYNUDIC project’s public repository [14].
Outline. We start by illustrating our approach (with complete details) on an example synthesis problem (Sect. 2). We then formally define decorated-component-based synthesis (Sect. 3), and show that it reduces to an \(\exists \forall \) constraint (Sect. 4). We then present our main result that enables conservative replacement of the \(\forall \) by an \(\exists \) in the synthesis constraint (Sect. 5). Finally, we show its application to synthesis of cryptographic schemes (Sect. 6).
2 An Illustrative Example
Secure Multi-party Computation (MPC) is a subfield of cryptography with the goal of creating protocols for multiple parties to jointly compute a function over their inputs without disclosing the inputs to each other. Here we consider the problem of designing an information secure two-party multiplication protocol, which is a basic component in many privacy-preserving algorithms [5, 10].
Our problem is as follows: find a protocol for Alice and Bob to compute an additive share of the product of Alice’s and Bob’s private input values. Let Alice’s (private input) value be \(\mathtt {input}(A)\in \mathbb {Z}_q\), and Bob’s value be \(\mathtt {input}(B)\in \mathbb {Z}_q\), for some natural q, say \(2^{32}\). For correctness, our functional requirement is that
\[\mathtt {output}(A) + \mathtt {output}(B) = \mathtt {input}(A)*\mathtt {input}(B).\]
In other words, each party computes a share of the result. We assume that Alice and Bob can rely on a third untrusted party Carol that aids in the computation.
Now that we have the functional correctness requirement, let us consider the nonfunctional security requirement. Informally, the main security requirement is that Alice and Carol should not learn the value \(\mathtt {input}(B)\), and Bob and Carol should not learn the value \(\mathtt {input}(A)\). In the static honest-but-curious adversary model, one assumes that the parties – Alice, Bob, and Carol – have an incentive to deduce as much information as possible from the transcripts of the protocol, but they do not deviate from it nor collude. (Formally, the security requirement is formulated in the so-called “simulation paradigm”, see [21] for details.)
How do we use our new decorated-component-based synthesis technique to discover a secure multiplication protocol? We first identify the components that we could use in the protocol. These are the three arithmetic operations \(\mathtt {plus}\), \(\mathtt {minus}\), and \(\mathtt {times}\), along with a few calls to a pseudo-random generator, say \({\mathtt {genx}}\), \({\mathtt {geny}}\), \({\mathtt {genr}}\), and \({\mathtt {genu}}\) (that generate random numbers x, y, r, u), and the identity function \({\mathtt {identity}}\).
Next, let us fix a communication schedule between the parties. The structure of the protocol, depicted in Fig. 1(left), is as follows: (1) C computes some values \((v_1, \ldots , v_4)\) first (5 lines), (2) C sends \((v_1, v_2)\) to A and \((v_3, v_4)\) to B, (3) A computes \(v_5\) (1 line) and sends it to B, (4) B computes some values (3 lines), sends a pair (\(v_6\), \(v_7\)) to A and picks one value as its output, (5) A computes its output (2 lines). Note that instances of this template are constant-round protocols, as opposed to approaches to secure multiplication based on Oblivious Transfer [15]. Our tool [14] takes as input a description of the library and a template of a straight-line program similar to the one in Fig. 1(right).
If we give each of the components its natural interpretation (that is, all variables are integer valued, \(\mathtt {plus}\) is arithmetic addition, and so on), then the correctness requirement is simply the arithmetic equality \(\mathtt {output}(A) + \mathtt {output}(B) = \mathtt {input}(A)*\mathtt {input}(B)\). Now, synthesis can be performed as in [16] or [32] – the synthesis problem is reduced to an \(\exists \forall \) formula over a suitable theory, where the \(\exists \) quantifier searches over the space of possible programs, and the \(\forall \) quantifier checks correctness over the space of all possible inputs.
Our new decorated-component-based synthesis framework allows us to (a) conservatively turn the validity check above into a satisfiability check over an alternate theory, and (b) provide a natural way of also specifying a nonfunctional security requirement. The key idea behind our approach is associating a constraint with each use of a component in the (yet to be discovered) program, by defining a decoration, or a constraint generation rule, for every component.
As a first step in defining the decorations, consider an abstract domain \(\mathcal{A}\) that consists of symbolic polynomial expressions of degree at most 2 over the six variables: the two inputs a, b and the four random numbers x, y, r, u.
where polynomials in \(\mathcal{A}\) are represented in canonical form as a sum of ordered monomials.
Let us say we wish our program (that is yet to be discovered) to have a functional correctness proof in this abstract domain. We can design proof rules that essentially perform symbolic execution to check the correctness assertion. Let \(\theta : {V}\mapsto \mathcal{A}\) map a program variable \(v\in {V}\) to the symbolic polynomial value of v. We can compute \(\theta \) by starting with any substitution, and updating it using the rules in Fig. 2. For example, the rule \({\mathtt {genx}}\) says that after execution of \(v := {\mathtt {genx}}\), we get a new substitution that maps v to the symbolic polynomial x. Similarly, the rule \({\mathtt {plus}}\) handles program lines of the form \(v := {\mathtt {plus}}(x, y)\) by setting \(\theta (v)\) to the polynomial \((\theta (x) + \theta (y))\in \mathcal{A}\). In Fig. 2 we omitted rules for \({\mathtt {minus}}\), \({\mathtt {geny}}\), \({\mathtt {genr}}\), \({\mathtt {genu}}\), and \({\mathtt {identity}}\). Once we have computed the substitution \(\theta \) at the end of the program, we use the rule \(\mathtt {check}\) to prove an assertion \(v+w=a*b\) by checking whether \(\theta (v)+\theta (w)\) and \(\theta (a)*\theta (b)\) are syntactically equal (recall that elements of \(\mathcal{A}\) are represented in canonical form). The proof rule for \({\mathtt {times}}\) in Fig. 2 has a condition that allows multiplication to be used only on linear polynomials (so that the result is at most quadratic).
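The symbolic execution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the rules of Fig. 2 verbatim: polynomials are represented as dictionaries from sorted tuples of variable names to nonzero integer coefficients (one possible canonical form), and the function names `var`, `plus`, `minus`, `times`, and `check` are chosen here for readability.

```python
from collections import defaultdict

# Assumed canonical form: {monomial: coefficient}, where a monomial is a
# sorted tuple of variable names and zero coefficients are never stored.

def var(name):
    """Symbolic polynomial consisting of a single variable."""
    return {(name,): 1}

def normalize(p):
    return {m: c for m, c in p.items() if c != 0}

def plus(p, q):
    out = defaultdict(int)
    for m, c in list(p.items()) + list(q.items()):
        out[m] += c
    return normalize(out)

def minus(p, q):
    return plus(p, {m: -c for m, c in q.items()})

def degree(p):
    return max((len(m) for m in p), default=0)

def times(p, q):
    # Side condition of the times rule: both arguments must be linear,
    # so that the result is at most quadratic.
    if degree(p) > 1 or degree(q) > 1:
        raise ValueError("times is only allowed on linear polynomials")
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return normalize(out)

def check(theta, v, w, a, b):
    """Rule check: theta(v) + theta(w) is syntactically equal to theta(a) * theta(b)."""
    return plus(theta[v], theta[w]) == times(theta[a], theta[b])

# Symbolic execution of a small hypothetical program.
theta = {"a": var("a"), "b": var("b")}
theta["v1"] = var("x")                          # v1 := genx
theta["v2"] = minus(theta["a"], theta["v1"])    # v2 := minus(a, v1)
theta["v3"] = times(theta["v2"], theta["b"])    # v3 := times(v2, b)
theta["v4"] = times(theta["v1"], theta["b"])    # v4 := times(v1, b)
print(check(theta, "v3", "v4", "a", "b"))
```

Because dictionaries compare by content, equality of two canonical forms plays the role of the syntactic equality test in the rule \(\mathtt {check}\).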
Note that this proof system for checking correctness turns the validity question over integers into an evaluation over \(\mathcal{A}\). Our goal now is to have our synthesizer use this proof system so that, instead of searching for a program satisfying the postcondition \(\mathtt {output}(A) + \mathtt {output}(B) = \mathtt {input}(A)*\mathtt {input}(B)\), it searches for a program and its correctness proof simultaneously. To see how this is done, first note that the elements in the chosen abstract domain can be uniquely identified by 27 parameters – namely, the coefficients of all degree 1 and degree 2 monomials over the six variables (\(C(6,1) + C(6,2) + C(6,1) = 6 + 15 + 6 = 27\)). Let \(L = \{x,y,r,u,a,b\}\) and \(NL = \{ij \mid i\ne j, i\in L, j\in L\}\cup \{ii \mid i\in L\}\). Hence, every program variable v is associated with 27 new variables, namely, \(v_i\), where index i ranges over the set \(I = L\cup NL\) of all indices. If variable v gets a symbolic value \(p\in \mathcal{A}\) and p is quadratic, then each new variable \(v_i\) gets the value of the coefficient of the monomial i in p. Now, let us say we could prove our functional requirement using quadratic symbolic values. Then, there exists a value of the new variables that witnesses this proof. The converse is even more important for our goal: if we find a consistent valuation for the new variables, then we would establish our functional requirement. Restricting \(\mathcal{{A}}\) to polynomials of degree at most 2 makes the set of new variables, and hence the proof search, finite.
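To make the count of 27 parameters concrete, the index set \(I = L\cup NL\) can be enumerated as in the following sketch. It assumes that the cross terms ij and ji denote the same monomial (so only unordered pairs are kept), and it encodes indices as short strings; the helper `coefficients` is hypothetical.

```python
from itertools import combinations

L = ["x", "y", "r", "u", "a", "b"]

# Assumption: ij and ji are the same monomial, so NL holds the 15 unordered
# pairs with i != j plus the 6 squares ii.
NL = [i + j for i, j in combinations(L, 2)] + [i + i for i in L]
I = L + NL

def coefficients(poly):
    """Expand a sparse polynomial {index: coefficient} into the full
    vector of parameters v_i, one per index i in I."""
    return {i: poly.get(i, 0) for i in I}

# The symbolic value a - x of some program variable v.
v = coefficients({"a": 1, "x": -1})
print(len(I), v["a"], v["x"], v["xy"])
```

A program of length l thus contributes \(27*l\) integer unknowns of this kind to the synthesis constraint.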
A sound proof rule application (on the abstract values) induces certain constraints on the new variables. Hence, as mentioned above, we associate to every component a constraint generation rule, also called a decoration, that produces the suitable constraint to encode the corresponding proof rule. The generated constraint essentially says what combinations of the 27 parameters for its inputs and outputs are consistent with the proof rule. For example, Column 3 of Fig. 2 shows such constraints for selected components.
Note that the correctness requirement was an equivalence of polynomial expressions, and in our abstract domain \(\mathcal{A}\), this maps to equality of coefficients (see right column on last row in Fig. 2). Thus, ignoring the security requirement, the synthesis problem is reduced to finding \(27*l\) values, where l is the length of the program, that satisfy some big constraint generated using decorations. This is an \(\exists \exists \) problem: we are finding a program (first \(\exists \)) and a proof of its correctness over the chosen abstract domain (second \(\exists \)). Decorations have enabled us to replace the \(\forall \) check by an \(\exists \) check. Note that, although the task of finding a proof system requires human intuition, the process of designing constraint generation rules, i.e. parameterizing the abstract domain and building the constraint for every component of the library, is systematic (Sect. 5.1).
We have not yet solved our original problem because we still need to include the security requirement. The formal security requirement, which is based on showing a simulation of the ideal functionality in the actual functionality (see [21] for details on this proof technique), is difficult to capture precisely. We take a very practical approach here: we replace the security requirement by another easily checkable requirement that is sufficient (but not necessary) for security. The new check can itself be described by proof rules, and we can again search for a bounded-size proof to establish security. The sufficiency of the proof rules may itself be proved as a meta-theorem by hand. In our example, we use ideas from [32], which were in turn inspired by [22], to synthesize block cipher modes of operation. Essentially, the decorations rely on a simple type system that propagates a qualifier stating whether a variable always has a “random” value on any program line, in a sound, and possibly incomplete, way.
With this encoding of the security requirement, our sketch is complete. We run our synthesis tool, and it returns the following protocol:
1. C generates random numbers x, y, r, and computes xy and \(r-xy\).
2. C sends (x, r) to A and \((r-xy,y)\) to B.
3. A computes \(\mathtt {input}(A)-x\) and sends it to B.
4. B computes \(\mathtt {output}(B) = (\mathtt {input}(A)-x)\,\mathtt {input}(B) + (r-xy)\) and sends \(y+\mathtt {input}(B)\) to A.
5. A computes \(\mathtt {output}(A) = (y+\mathtt {input}(B))x-r\).
We were not aware of this protocol before it was synthesized by our tool. Note that the protocol did not use the fourth random number (u), whereas we were expecting the synthesized protocol to need it.
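The synthesized protocol can be sanity-checked by brute force. The sketch below is an illustration and not part of the synthesizer: it assumes a tiny modulus \(q = 4\) (rather than \(2^{32}\)) so that all inputs and random choices can be enumerated, and the names `run_protocol`, `correct`, `view_distribution`, and `views_hide_inputs` are hypothetical. It checks the functional requirement and, as a necessary condition only (not a proof in the simulation paradigm), that the messages received by Alice and by Bob are distributed independently of the other party's input.

```python
import itertools
from collections import Counter

Q = 4  # assumption: tiny modulus so that exhaustive enumeration is feasible
RNG = range(Q)

def run_protocol(a, b, x, y, r):
    """Run the five steps for inputs a, b and Carol's random values x, y, r."""
    c = (r - x * y) % Q          # steps 1-2: Carol sends (x, r) to A, (c, y) to B
    m_ab = (a - x) % Q           # step 3: Alice sends input(A) - x to Bob
    out_b = (m_ab * b + c) % Q   # step 4: Bob's output
    m_ba = (y + b) % Q           # step 4: Bob sends y + input(B) to Alice
    out_a = (m_ba * x - r) % Q   # step 5: Alice's output
    view_a = (x, r, m_ba)        # everything Alice receives
    view_b = (c, y, m_ab)        # everything Bob receives
    return out_a, out_b, view_a, view_b

def correct():
    """output(A) + output(B) = input(A) * input(B) for all inputs and coins."""
    for a, b, x, y, r in itertools.product(RNG, repeat=5):
        out_a, out_b, _, _ = run_protocol(a, b, x, y, r)
        if (out_a + out_b) % Q != (a * b) % Q:
            return False
    return True

def view_distribution(a, b, party):
    """Distribution of one party's received messages over Carol's coins."""
    counts = Counter()
    for x, y, r in itertools.product(RNG, repeat=3):
        _, _, view_a, view_b = run_protocol(a, b, x, y, r)
        counts[view_a if party == "A" else view_b] += 1
    return counts

def views_hide_inputs():
    """Alice's view must not depend on b, and Bob's view must not depend on a."""
    for a, b in itertools.product(RNG, repeat=2):
        if view_distribution(a, b, "A") != view_distribution(a, 0, "A"):
            return False
        if view_distribution(a, b, "B") != view_distribution(0, b, "B"):
            return False
    return True

print(correct(), views_hide_inputs())
```

Carol receives no messages in this protocol, so her view is not modeled in the sketch.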
3 Decorated-Component-Based Program Synthesis
We define the component-based synthesis problem in this section, as introduced in [16]. We then extend it to decorated-component-based synthesis, where components are additionally allowed to be associated with certain constraint-generation rules.
A component library \(\varSigma \) is a set of symbols. Each symbol is associated with an arity, but without loss of generality and for simplicity, we will often implicitly assume that the arity of each symbol in \(\varSigma \) is two. The symbols in \(\varSigma \) should be regarded as functions that can be invoked by a program.
The functions in \(\varSigma \) compute over some values. For simplicity again, let us say these values come from a domain \(\mathtt{{Domp}}\) of all values. The semantics of the functions in \(\varSigma \) is given over the domain \(\mathtt{{Domp}}\) by \(\mathtt{{Semp}}\):
\[\mathtt{{Semp}}: \varSigma \mapsto 2^{\mathtt{{Domp}}^3}\]
That is, if \(f\in \varSigma \), then \(\mathtt{{Semp}}(f)\) is a ternary relation on \(\mathtt{{Domp}}\). Intuitively, \(c = f(a,b)\) iff \((a,b,c)\in \mathtt{{Semp}}(f)\).
We want to synthesize straight-line programs (SLPs) using calls to functions in \(\varSigma \). A generic template of such a 9-line program is shown in Fig. 3. The semantics \(\mathtt{{Semp}}\) of one component can be extended to semantics of a straight-line program P (such as the one shown in Fig. 3) that takes one input \(x_0\) and produces one output \(x_9\) so that \(\mathtt{{Semp}}(P) \subseteq \mathtt{{Domp}}^2\) contains all pairs (a, b) where \(x_9=b\) is reachable starting with \(x_0=a\).
A specification, \({\phi _\mathtt{{fspec}}}\), of a program P that takes one input and produces one output is given as binary relation on \(\mathtt{{Domp}}\).
Definition 1
(Component-Based Synthesis, or CoS [16]). A CoS problem is a tuple \((\varSigma , \mathtt{{Domp}}, \mathtt{{Semp}}, {\phi _\mathtt{{fspec}}}, n)\) consisting of a library \(\varSigma \) of functions, a domain \(\mathtt{{Domp}}\) of values, a semantics function \(\mathtt{{Semp}}: \varSigma \mapsto 2^{\mathtt{{Domp}}^3}\), a specification relation \({\phi _\mathtt{{fspec}}}\subseteq \mathtt{{Domp}}^2\), and an integer n. The goal is to find a straight-line program P of length n that only calls functions in \(\varSigma \) to compute a function that refines \({\phi _\mathtt{{fspec}}}\); that is, \(\forall {x,y}: \mathtt{{Semp}}(P)(x,y) \Rightarrow {\phi _\mathtt{{fspec}}}(x,y)\).
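As a minimal sketch of Definition 1, the following Python code represents \(\mathtt{{Semp}}(f)\) as an explicit ternary relation, extends it to a straight-line program, and checks refinement by enumeration. The finite domain \(\mathbb {Z}_3\), the two-component library, and the names `semp_program` and `refines` are assumptions made for illustration.

```python
Q = 3
DOMP = range(Q)  # assumption: a small finite domain, here Z_3

# Semp(f) as a ternary relation: (a, b, c) is in Semp(f) iff c = f(a, b).
SEMP = {
    "plus": {(a, b, (a + b) % Q) for a in DOMP for b in DOMP},
    "times": {(a, b, (a * b) % Q) for a in DOMP for b in DOMP},
}

def semp_program(program):
    """Semp(P) as a set of (input, output) pairs.

    program is a list of triples (f, j, k); the i-th triple (counting
    from 1) stands for the line x_i := f(x_j, x_k), and x_0 is the input.
    """
    pairs = set()
    for a in DOMP:
        states = [[a]]  # each state lists the values of x_0, x_1, ...
        for f, j, k in program:
            states = [s + [c]
                      for s in states
                      for (u, v, c) in SEMP[f]
                      if u == s[j] and v == s[k]]
        pairs.update((a, s[-1]) for s in states)
    return pairs

def refines(program, fspec):
    """forall x, y: Semp(P)(x, y) implies fspec(x, y)."""
    return all(fspec(x, y) for x, y in semp_program(program))

fspec = lambda x, y: y == (x * x + x) % Q
good = [("times", 0, 0), ("plus", 1, 0)]   # x_1 := x_0 * x_0; x_2 := x_1 + x_0
bad = [("plus", 0, 0)]                     # x_1 := x_0 + x_0
print(refines(good, fspec), refines(bad, fspec))
```

Representing the semantics as relations rather than functions keeps the sketch close to the definition, at the cost of efficiency.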
3.1 Decorated-Component-Based Synthesis
We now allow the library components \(f\in \varSigma \) to be associated with additional constraint-generation rules, and introduce the problem of synthesizing straight-line programs (SLPs) that use such decorated components.
Let V denote all program variables. The semantics \(\mathtt{{Semp}}\) interprets V as elements in \(\mathtt{{Domp}}\). Now, let \(\mathtt{{Domd}}\) be an alternate domain of values, and consider valuations \(\sigma : V\mapsto \mathtt{{Domd}}\) that interpret V in this new domain \(\mathtt{{Domd}}\). Each function \(f\in \varSigma \) is given an alternate meaning:
\[\mathtt{{Semd}}: \varSigma \mapsto 2^{\mathtt{{Domd}}^3}\]
That is, if \(f\in \varSigma \), then \(\mathtt{{Semd}}(f)\) is a ternary relation on \(\mathtt{{Domd}}\). Intuitively, if we use the statement \(z = f(x,y)\) in a program P, then we would require the existence of three values in \(\mathtt{{Domd}}\) – one value tx associated with x, a value ty associated with y, and a value tz associated with z – such that \((tx,ty,tz)\in \mathtt{{Semd}}(f)\). The alternate meaning of an SLP P is simply the conjunction of the alternate meaning of each statement.
Definition 2
(Decorated-CoS, or DCoS) A DCoS problem is an 8-tuple \((\varSigma , \mathtt{{Domp}}, \mathtt{{Semp}}, {\phi _\mathtt{{fspec}}}, n, \mathtt{{Domd}}, \mathtt{{Semd}}, {\phi _\mathtt{{dspec}}})\), where \((\varSigma , \mathtt{{Domp}}, \mathtt{{Semp}}, {\phi _\mathtt{{fspec}}}, n)\) is a CoS problem, \(\mathtt{{Domd}}\) is an alternate domain of values, \(\mathtt{{Semd}}\) is a mapping \(\varSigma \mapsto 2^{\mathtt{{Domd}}^3}\), and \({\phi _\mathtt{{dspec}}}\subseteq \mathtt{{Domd}}^2\) is an additional constraint on input x and output y. The goal is to synthesize both a straight-line program P and a valuation \(\sigma : V\mapsto \mathtt{{Domd}}\) such that P solves the component-based synthesis problem and \(\sigma \) is a model of \(\mathtt{{Semd}}(P)\) and \((\sigma (x),\sigma (y))\in {\phi _\mathtt{{dspec}}}\).
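The alternate part of Definition 2 can be illustrated with a small hypothetical decoration that is not taken from our examples: \(\mathtt{{Domd}} = \{1,2\}\) holds an upper bound on the polynomial degree of each variable, \(\mathtt{{Semd}}\) propagates the bound, and \({\phi _\mathtt{{dspec}}}\) requires the input to have degree 1. A valuation \(\sigma \) then exists only for programs whose output is at most quadratic in the input. The names `is_model` and `find_valuation` are assumptions.

```python
import itertools

DOMD = (1, 2)  # assumption: alternate values are upper bounds on the degree

# Semd(f) as a ternary relation on Domd.
SEMD = {
    "plus": {(tx, ty, tz) for tx in DOMD for ty in DOMD for tz in DOMD
             if tz >= max(tx, ty)},
    "times": {(tx, ty, tz) for tx in DOMD for ty in DOMD for tz in DOMD
              if tz >= tx + ty},
}

def dspec(t_in, t_out):
    """Alternate specification: the input is a degree-1 quantity."""
    return t_in == 1

def is_model(program, sigma):
    """sigma[i] is the Domd value of x_i; Semd(P) is the conjunction of
    the alternate meaning of each line x_i := f(x_j, x_k)."""
    return all((sigma[j], sigma[k], sigma[i]) in SEMD[f]
               for i, (f, j, k) in enumerate(program, start=1))

def find_valuation(program):
    """Search for a valuation sigma satisfying Semd(P) and dspec."""
    for sigma in itertools.product(DOMD, repeat=len(program) + 1):
        if is_model(program, sigma) and dspec(sigma[0], sigma[-1]):
            return sigma
    return None

quadratic = [("times", 0, 0), ("plus", 1, 0)]   # x_0 * x_0 + x_0
cubic = [("times", 0, 0), ("times", 1, 0)]      # x_0 * x_0 * x_0
print(find_valuation(quadratic), find_valuation(cubic))
```

In this sketch the primary semantics plays no role, which corresponds to a \(\mathtt{{Semd}}\)-only use of decorations.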
In a DCoS problem, the \(\mathtt{{Semp}}\) part could be redundant (if \({\phi _\mathtt{{fspec}}}= \mathtt{{Domp}}^2\)), or the \(\mathtt{{Semd}}\) part could be redundant (if \(\mathtt{{Semd}}(f) = \mathtt{{Domd}}^3\) and \({\phi _\mathtt{{dspec}}}= \mathtt{{Domd}}^2\)). Hence, DCoS generalizes CoS, and supports \(\mathtt{{Semd}}\)-only problem formulations too.
Note that decorations are useful to enforce nonfunctional constraints on the target program, such as a bound on the number of times a component function is used.
4 Solving the Synthesis Problems
We solve the synthesis problems by converting them to an \(\exists \forall \) constraint and using an off-the-shelf \(\exists \forall \) SMT solver to solve the constraint. This approach was used in earlier work on component-based synthesis [16]. We note here that the decorated components introduce additional existential constraints, and hence, the overall synthesis constraint continues to be an \(\exists \forall \) formula.
4.1 Component-Based Program Synthesis as \(\exists \forall \)
Consider an instance of the CoS problem, depicted in Fig. 3, where, for notational convenience, we fixed \(n=9\). Synthesizing the program amounts to finding values for the 9 variables \(f_1,\ldots ,f_9\) from the set \(\varSigma \), and values for the 18 variables \(a_{11},a_{12},\ldots ,a_{91},a_{92}\) from the set \(\{0,1,\ldots ,8\}\). If the value of \(a_{ij}\) is k, then it means the j-th argument of the function call on Line i is equal to \(x_k\).
We have the following well-formedness constraint on the \(a_{ij}\) variables, which guarantees that the synthesized program will indeed be an SLP.
With each left-hand side variable \(x_1,\ldots ,x_9\) in the program sketch in Fig. 3, we associate one first-order variable \(vx_i\), which denotes the value in \(\mathtt{{Domp}}\) of \(x_i\). The following constraint imposes consistency of \(vx_i\) values with respect to the semantics \(\mathtt{{Semp}}\).
The constraint above says that if the first argument of the functional call on Line i comes from Line j, the second argument comes from Line k, and the function on Line i is \(f\in \varSigma \), then the value \(vx_i\) should be such that \((vx_j,vx_k,vx_i)\in \mathtt{{Semp}}(f)\).
We are now ready to write our \(\exists \forall \) synthesis constraint \(\varPhi _{\exists \forall }\):
The satisfiability of \(\varPhi _{\exists \forall }\) is equivalent to the existence of an instance of the sketch in Fig. 3 that satisfies the functional requirement \({\phi _\mathtt{{fspec}}}\). Thus, we can solve the CoS problem by generating the above formula and solving it using an \(\exists \forall \) solver, as described in [16].
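The shape of the \(\exists \forall \) problem can be seen in the following brute-force sketch, which stands in for an \(\exists \forall \) solver on a very small instance. It assumes a functional semantics over \(\mathbb {Z}_5\), a two-component library, and that well-formedness means each argument index \(a_{ij}\) refers to an earlier line; the names `well_formed`, `run`, and `synthesize` are hypothetical.

```python
import itertools

Q = 5  # assumption: values range over Z_5 so that "forall" can be enumerated
OPS = {
    "plus": lambda u, v: (u + v) % Q,
    "times": lambda u, v: (u * v) % Q,
}

def well_formed(args):
    """args[i-1] = (a_i1, a_i2); line i may only read x_0, ..., x_{i-1}."""
    return all(a1 < i and a2 < i for i, (a1, a2) in enumerate(args, start=1))

def run(funcs, args, x0):
    """Compute vx_0, ..., vx_n consistently with the semantics; return vx_n."""
    vx = [x0]
    for f, (a1, a2) in zip(funcs, args):
        vx.append(OPS[f](vx[a1], vx[a2]))
    return vx[-1]

def synthesize(n, fspec):
    # Outer "exists": choices for f_1..f_n and a_11..a_n2.
    for funcs in itertools.product(OPS, repeat=n):
        for flat in itertools.product(range(n), repeat=2 * n):
            args = [(flat[2 * i], flat[2 * i + 1]) for i in range(n)]
            if not well_formed(args):
                continue
            # Inner "forall": every input must satisfy the specification.
            if all(fspec(x0, run(funcs, args, x0)) for x0 in range(Q)):
                return funcs, args
    return None

fspec = lambda x, y: y == (x * x + x) % Q
print(synthesize(2, fspec))
print(synthesize(1, fspec))
```

The inner loop over all inputs is the part that the decorations of Sect. 5 aim to replace by a search for a proof.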
4.2 Decorated-Component-Based Program Synthesis as \(\exists \forall \)
Let \(tx_0, tx_1,\ldots ,tx_9\) denote new variables (interpreted over \(\mathtt{{Domd}}\)) corresponding to the 10 lines in the program sketch shown in Fig. 3. (We assume program P does not assign twice to the same variable, so there is a 1–1 correspondence between program variables and program lines.) The following constraint imposes consistency of the \(\mathtt{{Domd}}\) values (assigned to the new variables) with respect to the semantics \(\mathtt{{Semd}}\).
Now, the decorated-component-based synthesis problem reduces to satisfiability of the following exists-forall formula \(\varPsi _{\exists \forall }\):
where \({\phi _\mathtt{{fspec}}}(vx_0,vx_9)\) captures the functional requirement and \({\phi _\mathtt{{dspec}}}(tx_0,tx_9)\) captures the alternate requirement.
The following claim follows from definition of the two synthesis problems and noting that \(\phi _2\) captures \(\mathtt{{Semp}}(P)\) and \(\phi _3\) captures \(\mathtt{{Semd}}(P)\).
Proposition 1
The CoS problem, respectively DCoS problem, has a solution (a desired program) iff the constraint \(\varPhi _{\exists \forall }\), respectively \(\varPsi _{\exists \forall }\), is satisfiable.
5 Component-Based Synthesis Using \(\exists \) Solving
Our main result is that, in some cases, given a CoS problem, one can design a decoration for the components that is an “abstraction” of its primary semantics. Such a decoration allows us to completely ignore the main functional specification while performing synthesis. Since the functional specification was the only source of \(\forall \) in the synthesis constraint, the synthesis constraint simplifies to an \(\exists \) constraint, which can be solved using standard SMT solvers [24, 30].
Consider any program P. Let V be the set of program variables in P, and let V be partitioned into \(I\uplus O\), where I are the input variables, and O are the variables defined in P. Let \(\mathtt{{PSp}}\) denote the set of all program states \(\mathtt{{Domp}}^V\), and let \(\mathtt{{PSd}}\) denote the set of all alternate states \(\mathtt{{Domd}}^V\). A concretization function \(\gamma \) is a mapping from the set \(\mathtt{{PSd}}\) to the powerset \(2^\mathtt{{PSp}}\).
Definition 3
A set \(Sd \subseteq \mathtt{{PSd}}\) is an abstraction of a set \(Sp \subseteq \mathtt{{PSp}}\) with respect to a concretization function \(\gamma \) and a subset \(W\subseteq V\) of variables, if
where \(X|_Y\) denotes the projection of the set X onto the Y components (that is, we consider assignments to the variables in Y and ignore the other variables).
Remark 1
In sharp contrast to Definition 3, recall that in the usual notion of abstraction, we say \(Sp_2\) is an abstraction of \(Sp_1\) if \(Sp_1 \subseteq \bigcup _{s\in Sp_2}\gamma (s)\).
Definition 4
The alternate semantics \(\mathtt{{Semd}}\) is an abstraction of the primary semantics \(\mathtt{{Semp}}\) in a program P if there exists a concretization function \( \gamma : \mathtt{{PSd}}\rightarrow 2^{\mathtt{{PSp}}} \) such that for every \(Sp\subseteq \mathtt{{PSp}}\), for every \(Sd\subseteq \mathtt{{PSd}}\), if Sd is an abstraction of Sp w.r.t \(\gamma \) and I, then \(\mathtt{{Semd}}(P)(Sd)\) is an abstraction of \(\mathtt{{Semp}}(P)(Sp)\) w.r.t \(\gamma \) and \(I\uplus O\).
Remark 2
The definition of abstraction of programs given in Definition 4 is similar to the usual notion of abstraction: if we start with an abstraction of initial states, and apply the abstract transformer, we should get back an abstraction of the concretely transformed initial states. Definition 4 says the same thing, but with the difference that we restrict to the set I when checking abstraction on the initial states, and use the new notion of when a set of alternate program states is said to abstract a set of primary program states (Definition 3).
We note that Definition 4 allows us to compose programs while preserving abstractions if the composed programs modify a disjoint set of variables. More precisely, if the decoration of \(P_1\) is an abstraction, and the decoration of \(P_2\) is an abstraction, then the decoration of \(P_1;P_2\) is an abstraction too, under the assumption that \(P_2\) does not change the value of any variable in \(P_1\) (and only treats those values as its inputs).
The main point of having a decoration that is an abstraction is that now, if we can find an interpretation for the program in the alternate semantics, then we know the program is functionally correct in its primary semantics.
Theorem 1
Let \({\phi _\mathtt{{fspec}}}\) and \({\phi _\mathtt{{dspec}}}\) be primary and alternate specifications such that \(\gamma (\{\sigma \mid (\sigma (x_0),\sigma (x_9))\in {\phi _\mathtt{{dspec}}}\}) \subseteq \{\sigma \mid (\sigma (x_0),\sigma (x_9))\in {\phi _\mathtt{{fspec}}}\}\). If \(\mathtt{{Semd}}\) is an abstraction of \(\mathtt{{Semp}}\) (as in Definition 4) with respect to the concretization function \(\gamma \), then \({\phi _\mathtt{{fspec}}}\) holds in P whenever \({\phi _\mathtt{{dspec}}}\) holds in P.
The main consequence of Theorem 1 is that now we can solve a CoS problem, which is an \(\exists \forall \) problem, by checking satisfiability of an existentially quantified constraint (no quantifier alternation). We can do this only if we have a decoration \(\mathtt{{Semd}}\) that is an abstraction of \(\mathtt{{Semp}}\). Given such an \(\mathtt{{Semd}}\), we can solve the CoS problem by checking satisfiability of the following existential formula \(\varPhi _{\exists \exists }\):
This formula is the same as \(\varPsi _{\exists \forall }\), but with all references to \(vx_0,\ldots ,vx_9\) removed. Since these were the only universally quantified variables, we get rid of the \(\forall \) and get the above quantifier-alternation-free synthesis constraint, which can be solved using existing Satisfiability Modulo Theory (SMT) solvers [24, 30].
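The effect of dropping the \(vx_i\) variables can be illustrated by a search that only manipulates alternate values. In the sketch below, the alternate domain is assumed to be the set of polynomials of degree at most 2 in canonical form (as in Sect. 2), the alternate specification is syntactic equality with a target polynomial, and enumeration stands in for an SMT solver. The program has two inputs a and b rather than the single input of Fig. 3, and all function names are hypothetical. Since the decoration rules used here are functional, the witness for the second \(\exists \) is computed rather than guessed.

```python
import itertools

def poly_add(p, q):
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + c
    return {m: c for m, c in out.items() if c != 0}

def poly_mul(p, q):
    out = {}
    for (m1, c1), (m2, c2) in itertools.product(p.items(), q.items()):
        m = tuple(sorted(m1 + m2))
        if len(m) > 2:
            return None  # outside the abstract domain: no decoration value
        out[m] = out.get(m, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}

SEMD = {"plus": poly_add, "times": poly_mul}

def synthesize(n, inputs, target):
    """Find a program of n lines and alternate values tx such that the
    last value is syntactically equal to target; no concrete input is
    ever enumerated."""
    k = len(inputs)
    for funcs in itertools.product(SEMD, repeat=n):
        for flat in itertools.product(range(k + n - 1), repeat=2 * n):
            tx = list(inputs)
            for i, f in enumerate(funcs):
                a1, a2 = flat[2 * i], flat[2 * i + 1]
                if a1 >= k + i or a2 >= k + i:
                    break  # not well-formed: reads a later line
                val = SEMD[f](tx[a1], tx[a2])
                if val is None:
                    break  # decoration constraint violated
                tx.append(val)
            else:
                if tx[-1] == target:
                    args = [(flat[2 * i], flat[2 * i + 1]) for i in range(n)]
                    return funcs, args, tx
    return None

a, b = {("a",): 1}, {("b",): 1}
target = {("a", "b"): 1, ("a",): 1}   # the polynomial a*b + a
print(synthesize(2, [a, b], target))
```

Equality of polynomials in canonical form implies equality for all integer inputs, which is the kind of meta-level argument that justifies ignoring the primary semantics during the search.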
Theorem 1 can be viewed as a “weak” form of duality because it constructs an \(\exists \) formula that implies a \(\forall \), but not vice-versa. Also, it must be understood as a template for meta-theorems that argue that a given decoration enables \(\exists \exists \) synthesis, such as the one that we used in our example of Sect. 2.
If we use enumeration over all possible values to check a \(\forall \) verification condition, we may find a violation of the \(\forall \) formula after some finite search, and thus we may find a bug. If we use enumeration over all possible values to check the sufficient \(\exists \) formula, we may find a suitable valuation of the existentially quantified variables after some finite search, and thus we may find a proof (of the \(\forall \) formula). Hence, our notion of abstraction, and the resulting weak duality in Theorem 1, has an interesting use in program verification: it replaces a “search for bugs” approach (violation of \(\forall \)) by a “search for proofs” approach (satisfiability of the dual \(\exists \)). One may wonder whether the program analysis community has ever implicitly used Theorem 1 to perform verification. The answer is yes: template-based methods for verification, also called constraint-based verification [17, 31], are an instance of the weak duality principle. We next outline a template-based technique to construct abstract decorations, which can be used to solve CoS problems.
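The contrast between the two searches can be illustrated on a toy claim of our own choosing (it is not one of the paper's benchmarks): \(\forall x\in \mathbb {Z}: x^2 + bx + c \ge 0\). A bug search enumerates values of x, while a proof search enumerates parameters (u, v) of an assumed template \((x+u)^2 + v\) with \(v \ge 0\). The function names, the template, and the enumeration bound below are all hypothetical.

```python
from itertools import product


def find_bug(b, c, bound=50):
    """Bug search: look for an x that violates  x*x + b*x + c >= 0."""
    for x in range(-bound, bound + 1):
        if x * x + b * x + c < 0:
            return x  # a concrete counterexample to the forall claim
    return None  # no violation found within the bound (not a proof)


def find_proof(b, c, bound=50):
    """Proof search: look for integers (u, v), v >= 0, such that
    x*x + b*x + c == (x + u)**2 + v holds as a polynomial identity.
    Matching coefficients gives 2*u == b and u*u + v == c."""
    for u, v in product(range(-bound, bound + 1), range(0, bound + 1)):
        if 2 * u == b and u * u + v == c:
            return (u, v)  # a certificate that implies the forall claim
    return None  # no certificate in the template (the claim may still hold)
```

For b = 4, c = 5 the proof search returns a certificate, whereas for b = 1, c = 0 the claim holds over the integers but the template has no integer instance. This mirrors the weakness of the duality: the \(\exists \) check is sufficient, not necessary.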
5.1 Constructing Abstract Decorations
We describe a generic approach for constructing abstract decorations. Note that we followed this recipe when constructing the decoration for our secure multiplication example in Sect. 2. Let us assume we have an abstract domain \(\mathtt{{PSa}}\); for example, one over which we could have created an abstract interpreter, or performed predicate abstraction. Let us see how we would generate decorations from \(\mathtt{{PSa}}\). Let us say we have proof rules that generate valid Hoare triples \(\{\phi _1\}P\{\phi _2\}\) over the abstract states, where P is a program, \(\phi _1,\phi _2\) are elements of \(\mathtt{{PSa}}\). Now, to define the decoration \(\mathtt{{Semd}}\), we first parameterize the elements of \(\mathtt{{PSa}}\). Say, we have a template \(\varPhi (\varvec{u})\) that contains parameters \(\varvec{u}\) such that
\(\mathtt{{PSa}} = \{\varPhi (\varvec{u}) \mid \varvec{u}\in \mathtt{{Domd}}\}.\)
In other words, we can generate all abstract program states by instantiating the parameters \(\varvec{u}\) from the set \(\mathtt{{Domd}}\). The set \(\mathtt{{Domd}}\) forms our alternate domain. If \(l1,l2,\ldots \) are all the program locations (nodes in the program graph), then \(\varvec{u}_{l1},\varvec{u}_{l2},\ldots \) are our new program variables that are interpreted over \(\mathtt{{Domd}}\). Finally, we need to define the alternate meaning \(\mathtt{{Semd}}\) for each program statement. This is achieved by considering proof rules consisting of valid Hoare triples \(\{\phi _1\}z:=f(x,y)\{\phi _2\}\), and trying to generate a constraint \(\psi _f(\varvec{u},\varvec{v})\) such that
If we can find such a \(\psi _f\) (not equivalent to \( false \)) for all \(f\in \varSigma \), then \((\psi _f)_{f\in \varSigma }\) defines \(\mathtt{{Semd}}\). By construction, \(\mathtt{{Semd}}\) is an abstraction of \(\mathtt{{Semp}}\). An example of this process of constructing an abstract \(\mathtt{{Semd}}\) can be found in Fig. 2.
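The recipe can be sketched on a small, made-up instance. We assume a concrete domain \(\mathbb {Z}_8\), two components (addition and multiplication modulo 8), and a parity template whose parameter ranges over a three-element alternate domain; none of these choices come from the paper. Each \(\psi _f\) is computed as the set of parameter triples for which the corresponding Hoare triple is valid, checked exhaustively over the finite concrete domain.

```python
from itertools import product

N = 8                              # assumed concrete domain Z_8
CONCRETE = range(N)
DOMD = ("EVEN", "ODD", "ANY")      # assumed alternate domain (template parameters)


def gamma(u):
    """Concretization: the set of concrete values described by Phi(u)."""
    if u == "EVEN":
        return frozenset(x for x in CONCRETE if x % 2 == 0)
    if u == "ODD":
        return frozenset(x for x in CONCRETE if x % 2 == 1)
    return frozenset(CONCRETE)


# Primary semantics of the (hypothetical) component library.
COMPONENTS = {
    "add": lambda x, y: (x + y) % N,
    "mul": lambda x, y: (x * y) % N,
}


def hoare_triple_valid(f, ux, uy, uz):
    """Check {Phi(ux) on x, Phi(uy) on y}  z := f(x, y)  {Phi(uz) on z}."""
    return all(f(x, y) in gamma(uz) for x in gamma(ux) for y in gamma(uy))


def build_decoration():
    """Return psi_f for every component f as a set of (ux, uy, uz) triples."""
    return {
        name: {
            (ux, uy, uz)
            for ux, uy, uz in product(DOMD, repeat=3)
            if hoare_triple_valid(f, ux, uy, uz)
        }
        for name, f in COMPONENTS.items()
    }


SEMD = build_decoration()
```

Because every triple admitted by a \(\psi _f\) is backed by a valid Hoare triple, any labelling of program variables that satisfies these constraints is sound with respect to the concrete semantics in this toy setting.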
We would like to emphasize two points here. First, the task of constructing an abstract decoration \(\mathtt{{Semd}}\) will not always succeed, because we may not find such a \(\psi _f\). Second, while abstract decorations are a powerful concept, decorations that are not abstractions of \(\mathtt{{Semp}}\) also prove to be immensely useful, especially in the application to synthesis, where they can be used to capture nonfunctional properties. This latter use of decorations was explored in [32], and is reused here.
6 Cryptographic Schemes
In this section, we present some examples of cryptographic schemes that we synthesized using the DCoS framework. These are summarized in Table 1. Our synthesis tool takes as input a program sketch, such as the one in Sect. 2, multiple primal semantics (\(\mathtt{{Semp}}\)), and multiple decorations (\(\mathtt{{Semd}}\)) on components, along with requirements specified on these semantics. It solves the synthesis problem by generating and solving either the \(\varPsi _{\exists \forall }\) formula (in case there are some primal semantics) or the \(\varPhi _{\exists \exists }\) formula (in case there are only decorations on components). Our tool, along with all the examples and corresponding SMT instances, is available at [14].
6.1 Block Cipher Modes
Block ciphers are keyed, invertible functions that map a fixed length bit string (say 128 bits) to a random bit string of the same length. A block cipher mode Enc uses a block cipher to encrypt messages longer than this fixed length. We have two requirements: correctness of Enc, which is captured by the existence of a decryption algorithm Dec such that \(\forall k, m : Dec_k( Enc_k( m ) ) = m\), and security, which is expressed by the fact that no adversary with oracle access to Enc is able to learn anything about random ciphertexts.
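To make the correctness requirement concrete, the following sketch runs the well-known CBC mode over a toy 8-bit block cipher and checks \(Dec_k( Enc_k( m ) ) = m\). The toy cipher (an affine map modulo 256 combined with a key XOR) is our own assumption, chosen only because it is easy to invert; it offers no security whatsoever, and all function names are hypothetical.

```python
BLOCK_BITS = 8
MASK = (1 << BLOCK_BITS) - 1      # blocks and keys are integers in 0..255


def toy_block_cipher(key, x):
    """Keyed permutation on 8-bit blocks (illustrative, NOT secure)."""
    return (((x ^ key) * 5) + 17) & MASK


def toy_block_cipher_inv(key, y):
    """Inverse permutation: 205 is the inverse of 5 modulo 256."""
    return ((((y - 17) & MASK) * 205) & MASK) ^ key


def cbc_encrypt(key, iv, blocks):
    """CBC: c_i = E_k(m_i XOR c_{i-1}), with c_0 = iv."""
    out, prev = [], iv
    for m in blocks:
        prev = toy_block_cipher(key, m ^ prev)
        out.append(prev)
    return out


def cbc_decrypt(key, iv, cipher_blocks):
    """CBC decryption: m_i = D_k(c_i) XOR c_{i-1}."""
    out, prev = [], iv
    for c in cipher_blocks:
        out.append(toy_block_cipher_inv(key, c) ^ prev)
        prev = c
    return out
```

Note that decryption needs the inverse of the block cipher here; a mode is correct only if such a decryption procedure exists for it.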
Malozemoff et al. [22] proposed an algorithm for synthesis of block cipher modes that follows the enumerate-and-check paradigm. The algorithm proceeds by carefully enumerating candidate straight-line programs and checking correctness and security for each of them. The security property is approximated using a labeling system that guarantees that if a candidate straight-line program can be labeled satisfying certain constraints, then it implements a secure block cipher. The search for the existence of a correct labeling is then implemented using an SMT solver. Regarding correctness, the authors propose a fix-point algorithm analogous to our encoding of decryptability check as a decoration; see Fig. 4.
We used both the \(\exists \forall \) and the \(\exists \exists \) approach to synthesize block cipher modes, which highlights the flexibility of the DCoS framework. Our formulation of the problem is analogous to the one in [22]; that is, our sketches do not provide additional “hints” to the synthesis tool. In the \(\exists \forall \) approach, we specify correctness directly using a primal semantics; that is, we synthesize (\(\exists \)) both an encryption scheme Enc and a decryption scheme Dec such that for all (\(\forall \)) input messages m, \(Dec( Enc( m ) ) = m\). By having a primal semantics for specifying correctness, and a decoration for specifying security, we solved the synthesis problem by generating and solving the \(\varPsi _{\exists \forall }\) formula shown in Sect. 4. The \(\exists \forall \) approach has two main drawbacks. First, solving \(\varPsi _{\exists \forall }\) turned out to be expensive because it required us to synthesize two programs at once, Dec and Enc. Second, it requires us to specify primal semantics for the block cipher function itself. This is not ideal, since a bad choice might be a source of unsoundness in the decryptability check.
The new \(\exists \exists \) approach, enabled by Theorem 1, addresses both these issues. The crucial observation is that the correctness check used in [22] is in fact an instance of the weak duality of Theorem 1, and hence it can be encoded as a second decoration. Thus, to ensure correctness, it is not necessary to synthesize a decryption scheme; it suffices to check for “decryptability” (Fig. 4). The new \(\exists \exists \) approach reduced the running time for synthesizing well-known encryption schemes such as CBC, OFB, CFB, and PCBC from \(\sim \! 100\) s (using the \(\varPsi _{\exists \forall }\) approach) to \(\sim \! 1\) s. Another benefit is that we can leverage the incremental solving capabilities of SMT solvers, such as Yices and Z3, to efficiently find hundreds of variants of block cipher modes. Our \(\exists \exists \) approach found hundreds of correct modes of operation in less than 5 min on a regular laptop, including all the common ones mentioned above.
6.2 Secure Multiplication
In Sect. 2, we presented an application of our synthesis methodology to synthesize a secure 2-party computation multiplication protocol. The synthesis time for the sketch described in Sect. 2 is \(\sim \! 50\) s. As explained above, our synthesis tool takes as input a sketch of the solution, i.e. a description of a finite family of protocols \(\mathcal{F}\) in this case, and searches for a protocol \(P\in \mathcal{F}\) that satisfies the requirements.
Figure 5 reports running times and approximated search space size, i.e. \(|\mathcal{F}|\), for 30 variants of our sketch presented in Sect. 2.
The first 15 variants of our sketch are satisfiable (blue trace in the plot), and were obtained in the following way: we started from a sketch whose only completion is the solution reported in Sect. 2; that is, \(|\mathcal{F}| = 1\), and then increasingly relaxed it until we obtained a most general one.
Hence, the leftmost data point of the satisfiable instances corresponds to simply a verification check. The second one corresponds to a sketch where everything is fixed but the first line of A’s program. In subsequent data points (3)–(15), the part of the protocol to be determined is (3) messages from C, (4) messages from C and B, (5) arithmetic operations in A and B, (6) arithmetic operations in C, and messages from C and B, (7) arithmetic operations, (8) arithmetic operations and messages from B, (9) arithmetic operations and messages from C and A, (10) arithmetic operations and messages from C and B, (11) arithmetic operations in A and B, and program for C, (12) arithmetic operations in A, and programs for C and B, (13) everything but first line of A’s program, and programs for C and B, (14) programs for A, B, and C, (15) programs for A, B, and C, letting A have a total of 4 lines.
The unsatisfiable instances are obtained from (1)–(15) by adding the additional restriction that C cannot use multiplication. This prevents C from generating appropriately correlated random data, which results in unsatisfiable sketches.
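The role of multiplication in C's program can be seen in a textbook-style three-party multiplication protocol in which C acts as a commodity server. The sketch below is our own illustration and is not claimed to be the protocol synthesized in Sect. 2; the modulus, the message names, and the function names are assumptions.

```python
import random

P = 2**61 - 1  # assumed prime modulus; all arithmetic is in Z_P


def commodity_server(rng):
    """C: hand out correlated randomness with s_a + s_b = r_a * r_b (mod P)."""
    r_a, r_b = rng.randrange(P), rng.randrange(P)
    s_a = rng.randrange(P)
    s_b = (r_a * r_b - s_a) % P   # the multiplication that C needs
    return (r_a, s_a), (r_b, s_b)


def secure_multiply(x, y, rng):
    """A holds x, B holds y; returns additive shares (z_a, z_b) of x * y."""
    (r_a, s_a), (r_b, s_b) = commodity_server(rng)
    x_masked = (x + r_a) % P                  # message A -> B
    y_masked = (y + r_b) % P                  # message B -> A
    z_b = rng.randrange(P)                    # B picks its output share
    t = (x_masked * y + s_b - z_b) % P        # message B -> A
    z_a = (t + s_a - r_a * y_masked) % P      # A's output share
    return z_a, z_b
```

Expanding the definitions gives \(z_a + z_b = (x + r_a)y + r_a r_b - r_a(y + r_b) = xy\) modulo P. Without the product \(r_a r_b\) computed by C, the masks would not cancel in this particular protocol.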
Although it is difficult to make definitive statements about the behaviour of SMT solvers, the plot in Fig. 5 confirms a tendency that we have often observed: our approach scales well for satisfiable instances, and hence using general sketches spanning a large \(\mathcal{F}\) is fine as long as \(\mathcal{F}\) contains a solution. On the other hand, if the synthesis problem is not realizable, proving so for large families of programs may not scale well.
6.3 Oblivious Transfer
In the two-party version of oblivious transfer (OT), one party, the Sender, has two messages \(m_0\) and \(m_1\), and the other party, the Chooser, can pick which message she wants to receive. The goal of oblivious transfer is to achieve this transfer of message from Sender to the Chooser, but with the requirements that (a) the Sender does not learn the choice made by the Chooser, and (b) the Chooser does not learn the content of the other message (that was not chosen).
We wish to base the protocol on the decisional Diffie-Hellman (DDH) assumption [6]: given a cyclic group with generator g, the DDH assumption states that \((g^a,g^b,g^{ab})\) is computationally indistinguishable from \((g^a,g^b,g^c)\) for randomly and independently chosen elements a, b, c from \(\mathbb {Z}\). We provide a sketch to the synthesis tool that consists of four blocks of straight-line code (executed by Sender, Chooser, Sender, Chooser in turns), where the Sender and the Chooser are each allowed to use up to 3 random numbers.
While approaches based on the \(\exists \forall \) paradigm timed out due to the complexity of the protocol, we were able to perform \(\exists \exists \) synthesis by using only suitably designed decorations. We synthesized two different OT protocols: the first one was also recently reported in [8], and the second one is the well-known Naor-Pinkas protocol [25]. The solutions were obtained in about 1 and 100 seconds, respectively, on a regular laptop using Yices as the backend solver.
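For intuition, the key-agreement core of a DDH-style OT in the spirit of the protocol reported in [8] can be sketched as follows. This is a simplified illustration under our own assumptions (a tiny safe-prime group, SHA-256 as the key-derivation function, hypothetical function names); it is neither secure at this size nor claimed to match the synthesized program line by line.

```python
import hashlib

# Toy group: the order-Q subgroup of Z_P^* generated by G, with P = 2*Q + 1.
P, Q, G = 1019, 509, 4


def kdf(elem):
    """Derive a key from a group element (SHA-256 is an assumed choice)."""
    return hashlib.sha256(str(elem).encode()).hexdigest()


def ot_keys(a, b, c):
    """Sender secret a, Chooser secret b, choice bit c (exponents in 1..Q-1).
    Returns (k0, k1) as computed by the Sender and k_c as computed by the Chooser."""
    A = pow(G, a, P)                                          # Sender -> Chooser
    B = pow(G, b, P) if c == 0 else (A * pow(G, b, P)) % P    # Chooser -> Sender
    A_inv = pow(A, P - 2, P)          # inverse of A modulo the prime P
    k0 = kdf(pow(B, a, P))                                    # Sender's key for m_0
    k1 = kdf(pow((B * A_inv) % P, a, P))                      # Sender's key for m_1
    k_c = kdf(pow(A, b, P))                                   # Chooser's single key
    return k0, k1, k_c
```

The Sender would encrypt \(m_0\) under k0 and \(m_1\) under k1; the Chooser can derive only the key matching her choice, since computing the other key would require relating \(g^{a^2}\) to the messages she has seen.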
Due to technical difficulties in formalizing the security requirements, we used approximate requirements that eliminated a large number of insecure protocols, but not necessarily all of them. Consequently, there is a need here for a posteriori verification of the security of the synthesized scheme (using other verification tools, such as EasyCrypt [4]). Program synthesis, however, remains a fast and effective tool to quickly generate plausible schemes.
7 Conclusion
We formulated the decorated-component-based synthesis framework and showed how component decorations can be used to enable a weak duality principle, which allows us to replace a desired \(\forall \) check by a stronger \(\exists \) check. Besides its applications to speed up program synthesis, it is important to recognize the use of this duality principle in different verification techniques, such as constraint-based verification [17, 31]. Decorations can abstract the concrete meaning, and thus provide sufficient checks for functional properties. They can also be unrelated to the concrete meaning, and encode nonfunctional properties of programs.
It is worth emphasizing that decorations are not abstract interpreters [9]: in abstract interpretation, assertion checking is still a “forall” check (just over abstract values). In contrast, decorations on components behave as constraints, and hence our extension of primal semantics with decorations has flavors of constraint programming [11] and combining inductive and co-inductive constructs [1].
Exploring extensions of DCoS to programs with loops, designing decorations to encode more sophisticated proof systems, and studying algebraic properties of decorations remain future challenges.
References
1. Abel, A., Pientka, B., Thibodeau, D., Setzer, A.: Copatterns: programming infinite structures by observations. In: 40th ACM Symposium Principles of Programming Languages POPL (2013)
2. Alur, R., Bodík, R., Juniwal, G., Martin, M.M.K., Raghothaman, M., Seshia, S.A., Singh, R., Solar-Lezama, A., Torlak, E., Udupa, A.: Syntax-guided synthesis. In: Formal Methods in Computer-Aided Design, FMCAD, pp. 1–17 (2013)
3. Barthe, G., Crespo, J.M., Kunz, C., Schmidt, B., Gregoire, B., Lakhnech, Y., Zanella-Beguelin, S.: Fully automated analysis of padding-based encryption in the computational model (2013). http://www.easycrypt.info/zoocrypt/
4. Barthe, G., Dupressoir, F., Grégoire, B., Kunz, C., Schmidt, B., Strub, P.-Y.: EasyCrypt: A Tutorial. In: Aldini, A., Lopez, J., Martinelli, F. (eds.) FOSAD 2012-2013. LNCS, vol. 8604, pp. 146–166. Springer, Cham (2014). doi:10.1007/978-3-319-10082-1_6
5. Bogdanov, D., Laur, S., Willemson, J.: Sharemind: a framework for fast privacy-preserving computations. In: Jajodia, S., Lopez, J. (eds.) ESORICS 2008. LNCS, vol. 5283, pp. 192–206. Springer, Heidelberg (2008). doi:10.1007/978-3-540-88313-5_13
6. Boneh, D.: The decision Diffie-Hellman problem. In: Buhler, J.P. (ed.) ANTS 1998. LNCS, vol. 1423, pp. 48–63. Springer, Heidelberg (1998). doi:10.1007/BFb0054851
7. Carmer, B., Rosulek, M.: Linicrypt: a model for practical cryptography. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016. LNCS, vol. 9816, pp. 416–445. Springer, Heidelberg (2016). doi:10.1007/978-3-662-53015-3_15
8. Chou, T., Orlandi, C.: The simplest protocol for oblivious transfer. Cryptology ePrint Archive, Report 2015/267 (2015). http://eprint.iacr.org/
9. Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: 4th ACM Symposium on Principles of Programming Languages, POPL, pp. 238–252 (1977)
10. Du, W., Atallah, M.J.: Protocols for secure remote database access with approximate matching. In: Ghosh, A.K. (ed.) E-Commerce Security and Privacy, pp. 87–111. Springer, Heidelberg (2001)
11. Felgentreff, T., Millstein, T., Borning, A., Hirschfeld, R.: Checks and balances: constraint solving without surprises in object-constraint programming languages. In: Proceedings Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA (2015)
12. Frankle, J., Osera, P., Walker, D., Zdancewic, S.: Example-directed synthesis: a type-theoretic interpretation. In: POPL, pp. 802–815. ACM (2016)
13. Gascón, A., Subramanyan, P., Dutertre, B., Tiwari, A., Jovanovic, D., Malik, S.: Template-based circuit understanding. In: Formal Methods in Computer-Aided Design, FMCAD, pp. 83–90. IEEE (2014)
14. Gascón, A., Tiwari, A.: Synudic: synthesis using dual interpretation on components (2016). https://github.com/adriagascon/synudic
15. Gilboa, N.: Two party RSA key generation. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 116–129. Springer, Heidelberg (1999). doi:10.1007/3-540-48405-1_8
16. Gulwani, S., Jha, S., Tiwari, A., Venkatesan, R.: Synthesis of loop-free programs. In: Proceedings of ACM Conference on Programming Language Design and Implementation, PLDI, pp. 62–73 (2011)
17. Gulwani, S., Srivastava, S., Venkatesan, R.: Program analysis as constraint solving. In: Proceedings of ACM Conference on Programming Language Design and Implementation, PLDI, pp. 281–292 (2008)
18. Hoang, V., Katz, J., Malozemoff, A.: Automated analysis and synthesis of authenticated encryption schemes. In: ACM CCS (2015)
19. Jha, S., Gulwani, S., Seshia, S.A., Tiwari, A.: Oracle-guided component-based program synthesis. In: Proceedings of ICSE, vol. 1, pp. 215–224. ACM (2010)
20. Kneuss, E., Kuraj, I., Kuncak, V., Suter, P.: Synthesis modulo recursive functions. In: OOPSLA, pp. 407–426. ACM (2013)
21. Lindell, Y.: How to simulate it - a tutorial on the simulation proof technique. Cryptology ePrint Archive, Report 2016/046 (2016). http://eprint.iacr.org/2016/046
22. Malozemoff, A.J., Katz, J., Green, M.D.: Automated analysis and synthesis of block-cipher modes of operation. In: IEEE 27th Computer Security Foundations Symposium, CSF, pp. 140–152. IEEE (2014)
23. Manna, Z., Waldinger, R.J.: Toward automatic program synthesis. Commun. ACM 14(3), 151–165 (1971)
24. Microsoft Research: Z3: an efficient SMT solver. http://research.microsoft.com/projects/z3/
25. Naor, M., Pinkas, B.: Efficient oblivious transfer protocols. In: Proceedings of 12th ACM-SIAM Symposium on Discrete Algorithms, SODA, pp. 448–457 (2001)
26. Polikarpova, N., Kuraj, I., Solar-Lezama, A.: Program synthesis from polymorphic refinement types. In: PLDI, pp. 522–538. ACM (2016)
27. Smith, C., Albarghouthi, A.: Mapreduce program synthesis. In: PLDI, pp. 326–340. ACM (2016)
28. Solar-Lezama, A., Rabbah, R.M., Bodík, R., Ebcioglu, K.: Programming by sketching for bit-streaming programs. In: PLDI (2005)
29. Solar-Lezama, A., Tancau, L., Bodík, R., Saraswat, V., Seshia, S.: Combinatorial sketching for finite programs. In: ASPLOS (2006)
30. SRI International: Yices: an SMT solver. http://yices.csl.sri.com/
31. Srivastava, S., Gulwani, S., Foster, J.S.: Template-based program verification and program synthesis. STTT 15(5–6), 497–518 (2013)
32. Tiwari, A., Gascón, A., Dutertre, B.: Program synthesis using dual interpretation. In: Felty, A.P., Middeldorp, A. (eds.) CADE 2015. LNCS (LNAI), vol. 9195, pp. 482–497. Springer, Cham (2015). doi:10.1007/978-3-319-21401-6_33
© 2017 Springer International Publishing AG
Gascón, A., Tiwari, A., Carmer, B., Mathur, U. (2017). Look for the Proof to Find the Program: Decorated-Component-Based Program Synthesis. In: Majumdar, R., Kunčak, V. (eds) Computer Aided Verification. CAV 2017. Lecture Notes in Computer Science(), vol 10427. Springer, Cham. https://doi.org/10.1007/978-3-319-63390-9_5
DOI: https://doi.org/10.1007/978-3-319-63390-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63389-3
Online ISBN: 978-3-319-63390-9