Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Software Product Line (SPL) [5] is an efficient method for systematic development of a family of related programs, known as variants (valid products), from a common code base. Each variant is specified in terms of features (statically configured options) selected for that particular variant. While there are different implementation strategies, many popular SPLs from system software (e.g. Linux kernel) and embedded software (e.g. cars, phones) domains [18] are implemented using a simple form of two staged computation in preprocessor style, where the programming language is extended with conditional compilation constructs (e.g. \(\texttt {\#ifdef}\) annotations from C preprocessor). At build time, the program family is first configured and a variant describing a particular product is derived by selecting a set of features relevant for it, and only then the derived variant is compiled or interpreted. One of the advantages of preprocessors is that they are mostly independent of the object language and can be applied across paradigms.

Benefits from using program families (SPLs) are multiple: productivity gains, shorter time to market, and greater market coverage. Unfortunately, the complexity created by program families (variability) also leads to problems. The simplest brute-force approach to verify such program families is to use a preprocessor to generate all valid products of an SPL, and then apply an existing single-program verification technique to each resulting product. However, this approach is very costly and often infeasible in practice since the number of possible products is exponential in the number of features. Therefore, we seek for new approaches that rely on finding compact mathematical structures, which take the variability within the family into account, and on which specialized variability-aware verification algorithms can be applied.

In this work, we address the above challenges by using game semantics models. Game semantics [1, 14] is a technique for compositional modelling of programming languages, which gives models that are fully abstract (sound and complete) with respect to observational equivalence of programs. It has mathematical elegance of denotational semantics, and step-by-step modelling of computation in the style of operational semantics. In the last decade, a new line of research has been pursued, known as algorithmic game semantics, where game semantics models are given certain kinds of concrete automata-theoretic representations [8, 12, 16]. Thus, they can serve as a basis for software model checking and program analysis. The most distinctive property of game semantics is compositionality, i.e. the models are generated inductively on the structure of programs. This is the key to achieve scalable (modular) verification, where a larger program is broken down into smaller program fragments which can be modeled and verified independently. Moreover, game semantics yields a very accurate model for any open program with free (undefined) identifiers such as calls to library functions.

In [9], a symbolic representation of algorithmic game semantics has been proposed for second-order Idealized Algol (IA\(_2\)). It redefines the (standard) regular-language representation [12] at a more abstract level by using symbolic values instead of concrete ones. This allows us to give a compact representation of programs with infinite integers by using finite-state symbolic automata. Here, we extend the symbolic representation of game semantics models, obtaining so-called featured symbolic automata, which are used to compactly represent and verify safety properties of program families.

Motivating Example. To better illustrate the issues we are addressing in this work, we now present a motivating example. Table 1 shows a simple program family M that contains two \(\texttt {\#if}\) commands. They increase and decrease the local variable x by the value of a non-local expression n, depending on the enabled features. The program uses features \(\mathbb {F}=\{ A, B \}\) and we assume it has the following set of valid configurations \(\mathbb {K}= \{ A \!\wedge \! B, A \!\wedge \! \lnot B, \lnot A \!\wedge \! B, \lnot A \!\wedge \! \lnot B \}\). For each valid configuration a different single program can be generated by appropriately resolving the \(\texttt {\#if}\) commands. For example, the single program corresponding to the valid configuration \(A \!\wedge \! B\) will have both features A and B enabled (set to true), which will make both assignment commands in \(\texttt {\#if}\)-s to be present in the program. Programs for \(A \!\wedge \! \lnot B\) and for \(\lnot A \!\wedge \! B\) are different in one assignment command only, the earlier has the feature A enabled and the command \(x :=x+n\), whereas the latter has the feature B enabled and the command \(x :=x-n\). Programs corresponding to all valid configurations are illustrated in Table 1. Thus, to verify our family M we need to build and analyze models of four distinct, but very similar, programs.

Table 1. Motivating example: the program family M and its valid products
Fig. 1.
figure 1

Automata for valid products of M.

We show in Fig. 1, the standard regular-language representation of game semantics for these four programs where concrete values are used [12]. We can see that we obtain regular-languages with infinite summations (i.e. infinite-state automata), since we use infinite integers as data type. Hence, they can be used for automatic verification only if the attention is restricted to finite data types. For example, the model for the product \(A \!\wedge \! \lnot B\) in Fig. 1c illustrates the observable interactions of this term of type \(\mathsf {com}\) with its environment consisting of free identifiers n and abort. So in the model are only represented moves associated with types of n and abort (which are tagged with superscripts n and abort, respectively) as well as with the top-level type \(\mathsf {com}\) of this term. The environment (Opponent) starts the execution of the term by playing the move \(\textit{run}\); when the term (Player) asks for the value of n with the move \(\textit{q}^n\), the environment can provide any integer as answer. If the answer is 1, the abort is run; otherwise the term terminates successfully by reaching the accepting state (shown as double circle in the model). Note that each move represents an observable action that a term of a given type can perform. Thus, for commands we have a move \(\textit{run}\) to initiate a command and a move \(\textit{done}\) to signal successful termination of a command, whereas for expressions we have a move \(\textit{q}\) to ask for the value of an expression and an integer move to answer the question \(\textit{q}\).

If we represent the data at a more abstract level and use symbolic values instead of concrete ones, the game models of these four programs can be represented more compactly by finite-state symbolic automata (SA) as shown in Fig. 2. Every letter (label of transition) contains a move and a Boolean condition which represents a constraint that needs to be fulfilled in order for the corresponding move to be performed. Note that so-called input symbols of the form ?N are used for generating new fresh symbolic names, which bind all occurrences of the symbol N that follow in the play until a new input symbol ?N is met. The symbol X is used to keep track of the current value of the local variable x. For example, the answer to the question \(\textit{q}^n\) asked by the term for \(A \wedge \lnot B\) in Fig. 2c now is a newly instantiated symbol N. If the value of N is 1, the abort command is run. We say that “\(X_1=0 \!\wedge \! X_2=X_1\!+\!N \!\wedge \! X_2 = 1\)” is a play condition for the play in Fig. 2c: \([ X_1=0,\textit{run} \rangle \cdot \textit{q}^n \cdot N^n \cdot [ X_2=X_1\!+\!N \!\wedge \! X_2=1, \textit{run}^{abort} \rangle \cdot \textit{done}^{abort} \cdot \textit{done}\). This play is obtained from: \([ ?X\!=\!0,\textit{run} \rangle \cdot \textit{q}^n \cdot ?N^n \cdot [ ?X\!=\!X\!+\!N \!\wedge \! X\!=\!1, \textit{run}^{abort} \rangle \cdot \textit{done}^{abort} \cdot \textit{done}\), after instantiating its input symbols with fresh symbolic names. We say that one play is feasible, only if its play condition is satisfiable (i.e. there exist concrete assignments to symbols that make that condition true). This can be checked by calling an SMT solver.

Fig. 2.
figure 2

Symbolic automata for valid products of M.

Now, by further enriching letters with feature expressions (propositional formulae defined over the set of features), we can give a more compact single representation of the above related programs to exploit the similarities between them. The feature expression associated with a letter denotes for which valid configurations that letter (in fact, the corresponding move) is feasible. Thus, we can represent all products of M by one compact featured symbolic automaton (FSA) as shown in Fig. 3, which is variability-aware extension of the symbolic automata in Fig. 2. From this model, by exploring all states we can determine for each valid product whether an unsafe behaviour (one that contains abort moves) can be exercised. If we find such an unsafe play for a valid product, then we need to check that the play is feasible. If its play condition is satisfiable, the SMT solver will return concrete assignments to symbols which make that condition true. In this way, we will generate a concrete counterexample for a valid product. In our example, we can determine that the product \(\lnot A \!\wedge \! \lnot B\) is safe, whereas products \(A \!\wedge \! B\), \(A \!\wedge \! \lnot B\), and \(\lnot A \!\wedge \! B\) are unsafe with concrete counterexamples: \(\textit{run}\cdot \! \textit{q}^{n} \!\cdot \! 1^{n} \!\cdot \! \textit{q}^{n} \!\cdot \! 0^{n} \!\cdot \! \textit{run}^{abort} \!\cdot \! \textit{done}^{abort} \!\cdot \! \textit{done}\), \(\textit{run}\cdot \! \textit{q}^{n} \!\cdot \! 1^{n} \!\cdot \! \textit{run}^{abort} \!\cdot \! \textit{done}^{abort} \!\cdot \! \textit{done}\), and \(\textit{run}\cdot \! \textit{q}^{n} \!\cdot \! -1^{n} \!\cdot \! \textit{run}^{abort} \!\cdot \! \textit{done}^{abort} \!\cdot \! \textit{done}\), respectively.

Fig. 3.
figure 3

Featured symbolic automaton for the program family M.

Remark. Alternatively, a program family can be verified by generating a so-called family simulator [2], which is a single program where \(\texttt {\#if}\) commands (compile-time variability) are replaced with normal \(\texttt {if}\) commands and available features are encoded as free (undefined) identifiers. Then the classical (single-system) model checking algorithms [8, 9] can be used to verify the generated simulator, since it represents a single program. In case of violation, we will obtain a single counterexample that corresponds to some unsafe products. However, this answer is incomplete (limited) for program families since there might be some safe products and also there might be other unsafe products with different counterexamples. Hence, no conclusive results for all products in a family are reported using this approach. For example, the simulator for the family M is:

figure a

If we generate a (game) model for this term and verify it using algorithms in [8, 9], we will obtain a counterexample corresponding only to the product \(A \wedge \lnot B\).

This leads us to propose an approach that solves the general family-based model checking problem: determine for each product whether or not it is safe, and provide a counterexample for each unsafe product.

Contributions. In this paper, we make the following contributions:

  • We introduce a compact symbolic representation, called featured symbolic automata, which represent game semantics of so-called annotative program families. That is, program families which are implemented by annotating program parts that vary using preprocessor directives.

  • We propose specifically designed (family-based) model checking algorithms for verifying featured symbolic automata that represent program families. This allows us to verify safety for all products of a family at once (in a single execution), and to pinpoint the products that are unsafe (resp., safe).

  • We describe a prototype tool implementing the above algorithms, and we perform an evaluation to demonstrate the improvements over the brute-force approach where all valid products are verified independently one by one.

2 The Language for Program Families

The standard approach in semantics community is to use meta-languages for the description of certain kinds of computational behaviour. The semantic model is defined for a meta-language, and a real programming language (C, ML, etc.) can be studied by translating it into this meta-language and using the induced model. We begin this section by presenting a meta-language for which algorithmic game semantics can be defined, and then we introduce static variability into it.

Writing Single Programs. We consider the meta-language: Idealized Algol (IA) introduced by Reynolds in [17]. It is a compact language which combines call-by-name typed \(\lambda \)-calculus with the fundamental imperative features and locally-scoped variables. We work with its second-order recursion-free fragment (IA\(_2\) for short), because game semantics of this fragment has algorithmic properties.

The data types D are integers and booleans (\(D\,{:}{:=}\, \mathsf {int} \, \mid \, \mathsf {bool}\)). We have base types B (\(B \,{:}{:=}\, \mathsf {exp} D \, \mid \, \mathsf {com} \, \mid \, \mathsf {var} D \)) and first-order function types T (\(T \,{:}{:=}\, B \mid B \rightarrow T\)). The syntax of the language is given by:

$$ \begin{array}{l} M \,{:}{:=}\, x \! \mid \! v \! \mid \! \mathsf {skip} \! \mid \! \mathsf {diverge} \! \mid \! M\, \mathsf {op} \, M \mid M ; \, M \mid \! \mathsf {if}\,M\,\mathsf {then}\,M \,\mathsf {else}\, M \! \mid \! \mathsf {while}\,M\,\mathsf {do}\,M \\ \qquad \ \mid M := M \mid \, !M \mid \mathsf {new}_D \, x\!:=\!v \, \mathsf {in} \, M \mid \mathsf {mkvar}_D MM \mid \! \lambda x . M \mid MM \end{array} $$

where v ranges over constants of type D, which includes integers (n) and boolean (ttff). The standard arithmetic-logic operations \(\mathsf {op}\) are employed, as well as the usual imperative and functional constructs. Well-typed terms are given by typing judgements of the form \(\varGamma \vdash M : T\), where \(\varGamma \) is a type context consisting of a finite number of typed free identifiers. Typing rules are given in [1, 17].

The operational semantics is defined by a big-step reduction relation: \(\varGamma \vdash M,\mathrm {s} \Longrightarrow V,\mathrm {s}'\), where \(\varGamma \vdash M:T\) is a term in which all free identifiers from \(\varGamma \) are variables, and \(\mathrm {s}\), \(\mathrm {s}'\) represent the state before and after reduction. The state is a function assigning data values to the variables in \(\varGamma \). Canonical forms (values) are defined by \(V \,{:}{:=} \, x \, \mid \, v \, \mid \, \lambda x. M \, \mid \, \mathsf {skip} \, \mid \, \mathsf {mkvar}_{D}MN\). Reduction rules are standard (see [1, 17] for details). If M is a closed term (with no free identifiers) of type \(\mathsf {com}\), then we abbreviate the relation \(M, \emptyset \Longrightarrow \mathsf {skip},\emptyset \) with \(M \Downarrow \). We say that a term \(\varGamma \vdash M:T\) is an approximate of a term \(\varGamma \vdash N:T\), written , if and only if for all terms-with-hole \(C[-]:\mathsf {com}\), such that \(\vdash C[M]:\mathsf {com}\) and \(\vdash C[N]:\mathsf {com}\) are well-typed closed terms of type \(\mathsf {com}\), if \(C[M]\Downarrow \) then \(C[N]\Downarrow \). If two terms approximate each other they are considered observationally-equivalent, denoted by \(\varGamma \vdash M \cong N\).

Writing Program Families. We use a simple form of two-staged computation to lift IA\(_2\) from describing single programs to program families. The first stage is controlled by a configuration k, which describes the set of features that are enabled in the build process. A finite set of Boolean variables describes the available features \(\mathbb {F}= \{ A_1, \ldots , A_n \}\). A configuration k is a truth assignment (a mapping from \(\mathbb {F}\) to \(\mathsf {bool}=\{tt,ff\}\)) which gives a truth value to any feature. If a feature \(A \in \mathbb {F}\) is enabled (included) for the configuration k, then \(k(A)=tt\). Any configuration k can also be encoded as a conjunction of propositional formulas: \(k(A_1) \cdot A_1 \wedge \ldots \wedge k(A_n) \cdot A_n\), where \(tt \cdot A = A\) and \(ff \cdot A = \lnot A\). We write \(\mathbb {K}\) for the set of all valid configurations defined over \(\mathbb {F}\) for a program family. The set of valid configurations is typically described by a feature model [5, 18], but in this work we disregard syntactic representations of the set \(\mathbb {K}\).

The language \(\overline{\text {IA}_2}\) extends IA\(_2\) with a new compile-time conditional term for encoding multiple variations of a program, i.e. different valid products. The new term “\({\texttt {\#if}}^{} ~{\phi }~ {\texttt {then}} ~{M}~ {\texttt {else}} ~{M'}\)” contains a presence condition \(\phi \) over features \(\mathbb {F}\), such that if \(\phi \) is satisfied by a configuration \(k \in \mathbb {K}\) then M will be included in the resulting product, otherwise \(M'\) will be included. The new syntax is:

$$ \begin{array}{c} M \,{:}{:=} \ldots \mid {\texttt {\#if}}^{} ~{\phi }~ {\texttt {then}} ~{M}~ {\texttt {else}} ~{M'} \qquad \qquad \phi \,{:}{:=}\, A \in \mathbb {F}\mid \lnot \phi \mid \phi \wedge \phi \end{array} $$

We add a new syntactic category of feature expressions (i.e. propositional logic formulae over \(\mathbb {F}\)), \(FeatExp(\mathbb {F})\), ranged over by \(\phi \), to write compile-time conditions over features \(\mathbb {F}\). Well-typed term families are given by typing judgements of the form \(\varGamma \vdash _{\mathbb {F}} M : T\), where \(\mathbb {F}\) is a set of available featuresFootnote 1. Typing rules are those of IA\(_2\) extended with a rule for the new construct:

$$ \frac{\varGamma \vdash M : T}{\varGamma \vdash _{\mathbb {F}} M:T} \qquad \frac{\varGamma \vdash _{\mathbb {F}} M : T \qquad \varGamma \vdash _{\mathbb {F}} M' : T \qquad \phi : FeatExp(\mathbb {F})}{\varGamma \vdash _{\mathbb {F}} {\texttt {\#if}}^{} ~{\phi }~ {\texttt {then}} ~{M}~ {\texttt {else}} ~{M'} : T} $$

The semantics of \(\overline{\text {IA}_2}\) has two stages: first, given a configuration k compute a single IA\(_2\) term without \(\texttt {\#if}\)-s; second, evaluate the IA\(_2\) term using the standard IA\(_2\) semantics. The first stage of computation (also called projection) is a simple preprocessor from \(\overline{\text {IA}_2}\) to IA\(_2\) specified by the projection function \(\pi _k\) mapping an \(\overline{\text {IA}_2}\) term family into a single IA\(_2\) term corresponding to the configuration \(k \in \mathbb {K}\). The projection \(\pi _k\) copies all basic terms of \(\overline{\text {IA}_2}\) that are also IA\(_2\) terms, and recursively pre-processes all sub-terms of compound terms. For example, \(\pi _{k}(\mathsf {skip}) = \mathsf {skip}\) and \(\pi _{k}(M ;\! M') = \pi _{k}(M) ;\! \pi _{k}(M')\). The interesting case is for the compilation-time conditional term, where one of the two alternative branches is included in the generated valid product depending on whether the configuration k satisfies (entails) the feature expression \(\phi \), denoted as \(k \,\models \,\phi \). We have: \(\pi _{k}({\texttt {\#if}}^{} ~{\phi }~ {\texttt {then}} ~{M}~ {\texttt {else}} ~{M'}) = {\left\{ \begin{array}{ll} \pi _{k}( M ) &{} \text {if} \ k \, \models \,\phi \\ \pi _{k}( M' ) &{} \text {if} \ k \, \not \models \,\phi \end{array}\right. }\). The variant of a term family \(\varGamma \vdash _{\mathbb {F}} M : T\) corresponding to the configuration \(k \in \mathbb {K}\) can now be defined as: \(\varGamma \vdash \pi _k(M) : T\).

3 Symbolic Representation of Game Semantics

In this section we first recall symbolic representation of algorithmic game semantics for IA\(_2\) [9], and then we extend this representation for \(\overline{\text {IA}_2}\).

3.1 Symbolic Models of IA\(_2\)

Let Sym be a countable set of symbolic names, ranged over by upper case letters X, Y, Z. For any finite \(W \subseteq Sym\), the function new(W) returns a minimal symbolic name which does not occur in W, and sets \(W:=W \cup new(W)\). A minimal symbolic name not in W is the one which occurs earliest in a fixed enumeration \(X_1, X_2, \ldots \) of all possible symbolic names. Let Exp be a set of expressions, ranged over by e, inductively generated by using data values (\(v \in D\)), symbols (\(X \in Sym\)), and standard arithmetic-logic operations (\(\mathsf {op}\)). We use a to range over arithmetic expressions (AExp) and b over boolean expressions (BExp).

Let \(\mathcal {A}\) be an alphabet of letters. We define a symbolic alphabet \(\mathcal {A}^{sym}\) induced by \(\mathcal {A}\) as follows: \(\mathcal {A}^{sym} = \mathcal {A} \cup \{ ?X, e \mid X \in Sym, e \in Exp \}\). The letters of the form ?X are called input symbols. They represent a mechanism for generating new symbolic names, i.e. ?X means \(\mathsf {let} \, X=new(W) \, \mathsf {in} \, X \, \ldots \). We use \(\alpha \) to range over \(\mathcal {A}^{sym}\). Next we define a guarded alphabet \(\mathcal {A}^{gu}\) induced by \(\mathcal {A}\) as the set of pairs of boolean conditions and symbolic letters: \(\mathcal {A}^{gu} = \{ [ b, \alpha \rangle \mid b \in BExp, \alpha \in \mathcal {A}^{sym} \}\). A guarded letter \([ b, \alpha \rangle \) means that \(\alpha \) occurs only if b evaluates to true, i.e. \(if \, (b=tt) \, then \, \alpha \, else \, \emptyset \). We use \(\beta \) to range over \(\mathcal {A}^{gu}\). We will often write only \(\alpha \) for the guarded letter \([ tt, \alpha \rangle \). A word \([ b_1, \alpha _1 \rangle \cdot [ b_2, \alpha _2 \rangle \ldots [ b_n, \alpha _n \rangle \) over \(\mathcal {A}^{gu}\) can be represented as a pair \([ b, w \rangle \), where \(b=b_1 \wedge b_2 \wedge \ldots \wedge b_n\) is a boolean condition and \(w=\alpha _1 \cdot \alpha _2 \ldots \alpha _n\) is a word of symbolic letters.

Now, we show how IA\(_2\) terms in \(\beta \)-normal form are interpreted by symbolic regular languages and automata, which will be specified by extended regular expressions R. Each type T is interpreted by a guarded alphabet of moves \(\mathcal A_{[\! [T ]\! ]}^{gu}\) induced by \(\mathcal A_{[\! [T ]\! ]}\), which is defined as follows:

$$ \begin{array}{l} \mathcal {A}_{[\! [\mathsf {exp}D ]\! ]} \!\!=\!\! \{ \textit{q}\} \!\cup \! \mathcal {A}_{[\! [D ]\! ]}, \mathcal {A}_{[\! [\mathsf {com} ]\! ]} \!\!=\!\! \{ \textit{run},\! \textit{done}\}, \mathcal {A}_{[\! [\mathsf {var}D ]\! ]} \!\!=\!\! \{ \textit{write}(a), \textit{read}, \textit{ok}, a \! \mid \! a \!\!\in \!\! \mathcal {A}_{[\! [D ]\! ]} \!\} \\ \mathcal {A}_{[\! [B_1^{\langle 1 \rangle } \rightarrow \ldots \rightarrow B_k^{\langle k \rangle } \rightarrow B ]\! ]}^{gu} = \displaystyle {\sum _{1 \le i \le k}} \mathcal {A}_{[\! [B_i ]\! ]}^{gu \, \langle i \rangle } + \mathcal {A}_{[\! [B ]\! ]}^{gu} \end{array} $$

where \(\mathcal {A}_{[\! [\mathsf {int} ]\! ]} = \mathbb {Z}\), \(\mathcal {A}_{[\! [\mathsf {bool} ]\! ]} = \{ tt, ff \}\), and \(+\) denotes a disjoint union of alphabets. Function types are tagged by a superscript \(\langle i \rangle \) to keep record from which type, i.e. which component of the disjoint union, each move comes from. The letters in the alphabet \(\mathcal A_{[\! [T ]\! ]}\) represent the moves (observable actions) that a term of type T can perform. Each of moves is either a question (a demand for information) or an answer (a supply of information). For expressions in \(\mathcal {A}_{[\! [\mathsf {exp}D ]\! ]}\), there is a question move \(\textit{q}\) to ask for the value of the expression, and values from \(\mathcal {A}_{[\! [D ]\! ]}\) to answer the question. For commands, there is a question move \(\textit{run}\) to initiate a command, and an answer move \(\textit{done}\) to signal successful termination of a command. For variables, there are question moves for writing to the variable, \(\textit{write}(a)\), which are acknowledged by the answer move \(\textit{ok}\); and there is a question move \(\textit{read}\) for reading from the variable, which is answered by a value from \(\mathcal {A}_{[\! [D ]\! ]}\).

For any (\(\beta \)-normal) term, we define a (symbolic) regular-language which represents its game semantics, i.e. its set of complete plays. Every complete play represents the observable effects of a completed computation of the given term. It is given as a guarded word \([ b,w \rangle \), where b is also called play condition. Assumptions about a play (computation) to be feasible are recorded in its play condition. For infeasible plays, the play condition is inconsistent (unsatisfiable), thus no assignment of concrete values to symbolic names exists that makes the play condition true. If the play condition is inconsistent, this play is discarded from the final model of the corresponding term. The regular expression for \(\varGamma \vdash M : T\), denoted as \([\! [\varGamma \vdash M : T ]\! ]\), is defined over the guarded alphabet: \(\mathcal A_{[\! [\varGamma \vdash T ]\! ]}^{gu} = \big ( \sum _{x : T' \in \varGamma } \mathcal {A}_{[\! [T' ]\! ]}^{gu \, \langle x \rangle } \big ) + \mathcal {A}_{[\! [T ]\! ]}^{gu} \), where moves corresponding to types of free identifiers are tagged with their names.

The representation of constants is standard:

$$ [\! [\varGamma \! \vdash \! v \! : \! \mathsf {exp}D ]\! ] \! = \! \textit{q}\cdot v \qquad [\! [\varGamma \! \vdash \! \mathsf {skip} \! : \! \mathsf {com} ]\! ] \! = \! \textit{run}\cdot \textit{done}\qquad [\! [\varGamma \! \vdash \! \mathsf {diverge} \! : \! \mathsf {com} ]\! ] \! = \! \emptyset $$

Free identifiers are represented by the so-called copy-cat regular expressions, which contain all possible behaviours of terms of that type. For example:

$$\begin{aligned}&[\! [\varGamma , x:\mathsf {exp}D_1^{\langle x,1 \rangle } \!\rightarrow \! \ldots \mathsf {exp}D_k^{\langle x,k \rangle } \!\rightarrow \mathsf {exp}D^{\langle x \rangle } \! \vdash \! x : \mathsf {exp}D_1^{\langle 1 \rangle } \!\rightarrow \! \ldots \mathsf {exp}D_k^{\langle k \rangle } \!\rightarrow \mathsf {exp}D ]\! ] \\&\qquad \qquad \qquad \qquad \qquad = \textit{q}\cdot \textit{q}^{\langle x \rangle } \cdot \big ( \sum _{1 \le i \le k} \textit{q}^{\langle x,i \rangle } \cdot \textit{q}^{\langle i \rangle } \cdot ?Z^{\langle i \rangle } \cdot Z^{\langle x,i \rangle } \big )^* \cdot ?X^{\langle x \rangle } \cdot X \end{aligned}$$

When a call-by-name non-local function x is called, it may evaluate any of its arguments, zero or more times, in an arbitrary order and then it returns any allowable answer from its result type. Note that whenever an input symbol ?X (\(\mathsf {let} \, X=new(W) \, \mathsf {in} \, X \, \ldots \)) is met in a play, the mechanism for fresh symbol generation is used to instantiate it with a new fresh symbolic name, which binds all occurrences of X that follow in the play until a new ?X is met which overrides the previous one. For example, consider a non-local function \(f : \mathsf {expint}^{\langle 1 \rangle } \rightarrow \mathsf {expint}\). Its symbolic model is: \(\textit{q}\cdot \textit{q}^{\langle f \rangle } \cdot \big ( \textit{q}^{\langle f,1 \rangle } \cdot \textit{q}^{\langle 1 \rangle } \cdot ?Z^{\langle 1 \rangle } \cdot Z^{\langle f,1 \rangle } \big )^* \cdot ?X^{\langle f \rangle } \cdot X\). The play corresponding to f which evaluates its argument two times after instantiating its input symbols is given as: \(\textit{q}\cdot \textit{q}^{\langle f \rangle } \cdot \textit{q}^{\langle f,1 \rangle } \cdot \textit{q}^{\langle 1 \rangle } \cdot Z_1^{\langle 1 \rangle } \cdot Z_1^{\langle f,1 \rangle } \cdot \textit{q}^{\langle f,1 \rangle } \cdot \textit{q}^{\langle 1 \rangle } \cdot Z_2^{\langle 1 \rangle } \cdot Z_2^{\langle f,1 \rangle } \cdot X^{\langle f \rangle } \cdot X\), where \(Z_1\) and \(Z_2\) are two different symbolic names used to denote values of the argument when it is evaluated the first and the second time, respectively.

Table 2. Symbolic representations of some language constructs

The representations of some language constructs are given in Table 2. Note that letter conditions different than tt occur only in plays corresponding to “\(\mathsf {if}\)” and “\(\mathsf {while}\)” constructs. In the case of “\(\mathsf {if}\)” command, when the value of the first argument given by the symbol Z is true then its second argument is run, otherwise if \(\lnot Z\) is true then its third argument is run. A composite term \(\mathsf {c}(M_1, \ldots , M_k)\) built out of a language construct “\(\mathsf {c}\)” and subterms \(M_1, \ldots , M_k\) is interpreted by composing the regular expressions for \(M_1, \ldots , M_k\) and the regular expression for “\(\mathsf {c}\)”. Composition of regular expressions () is defined as “parallel composition plus hiding in CSP” [1]. Conditions of the shared (interacting) guarded letters in the composition are conjoined, along with the condition that their symbolic letters are equal [9]. The \(\mathsf {cell}_{v}^{\langle x \rangle }\) regular expression in Table 2 is used to impose the good variable behaviour on a local variable x introduced using \(\mathsf {new}_D \, x\!:=\!v \, \mathsf {in} \, M\). Note that v is the initial value of x, and X is a symbol used to track the current value of x. The \(\mathsf {cell}_{v}^{\langle x \rangle }\) plays the most recently written value in x in response to \(\textit{read}\), or if no value has been written yet then answers \(\textit{read}\) with the initial value v. The model \([\! [\mathsf {new}_D \, x\!:=\!v \, \mathsf {in} \, M ]\! ]\) is obtained by constraining the model of M, \([\! [\mathsf {new}_D \, x \vdash M ]\! ]\), only to those plays where x exhibits good variable behaviour described by \(\mathsf {cell}_{v}^{\langle x \rangle }\), and then by deleting (hiding) all moves associated with x since x is a local variable and so not visible outside of the term [9].

The following formal results are proved in [9]. We define an effective alphabet of a regular expression to be the set of all letters that appear in the language denoted by that regular expression. The effective alphabet of a regular expression representing any term \(\varGamma \vdash M:T\) contains only a finite subset of letters from \(\mathcal A_{[\! [\varGamma \vdash T ]\! ]}^{gu}\), which includes all constants, symbols, and expressions used for interpreting free identifiers, language constructs, and local variables in M.

Theorem 1

For any IA\(_2\) term, the set \(\mathcal {L} [\! [\varGamma \vdash M : T ]\! ]\) is a symbolic regular-language over its effective (finite) alphabet. Moreover, a finite-state symbolic automata \(\mathcal {A} [\! [\varGamma \vdash M : T ]\! ]\) which recognizes it is effectively constructible.

Suppose that there is a special free identifier \(\mathsf {abort}\) of type \(\mathsf {com}\). We say that a term \(\varGamma \vdash M\) is safe iff ; otherwise we say that a term is unsafe. Hence, a safe term has no computation that leads to running \(\mathsf {abort}\). Let \(\mathcal {L} [\! [\varGamma \vdash M:T ]\! ]^{CR}\) denotes the (standard) regular-language representation of game semantics for a term M obtained as in [12], where concrete values are used. Since this representation is fully abstract, and so there is a close correspondence with the operational semantics, the following holds [7].

Proposition 1

A term \(\varGamma \vdash M:T\) is safe iff \(\mathcal {L} [\! [\varGamma \vdash M:T ]\! ]^{CR}\) does not contain any play with moves from \(\mathcal A_{[\! [\mathsf {com} ]\! ]}^{\langle \textit{abort} \rangle }\), which we call unsafe plays.

The following result [9] confirms that symbolic automata (models) can be used for establishing safety of terms.

Theorem 2

\(\mathcal {L} [\! [\varGamma \!\vdash \! M\!:\!T ]\! ]\) is safe (all plays are safe) iff \(\mathcal {L} [\! [\varGamma \!\vdash \! M\!:\!T ]\! ]^{CR}\) is safe.

For example, \([\! [\mathsf {abort:com^{abort}} \vdash \mathsf {skip \, ; abort:com } ]\! ] = \textit{run}\, \cdot \, \textit{run}^{abort} \, \cdot \, \textit{done}^{abort} \, \cdot \, \textit{done}\), so this term is unsafe.

Since symbolic automata are finite state, we can use model-checking techniques to verify safety of IA\(_2\) terms with integers. The verification procedure proposed in [9] searches for unsafe plays in the symbolic automata representing a term. If an unsafe play is found, it calls an external SMT solver (Yices) to check consistency (satisfiability) of its play condition. If the condition is consistent, then a concrete counterexample is reported. We showed in [9] that the procedure is correct and semi-terminating (terminates for unsafe terms, but may diverge for safe terms) under assumption that constraints generated by any program can be checked for consistency by some (SMT) solver.

Example 1

Consider the term family M from Table 1 in Sect. 1. The symbolic model for the term \(A \!\wedge \! B\) is given in Fig. 2a. The term asks for a value of the non-local expression n with the move \(\textit{q}^n\) two times, and the environment provides as answers symbols N and M. When the difference \(N-M\) is 1, then abort command is run. The symbolic model in Fig. 2a contains one unsafe play:\([ ?X\!=\!0,\textit{run} \rangle \cdot \textit{q}^n \cdot ?N^n \cdot [ ?X\!=\!X\!+\!N, \textit{q}^n \rangle \cdot ?M^n \cdot [ ?X\!=\!X\!-\!M \!\wedge \! X\!=\!1, \textit{run}^{abort} \rangle \cdot \textit{done}^{abort} \cdot \textit{done}\), which after instantiating its input symbols with fresh names becomes: \([ X_1\!=\!0,\textit{run} \rangle \cdot \textit{q}^n \cdot N^n \cdot [ X_2\!=\!X_1\!+\!N, \textit{q}^n \rangle \cdot M^n \cdot [ X_3\!=\!X_2\!-\!M \!\wedge \! X_3\!=\!1, \textit{run}^{abort} \rangle \cdot \textit{done}^{abort} \cdot \textit{done}\). An SMT solver will inform us that its play condition (\(X_1\!=\!0 \!\wedge \! X_2\!=\!X_1\!+\!N \!\wedge \! X_3\!=\!X_2\!-\!M \!\wedge \! X_3\!=\!1\)) is satisfiable, yielding a possible assignment of concrete values to symbols: \(X_1=0, N=1, X_2=1, M=0\), and \(X_3=1\). Thus, the corresponding concrete counterexample will be: \(\textit{run}\cdot \textit{q}^n \cdot 1^n \cdot \textit{q}^n \cdot 0^n \cdot \textit{run}^{abort} \cdot \textit{done}^{abort} \cdot \textit{done}\). Similarly, concrete counterexamples for terms \(A \!\wedge \! \lnot B\) and \(\lnot A \!\wedge \! B\) can be generated; and it can be verified that the term \(\lnot A \!\wedge \! \lnot B\) is safe.    \(\square \)

3.2 Symbolic Models of \(\overline{\text {IA}}_2\)

We extend the definition of guarded alphabet \(\mathcal {A}^{gu}\) as the set of triples of feature expressions, boolean conditions and symbolic letters:

$$ \mathcal {A}^{gu+f} = \{ [ \phi , b, \alpha \rangle \mid \phi \in FeatExp(\mathbb {F}), b \in BExp, \alpha \in \mathcal {A}^{sym} \} $$

Thus, a guarded letter \([ \phi , b, \alpha \rangle \) means that \(\alpha \) is triggered only if b evaluates to true in valid configurations \(k \in \mathbb {K}\) that satisfy \(\phi \). That is, every letter is labelled with a feature expression that defines products able to perform the letter. As before, we write only \(\alpha \) for \([ tt, tt, \alpha \rangle \). A word \([ \phi _1, b_1, \alpha _1 \rangle \cdot [ \phi _2, b_2, \alpha _2 \rangle \ldots [ \phi _n, b_n, \alpha _n \rangle \) over \(\mathcal {A}^{gu+f}\) can be written as a triple \([ \phi _1 \wedge \ldots \wedge \phi _n, b_1 \wedge \ldots \wedge b_n, \alpha _1 \cdot \ldots \cdot \alpha _n \rangle \). Its meaning is that the word \(\alpha _1 \cdot \ldots \cdot \alpha _n\) is feasible only if the condition \(b_1 \wedge \ldots \wedge b_n\) is satisfiable, and only for valid configurations that satisfy \(\phi _1 \wedge \ldots \wedge \phi _n\). Regular languages and automata defined over \(\mathcal {A}^{gu+f}\) are called featured symbolic.

We can straightforwardly extend the symbolic representation of all IA\(_2\) terms in the new setting by extending all guarded letters with the value tt for the first (feature expression) component. Now, we are ready to give representation of the compile-time conditional term:

where is the composition operator, and the interpretation of the compile-time conditional construct parameterized by the feature expressions \(\phi \) is:

$$\begin{aligned}&[\! [\texttt {\#if}: \mathsf {com}^{\langle 1 \rangle } \times \mathsf {com}^{\langle 2 \rangle } \rightarrow \mathsf {com} ]\! ]^{\phi } = \\&\qquad \qquad \qquad \textit{run}\cdot \big ( [ \phi , tt, \textit{run}^{\langle 1 \rangle } \rangle \cdot \textit{done}^{\langle 1 \rangle } + [ \lnot \phi , tt, \textit{run}^{\langle 2 \rangle } \rangle \cdot \textit{done}^{\langle 2 \rangle } \big ) \cdot \textit{done}\end{aligned}$$

That is, the first argument of #if is run for those configurations that satisfy \(\phi \), whereas the second argument of #if is run for configurations satisfying \(\lnot \phi \).

Again, the effective alphabet of \([\! [\varGamma \vdash _{\mathbb {F}} M:T ]\! ]\) for any \(\overline{\text {IA}_2}\) term is a finite subset of \(\mathcal {A}^{gu+f}_{[\! [\varGamma \vdash _{\mathbb {F}} T ]\! ]}\). Hence, the automata corresponding to \([\! [\varGamma \vdash _{\mathbb {F}} M:T ]\! ]\) is effectively constructible, and we call it featured symbolic automata (FSA). Basically, an FSA is a SA augmented with transitions labelled (guarded) with feature expressions. We denote it as \(\mathcal {FSA} [\! [\varGamma \vdash _{\mathbb {F}} M : T ]\! ] = (Q, i, \delta , F)\), where Q is a set of states, i is the initial state, \(\delta \) is a transition function, and \(F \subseteq Q\) is the set of final states. The purpose of an FSA is to model behaviours (computations) of the entire program family and link each computation to the exact set of products able to execute it. From an FSA, we can obtain the model of one particular product through projection. This transformation is entirely syntactical and consists in removing all transitions (moves) linked to feature expressions that are not satisfied by a configuration \(k \in \mathbb {K}\).

Definition 1

The projection of \(\mathcal {FSA} [\! [\varGamma \vdash _{\mathbb {F}} M : T ]\! ] = (Q, i, \delta , F)\) to a configuration \(k \in \mathbb {K}\), denoted as \(\mathcal {FSA} [\! [\varGamma \vdash _{\mathbb {F}} M : T ]\! ] \mid _k\), is the symbolic automaton \(A=(Q, i, \delta ', F)\), where \(\delta ' = \{ (q_1, [ b,a \rangle , q_2) \mid (q_1, [ \phi ,b,a \rangle , q_2) \in \delta \wedge k \,\models \,\phi \}\).

Theorem 3

(Correctness). \(\mathcal {FSA} [\! [\varGamma \vdash _{\mathbb {F}} M : T ]\! ] \mid _k = \mathcal {A} [\! [\varGamma \vdash \pi _k{(M)} : T ]\! ]\).

Example 2

Consider the term family M from Introduction. Its FSA is given in Fig. 3. Letters represent triples, where the first component indicates which valid configurations can enable the corresponding move. For example, we can see that the unsafe play obtained after instantiating its input symbols with fresh names: \([ tt,X_1=0,\textit{run} \rangle \cdot [ A,tt,\textit{q}^n \rangle \cdot N^n \cdot [ B,X_2=X_1\!+\!N, \textit{q}^n \rangle \cdot M^n \cdot [ tt,X_3=X_2\!-\!M \!\wedge \! X_3=1, \textit{run}^{abort} \rangle \cdot \textit{done}^{abort} \cdot \textit{done}\), is feasible when \(N=1\) and \(M = 0\) for the configuration \(A \!\wedge \! B\).    \(\square \)

4 Model Checking Algorithms

The general model checking problem for program families consists in determining which products in the family are safe, and which are not safe. The goal is to report all products that are not safe and to provide a counterexample for each.

A straightforward but rather naive algorithm to solve the above problem is to check all valid programs individually. That is, compute the projection of each valid configuration, generate its model, and verify it using standard algorithms. This is so-called brute force approach, and it is rather inefficient. Indeed, all programs (exponentially many in the worst case) will be explored in spite of their great similarity. We now propose an alternative algorithm, which explores the set of reachable states in the FSA of a program family rather than the individual models of all its products. We aim to take advantage of the compact structure of FSAs in order to solve the above general model checking problem.

A model checker is meant to perform a search in the state space of the FSA \((Q, i, \delta , F)\) and to indicate safe and unsafe products. This boils down to checking if an ‘unsafe’ state \(q' \in Q\) with \(q \mathop {\longrightarrow }\limits ^{[ \phi ,b,\textit{run}^{\textit{abort}} \rangle } q'\) is reachable in the FSA. This can be accomplished with a Breadth-First Search (BFS) in the FSA that encounters all states that are reachable from the initial state and checks whether one of them is ‘unsafe’. In this way, the BFS finds the shortest unsafe play for any product. The algorithm is shown in Fig. 4. It maintains a reachability relation \(R \subseteq Q \times \mathcal P (\mathbb {K})\) that stores a set of pairs (qpx) where q is marked as a visited state for the valid products from \(px \subseteq \mathbb {K}\); a queue Queue that keeps track of all states that still have to be visited (explored), and a set of counterexamples unsafe. R is first initialized by the initial state \(i \in Q\) that is reachable for all valid products, i.e. \((i,\mathbb {K}) \in R\). For \((q,px) \in R\), we write R(q) for the set of products px. Queue supports the operations: remove which returns and deletes the first element of Queue, and put which inserts a new element at the back of Queue. In Queue along with each state q, we store a trace, trace(q), that shows how q is reached from the initial state i. For each visited state, it is checked whether that state is unsafe (line 6). Each time an unsafe state q is reached, the pair \((e,px')=complete(q,px,trace(q))\) is added to unsafe where: ‘e’ is a complete counterexample generated by looking at the trace kept on Queue along with q and by finding the shortest trace from q to an accepting state (by performing an embedded BFS); and \(px'\) is the corresponding set of unsafe products. Note that e represents the shortest unsafe play for the products in \(px'\). At each iteration, the BFS calculates the set new of unvisited successors of the current state, filtering out states and products that are already visited in R. Assuming that we have a transition \(q \mathop {\longrightarrow }\limits ^{[ \phi ,b,\alpha \rangle } q'\) and the source state q is reachable by products in px, the target state \(q'\) is reachable for products in \(\{ k \in px \mid k \,\models \,\phi \}\). Given an FSA as input, the algorithm in Fig. 4 calculates all reachable states from the initial state i. When the search finishes and unsafe is empty, the algorithm returns true; otherwise it returns false and the set unsafe.

Fig. 4.
figure 4

Model checking algorithm for verifying safety based on specialized BFS

After having found an unsafe state, our algorithm will continue exploration until the entire model is explored. Since the aim is to identify violating products, it can ignore products that are already known to violate, \(\mathbb {K}_{unsafe}\), that is \(\cup _{(e,px) \in unsafe} px\). In the BFS, this can be achieved by filtering out states with products \(px \subseteq \mathbb {K}_{unsafe}\) as part of the calculation of new. This can only eliminate newly discovered states, not those that are already on the queue. States on the queue can be filtered out by removing elements (qpx) for which \(px \subseteq \mathbb {K}_{unsafe}\) (line 9).

Compared to the standard BFS, where visited states are marked with Boolean visited flags and no state is visited twice, in our algorithm visited states are marked with sets of products (for which those states are visited) and a state can be visited multiple times. This is due to the fact that when the BFS arrives at a state s for the second time, such that \(R(q)=px\), \((q,px') \in new\), and \(px' \nsubseteq px\), then s although already visited, has to be re-explored since transitions that were disallowed for px during the first visit of q might be now allowed for \(px'\).

The complete verification procedure for checking safety of term families is described in Fig. 5. In each iteration, it calls the BFS from Fig. 4 and finds some safe products and (unsafe) products for which a genuine (consistent) counterexample is reported. To prevent the model checker to consider these products (for which conclusive results are previously found), the BFS in the next iteration is called with updated arguments, i.e. only for configurations with no conclusive results. We first show that the projection \(\pi _k\) commutes with our “lifted” verification procedure, which is applied directly on the level of program families.

Fig. 5.
figure 5

Verification procedure (VP)

Theorem 4

(Correctness). \(\varGamma \vdash \pi _k(M)\) is safe iff \(\mathcal {FSA} [\! [\varGamma \vdash _{\mathbb {F}} M ]\! ] \mid _k\) is safe.

As a corollary of Theorems 234 we obtain that the VP in Fig. 5 returns correct answers for all products. Moreover, it terminates for all unsafe products by generating the corresponding unsafe plays. The VP will find the shortest consistent unsafe play t for each unsafe product after finite number of calls to the BFS, that will first find all inconsistent unsafe plays shorter than t (which are finitely many [9]). However, the VP may diverge for safe products, producing in each next iteration longer and longer unsafe plays with inconsistent conditions.

5 Implementation

We have extended the prototype tool developed in [9] to implement the VP in Fig. 5. That tool [9] converts any single (IA\(_2\)) term into a symbolic automata representing its game semantics, and then explores the automata for unsafe plays. The extended tool takes as input a term family, and generates the corresponding FSA, which is then explored based on the procedure described in Fig. 5. The tool is implemented in Java along with its own library for working with featured symbolic automata. The tool calls an external SMT solver Yices to determine consistency of play conditions. We now illustrate our tool with an example. The tool, further examples and detailed reports how they execute are available from: http://www.itu.dk/~adim/symbolicgc.htm. Consider the following version of the linear search algorithm, Linear \(_3\).

figure b

The term family contains three features A, B, and C, and hence 8 products can be produced. In the above, first depending on which features are enabled some value from \(-1\) to 3 is assigned to j, and then the input expression y is copied into the local variable p. The non-local array x is searched for an occurrence of the value stored in p. If the search succeeds j-times (for \(j>0\)), \(\mathsf {abort}\) is executed.

The arrays are implemented in the symbolic representation by using a special symbol (e.g., k with an initial constraint \(k>0\)) to represent the length of an array. A new symbol (e.g., I) is also used to represent the index of the array element that needs to be de-referenced or assigned to (see [9] for details). If we also want to check for array-out-of-bounds errors, we can include in the representation of arrays plays that perform \(\mathsf {abort}\) moves when \(I \ge k\).

If the value read from the environment for y occurs j-times (for \(j>0\)) in the array x, then an unsafe behaviour is found in the FSA of the above term family. Hence, all products for which the value assigned to j is less than 1 are safe: \(\lnot A \!\wedge \! \lnot B \!\wedge \! \lnot C\), \(\lnot A \!\wedge \! B \!\wedge \! \lnot C\), and \(A \!\wedge \! B \!\wedge \! \lnot C\). All other products are unsafe. For example, for products \(A \!\wedge \! B \!\wedge \! C\) and \(\lnot A \!\wedge \! \lnot B \!\wedge \! C\) (for which j is set to 2) the tool reports a counterexample that corresponds to a term with an array of size \(k=2\), where the values read from the environment for x[0], x[1], and y are the same, i.e. the following counterexample is generated: \(\textit{run}\cdot \textit{q}^y \cdot 0^y \cdot \textit{read}^{x[0]} \cdot 0^{x[0]} \cdot \textit{read}^{x[1]} \cdot 0^{x[1]} \cdot \textit{run}^{abort} \cdot \textit{done}^{abort} \cdot \textit{done}\). This counterexample is obtained after 2 iterations of the VP, and it corresponds to a computation which runs the body of ‘\(\mathsf {while}\)’ two times. In the first iteration, an inconsistent unsafe play is found (its condition contains \(J=2 \!\wedge \! J=1\), where the symbol J tracks the current value of j). A consistent counterexample is obtained in the first iteration for products \(A \!\wedge \! \lnot B \!\wedge \! \lnot C\) and \(\lnot A \!\wedge \! B \!\wedge \! C\) (for which j is 1), whereas for \(A \!\wedge \! \lnot B \!\wedge \! C\) (j is assigned to 3) in the third iteration. The tool diverges for safe terms, producing longer and longer (inconsistent) unsafe plays in each next iteration.

Fig. 6.
figure 6

Performance comparison for verifying program families.

We ran our tool on a 64-bit Intel\(^\circledR \)Core\(^{TM}\) i5 CPU and 8 GB memory. All times are reported as averages over five independent executions. For our experiments, we use four families: Intro is the family from Table 1 in Sect. 1; Linear \(_3\) is the above family for linear search with three features; Linear \(_4\) is an extended version of Linear \(_3\) with one more feature D and command: \(\texttt {\#if} \, (D) \, \mathsf {then} \, j:=j+3\); and Linear \(_5\) is Linear \(_4\) extended with one additional feature E and command: \(\texttt {\#if} \, (E) \, \mathsf {then} \, j:=j-2\). We restrict our tool to work with bounded number of iterations (10 in this case) since the VP loops for safe terms. For Linear\(_4\) the tool reports 13 unsafe products with corresponding counterexamples, whereas for Linear\(_5\) 21 unsafe products are found. Figure 6 compares the effect (in terms of Time, the number of states in the maximal model generated during analysis Max, and the number of states in the final model Final) of verifying benchmarks using our family-based approach vs. using brute-force approach. In the latter case, we first compute all products, generate their models, and verify them one by one by using the tool for single programs [9]. In this case we report the sum of number of states for the corresponding models in all individual products. We can see that the family-based approach is between 2.3 and 3.6 times faster (using considerably less space) than the brute-force. We expect even bigger efficiency gains for families with higher number of products.

6 Related Work and Conclusion

Recently, many so-called lifted techniques have been proposed, which lift existing single-program analysis techniques to work on the level of program families (see [18] for a survey). This includes lifted type checking [3], lifted model checking [4], lifted data-flow analysis [11], etc. Classen et al. have proposed featured transition systems (FTSs) in [4] as the foundation for behavioural specification and verification of variational systems. An FTS, which is feature-aware extension of the standard transition systems, represents the behaviour of all instances of a variational system. They show how the family-based model checking algorithms for verifying FTSs against fLTL properties are implemented in the SNIP model checker. The input language to SNIP is fPromela, which is a feature-aware extension of Promela. In this work, we also propose special family-based model checking algorithms. However, they are not applied on models of variational systems, but on game semantics models extracted from concrete program fragments with #ifdef-s.

The first application of game semantics to model checking was proposed by Ghica and McCusker in [12]. They show how game semantics of IA\(_2\) with finite data-types can be represented in a remarkably simple form by regular-languages. Subsequently, several algorithms have been proposed for model checking IA\(_2\) with infinite data types [7, 9]. The automata-theoretic representation of game semantics have been also extended to programs with various features: concurrency [13], third-order functions [16], probabilistic constructs [15], nondeterminism [6], etc.

To conclude, in this work we introduce the featured symbolic automata (FSA), a formalism designed to describe the combined game semantics models of a whole program family. A specifically designed model checking technique allows us to verify safety of an FSA. The proposed approach can be extended to support multi-features and numeric features. So-called variability abstractions [10, 11] can also be used to define abstract family-based model checking.