A framework for assume-guarantee regression verification of evolving software
Introduction
In the last three decades, component-based software engineering (CBSE) has emerged as one of the important approaches in software engineering. This approach has shown a number of advantages, such as increasing effectiveness and efficiency, lowering cost, shortening product time-to-market, and improving maintainability [52]. As a result, quality assurance for component-based software (CBS) plays a critical role in software production life cycles, driven by the increasing demand for high-quality products. Given the strict quality standards of the software industry, the verification process for CBSs must ensure that the required properties are never violated.
There are two approaches to the verification of modern software: theorem proving, which is semi-automatic, requires the interaction of domain experts [20], [21], [30], [37], [38], [51], and costs a lot of effort [5]; and model checking, which is automatic and does not require the interaction of domain experts [7], [18]. Although model checking has gained considerable attention because it is fully automatic, it suffers from the state space explosion problem [15], [16], [18], [48]. The assume-guarantee framework [17], [19], [24], [46], which performs modular verification of CBS, is considered a promising solution to this problem. The framework uses a “divide-and-conquer” strategy to verify whether a given system satisfies a predefined property, so it can potentially be applied to large-scale systems in practice. The key problem of the framework is to generate assumptions that satisfy the assume-guarantee rules [19], [29], [33]. If such an assumption exists, the given system satisfies the required property. Although the framework can be applied to large-scale systems effectively, it does not consider the system under check in the context of software evolution.
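For orientation, the non-circular assume-guarantee rule commonly used in learning-based compositional verification (for example, in [19]) can be written as below. The names M_1, M_2, A, and p are notation assumed here for illustration, and the rule variants used later in the paper may differ.

```latex
\[
\frac{\langle A \rangle\, M_1\, \langle p \rangle
      \qquad
      \langle \mathit{true} \rangle\, M_2\, \langle A \rangle}
     {\langle \mathit{true} \rangle\, M_1 \parallel M_2\, \langle p \rangle}
\]
```

Read informally: if M_1 satisfies p whenever its environment behaves as A allows, and M_2 always behaves as A allows, then the composition of M_1 and M_2 satisfies p.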
Modern software applications are continually evolving, and any verification has to be revisited repeatedly. Reducing the cost of this repeated verification would offer significant benefits for industry, because it would improve software quality by making verification techniques applicable in situations where they are currently infeasible. Progress has been made using approaches such as labeled transition systems [12], [19], [31], [33], [34], [35], implicit representation of transition systems [13], [27], and timed transition systems [3], [28], [40], [41], [42]. The following two solutions have been used to reduce the verification costs for evolving software.
The first solution is to generate, at a lower cost, a new assumption each time the software evolves. For software modeled by labeled transition systems, small assumptions (i.e., assumptions with small numbers of states) can be used effectively to recheck modified software, which reduces the verification cost. In a series of papers, Hung et al. proposed a method to generate minimized assumptions for CBS verification [31], [34], [35] and a framework to perform modular verification of evolving CBS [33]. However, the cost of generating minimal assumptions can be high [34]. The reason is that the investigated assumption generation problem [19], [31], [33], [34], [35] is formulated as an automata learning problem using the L* algorithm [4]. As a result, it is difficult to apply this approach to large-scale systems. On the other hand, to speed up assumption generation, another verification method, which uses the CDNF (Conjunction of Disjunctive Normal Form) algorithm [10] and an implicit representation of software, was proposed in 2010 by Chen et al. [13]. Later, in 2016, this method was improved by He et al. and applied to CBS regression verification [26] by introducing a fine-grained learning technique. However, with modified software, some of the subpredicates of the new version of the components can be different, which requires the regression verification process to regenerate the assumptions for every small change in a software component.
The second solution to reduce the verification cost for modified software is to reuse assumptions as much as possible. Because the software development cycle involves daily change, the less often assumptions must be regenerated, the greater the cost savings when verifying modified software. Moreover, as the analysis in Section 5 shows, weak assumptions (i.e., assumptions with large languages) can help to achieve this purpose and play a key role in the verification of modified software. However, to our knowledge, no research has been conducted on generating assumptions that have the weakest languages and use an implicit representation. This research therefore focuses on improving the learning algorithm proposed by Chen et al. [13] to generate local weakest assumptions that can be used more efficiently to reduce the cost of regression verification during software evolution.
To achieve the above goal, we first improve the technique for answering membership queries for the two CDNF learning instances, ι (the initial predicate) and τ (the transition relation). Based on this improved answering technique, we propose a backtracking learning algorithm (referred to as the LWAG algorithm) that generates weaker assumptions than those generated by the algorithm of Chen et al. [13] (hereafter referred to as the CBAG algorithm). This leads to an important result in the context of software evolution: the LWAG algorithm can reduce the number of times assumptions must be regenerated when verifying modified software. The improved answering technique and the LWAG algorithm are integrated into a framework that effectively reduces the number of assumption regenerations required for evolving software.
Using assumption generation algorithms that employ the implicit representation, we not only benefit from the fast learning process but also obtain several advantages of implicit software representation over explicit representation. First, contextual assumptions represented implicitly using Boolean functions have fewer states than assumptions modeled using deterministic finite automata, because implicit representations are equivalent to nondeterministic finite automata, which are exponentially more succinct than deterministic ones. As a result, our generated assumptions can have exponentially fewer states than assumptions generated from explicit representations. The second advantage is the scalability of the verification method using implicit representations. The L* algorithm requires a number of queries that is polynomial in the number of states of the target finite automaton [4], [49], whereas the CDNF algorithm requires a number of queries that is polynomial in the number of Boolean variables of the target Boolean function [10]. Because implicit assumptions can be exponentially more succinct than explicit ones, the learning algorithms for implicit assumptions can be exponentially better than automata-theoretic ones.
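The gap between implicit and explicit representations can be illustrated with a small sketch. The example below is not taken from the paper: the binary counter, the size `N`, and the function names `iota`, `tau`, and `explicit_states` are assumptions chosen for illustration. Two short predicates over `N` Boolean variables describe a system whose explicit state space has 2^N states.

```python
from itertools import product

N = 4  # number of Boolean variables (hypothetical example size)


def iota(x):
    """Initial predicate: holds only when every bit is False."""
    return not any(x)


def tau(x, x_next):
    """Transition relation of a binary counter: x_next = x + 1 (mod 2^N)."""
    value = sum(bit << i for i, bit in enumerate(x))
    nxt = (value + 1) % (1 << N)
    return tuple(bool((nxt >> i) & 1) for i in range(N)) == tuple(x_next)


def explicit_states():
    """Enumerate the states reachable from iota via tau (explicit view)."""
    valuations = list(product([False, True], repeat=N))
    frontier = [v for v in valuations if iota(v)]
    reached = set(frontier)
    while frontier:
        x = frontier.pop()
        for y in valuations:
            if tau(x, y) and y not in reached:
                reached.add(y)
                frontier.append(y)
    return reached


# The two predicates stay the same size as N grows,
# while the explicit state count doubles with each added variable.
print(len(explicit_states()))
```

Increasing `N` by one leaves the predicates unchanged in form but doubles the number of enumerated states, which is the effect the succinctness argument relies on.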
To our knowledge, the first paper that proposed using the L* algorithm to learn assumptions for assume-guarantee reasoning was that of Cobleigh et al. [19]. Following this paper, several studies improved the method, including the adoption of other assume-guarantee rules [6], [26], [39], [45], symbolic implementations of assume-guarantee rules [8], [9], [45], several improvements proposed in [1], [2], [12], [14], [25], [50], [53], and an extension to support liveness properties [22]. However, these papers all use the L* algorithm to learn an automaton as the required contextual assumption, so they all share the disadvantages described above when compared to the algorithm proposed by Chen et al. [13]. We therefore base our work on Chen's algorithm [13] to verify modified software.
The remainder of this paper is organized as follows. Section 2 presents the background for this paper. Section 3 reviews the CBAG algorithm for generating assumptions using the CDNF algorithm, and Section 4 presents the proposed algorithms for improving the answers to membership queries and for generating assumptions. Section 5 presents a framework for verifying modified CBSs using assumptions generated by the proposed learning algorithm. Section 6 shows the preliminary experimental results. Related work is discussed in Section 7. Finally, we conclude the paper in Section 8.
Background
In this section, we present some basic concepts used in this paper. We use 𝔹 to denote the Boolean domain, which is a set that consists of exactly two elements whose interpretations are T (true) and F (false) (i.e., 𝔹 = {T, F}). Given a set of Boolean variables X, we call |X| the size of X, where |X| is the number of variables in X.
Let X be a finite set of Boolean variables. A function over X, that is, a function from 𝔹^|X| to the Boolean domain 𝔹, is called a
The CDNF algorithm
Let X be a fixed set of Boolean variables, and consider a target Boolean function over X. CDNF is an incremental learning algorithm that can learn an exact representation of the target function in a finite number of steps [10]. Sharing the same ideas as the L* algorithm [4], CDNF relies on a Teacher (which knows the target function) during the learning process. The Teacher must be able to answer the following two types of queries:
- Membership query: Given a valuation v over X, if the target function evaluates to T under v, the Teacher returns yes
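The Teacher interface can be sketched as follows. This is a brute-force illustration only, not the paper's procedure. It assumes the usual exact-learning setting in which the second query type is an equivalence query that returns a counterexample on failure, and the class name `Teacher`, its method names, and the sample target function are all hypothetical.

```python
from itertools import product


class Teacher:
    """Brute-force Teacher for a Boolean function over num_vars variables."""

    def __init__(self, target, num_vars):
        self.target = target      # the function the Learner tries to learn
        self.num_vars = num_vars  # |X|, the number of Boolean variables

    def membership(self, v):
        """Membership query: does the target evaluate to True under v?"""
        return bool(self.target(v))

    def equivalence(self, candidate):
        """Equivalence query: (True, None) if candidate matches the target on
        every valuation, otherwise (False, counterexample)."""
        for v in product([False, True], repeat=self.num_vars):
            if bool(candidate(v)) != bool(self.target(v)):
                return False, v
        return True, None


# Hypothetical target over three variables: (x0 AND x1) OR x2.
def target(v):
    return (v[0] and v[1]) or v[2]


teacher = Teacher(target, 3)
print(teacher.membership((True, True, False)))
print(teacher.equivalence(lambda v: v[2]))
```

Enumerating all valuations is exponential in the number of variables, so a practical Teacher would answer these queries with a model checker or SAT solver instead. The sketch only fixes the shape of the two answers.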
An improved technique for answering membership queries
As shown above in Section 3.1, in the CDNF algorithm, the generated Boolean function depends on how the Teacher answers membership queries and on whether yes or no (i.e., T or F, respectively) is returned to the Learner. As a result, to improve the CDNF-based assumption generation method, we first need to improve the technique by which the Teacher answers the Learner.
After analyzing OMQ algorithm together with Table 2, we observe that the answering technique in this algorithm
A framework for modular verification of evolving CBS
In practice, software evolution can happen at any time during the software life cycle, so verification cost grows daily. More reusable assumptions, such as weak assumptions, play an important role in reducing this cost when used in the framework presented in this section. The empirical results in Section 6 clearly indicate the effectiveness of using weak assumptions when rechecking evolved software.
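Why weaker assumptions are more reusable can be seen in a toy setting. The sketch below is not the paper's framework: it models a component and an assumption as finite sets of traces, supposes that only one component changed so that the other premise of the assume-guarantee rule still holds, and uses hypothetical names (`satisfies`, `recheck`, `regenerate`) together with made-up data.

```python
def satisfies(component_traces, assumption_traces):
    """Toy check that a component respects an assumption:
    every trace of the component is allowed by the assumption."""
    return component_traces <= assumption_traces


def recheck(modified_component, old_assumption, regenerate):
    """Reuse the old assumption when it still holds; otherwise regenerate.

    Returns (assumption, regenerated_flag).
    """
    if satisfies(modified_component, old_assumption):
        return old_assumption, False
    return regenerate(modified_component), True


# Hypothetical data: traces are strings over the actions 'a' and 'b'.
strong_assumption = {"ab"}                 # small language
weak_assumption = {"ab", "abb", "abab"}    # larger language
modified_component = {"ab", "abb"}         # evolved version gained one behavior


def regenerate(component):
    """Placeholder for an assumption generator, not a learning algorithm."""
    return set(component)


_, regenerated_strong = recheck(modified_component, strong_assumption, regenerate)
_, regenerated_weak = recheck(modified_component, weak_assumption, regenerate)
print(regenerated_strong, regenerated_weak)
```

In this toy run the strong assumption has to be regenerated after the change, while the weak one is reused as is, which mirrors the cost argument made above.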
Consider a CBS M that contains two components and
Experiments
To evaluate the effectiveness of LWAG algorithm, experiments are performed to highlight two key points: (i) a comparison between CBAG algorithm and LWAG algorithm and their corresponding generated assumptions; and (ii) a comparison of the framework in Section 5.1 between the cases using the assumptions generated by CBAG algorithm and LWAG algorithm after the software has been modified. Algorithms presented in Section 3 and Section 4 are implemented in and Microsoft Visual Studio
Related works
Several existing papers on evolving software verification are relevant to our research [11], [13], [23], [27], [31], [32], [33], [34], [35], [44].
In 2010, Chen et al. proposed a purely implicit solution to the contextual assumption generation problem in assume-guarantee reasoning [13]. However, this paper did not consider the case in which the software component has been modified. Instead, when a component has been modified, the assumption-generation method must be executed again from the
Conclusion
In this paper, we presented an effective framework for rechecking evolving software using the LWAG algorithm with an improved technique for answering membership queries during the assumption learning process. Although the LWAG algorithm has a greater time complexity than the CBAG algorithm, it can generate local weakest assumptions to reduce the number of assumption regenerations when rechecking evolving software. An implemented tool and experimental results are also presented that allow comparing
Acknowledgements
This work is supported by Vietnam's National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.03-2015.25.
References (54)
- et al. Event-clock automata: a determinizable class of timed automata. Theor. Comput. Sci. (January 1999)
- Learning regular sets from queries and counterexamples. Inf. Comput. (November 1987)
- Exact learning Boolean functions via the monotone theory. Inf. Comput. (1995)
- et al. Temporal proof methodologies for timed transition-systems. Inf. Comput. (August 1994)
- et al. Twenty-eight years of component-based software engineering. J. Syst. Softw. (January 2016)
- et al. Formal component-based modeling and synthesis for PLC systems. Comput. Ind. (October 2013)
- et al. Automated circular assume-guarantee reasoning with n-way decomposition and alphabet refinement
- et al. Automated circular assume-guarantee reasoning. Form. Asp. Comput. (September 2018)
- et al. Developing user strategies in PVS: a tutorial
- et al. Proof rules for automated compositional verification through learning