An approach to solving non-linear real constraints for symbolic execution

https://doi.org/10.1016/j.jss.2019.07.045

Highlights

  • A new approach to symbolic execution for test data generation is introduced.

  • A new technique for solving satisfiability of nonlinear constraints is presented.

  • The new technique is comparable to other ones regarding speed.

  • The new technique outperforms other ones regarding correctness.

Abstract

Constraint solvers are well-known tools for solving many real-world problems such as theorem proving and real-time scheduling. One of the domains that strongly relies on constraint solvers is the technique of symbolic execution for automatic test data generation. Many researchers have tried to alleviate the shortcomings of the available constraint solvers to improve their applications in symbolic execution for test data generation. Despite many recent improvements, constraint solvers are still unable to efficiently deal with certain types of constraints. In particular, constraints that include non-linear real arithmetic are among the most challenging ones. In this paper, we propose a new approach to solving non-linear real constraints for symbolic execution. This approach emphasizes transforming constraints into functions with specific properties, which are named Satisfaction Functions. A satisfaction function is generated in a way that by maximizing it, values that satisfy the corresponding constraint are obtained. We compared the performance of our technique with three constraint solvers that were known to be able to solve non-linear real constraints. The comparison was made regarding the speed and correctness criteria. The results showed that our technique was comparable with other methods regarding the speed criterion and outperformed these methods regarding the correctness criterion.

Introduction

As an automatic code analysis and software test data generation technique, Symbolic Execution (King, 1975, 1976) depends mainly on the capabilities of constraint solvers. Test data generation using symbolic execution consists of two main steps: constraint generation and constraint solving. The constraint generation step builds a unique path constraint over the symbolic inputs for every desired execution path of the Software Under Test (SUT). These constraints are built in such a way that they are satisfied only by input values that execute their corresponding paths. This is where constraint solving plays an important role in Symbolic Execution: by solving the constraint of a path, concrete assignments to the SUT's input parameters are generated that can be used for testing that path. Fig. 1 depicts this process for a simple example with the goal of reaching a possible bug. Symbolic Execution was introduced nearly three decades ago, but only recent advances in constraint solving tools and techniques have made it a practical approach (Anand, Burke, Chen, Clark, Cohen, Grieskamp, Harman, Harrold, McMinn, Bertolino, Li, Zhu, 2013, Braione, Denaro, Mattavelli, Pezzè, 2017).
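The two steps can be illustrated with a minimal Python sketch. The function `sut`, its branch conditions, and the path labels are hypothetical and chosen only for illustration; they are not the example of Fig. 1. The path constraints are written by hand here, whereas a symbolic executor would derive them automatically.

```python
import math

def sut(x: float, y: float) -> str:
    """Hypothetical software under test with three execution paths."""
    if x * y > 4.0:
        if math.sin(x) + y < 0.0:
            return "bug"      # the path we want a test input for
        return "deep"
    return "shallow"

# Constraint generation: one path constraint per execution path
# (hand-written assumption; normally produced by symbolic execution).
PATH_CONSTRAINTS = {
    "bug":     lambda x, y: x * y > 4.0 and math.sin(x) + y < 0.0,
    "deep":    lambda x, y: x * y > 4.0 and not (math.sin(x) + y < 0.0),
    "shallow": lambda x, y: not (x * y > 4.0),
}

# Constraint solving: any assignment that satisfies a path constraint
# drives execution down that path. This value was picked by hand.
candidate = (-2.0, -3.0)

if PATH_CONSTRAINTS["bug"](*candidate):
    print("input", candidate, "executes path:", sut(*candidate))
```

Each input satisfies exactly one of the three path constraints, which is what makes a solution of a path constraint usable as test data for that path.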

Despite the advances in constraint solving techniques, one of the biggest obstacles to using Symbolic Execution is still the constraint solvers' inability to deal with complex constraints (Anand, Burke, Chen, Clark, Cohen, Grieskamp, Harman, Harrold, McMinn, Bertolino, Li, Zhu, 2013, Thomé, Shar, Bianculli, Briand, 2017). In particular, constraints built on undecidable theories, such as non-linear real arithmetic, are problematic. In addition, solving some of the constraints built on decidable theories is expensive. Table 1 shows the complexity of satisfying some types of constraints (Zhang, 2008).

As a result, current constraint solvers have problems in dealing with constraints that include non-linear real or integer arithmetic. This is a big issue, since non-linear arithmetic is extensively used in many programs (Haller et al., 2012). Solving constraints built on non-linear arithmetic is undecidable; consequently, constraint solvers usually need relatively long run-times to solve constraints built on such theories, compared to other less complex theories. Another issue is that current constraint solvers don’t necessarily return correct solutions (i.e. solutions that satisfy the given constraint) (Anand et al., 2013). Accordingly, two important performance criteria of the constraint solvers in this regard are speed and correctness.

In this paper, we wish to improve the applicability of constraint solvers in Symbolic Execution by introducing a new approach to solving satisfiability modulo non-linear real arithmetic. The constraints obtained from programs that have floating-point input parameters usually have non-linear real arithmetic sub-constraints. Our goal is to propose a new approach for solving such sub-constraints. We show this new approach, namely Smooth Modeling, is a relatively fast method that is more correct in solving constraints, compared to other related methods and tools (see Section 3).

This new approach to constraint solving is based on the definition of satisfaction functions. A satisfaction function is defined in correspondence to a constraint. Its most important characteristic is that it returns 1 for input values that satisfy its corresponding constraint and 0 for input values that do not. Consider an arbitrary constraint C. We call the intersection of the domains of its mathematical functions $D_C$. The satisfaction function of C is then a function $D_C \to \{0, 1\}$.
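As a minimal sketch of this definition, consider a hypothetical constraint C: sqrt(x) <= 3 and ln(x) > 0 (not taken from the paper). The domain of sqrt is [0, ∞) and the domain of ln is (0, ∞), so their intersection, the domain of C, is (0, ∞). The function names below are illustrative assumptions.

```python
import math

def in_domain(x: float) -> bool:
    """Intersection of the domains of sqrt and ln: x > 0."""
    return x > 0.0

def satisfaction(x: float) -> int:
    """Satisfaction function of the hypothetical constraint C.

    Returns 1 where C holds and 0 elsewhere; it is undefined
    (raises ValueError) outside the domain of C.
    """
    if not in_domain(x):
        raise ValueError("x is outside the domain of the constraint")
    return 1 if (math.sqrt(x) <= 3.0 and math.log(x) > 0.0) else 0

for x in (0.5, 4.0, 16.0):
    print(x, satisfaction(x))
```

For this constraint the function is 1 exactly on the interval (1, 9] and 0 on the rest of the domain.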

Approximative satisfaction functions are very similar to satisfaction functions, with a few differences. These functions are smooth over larger intervals. To be more specific, an approximative satisfaction function is non-differentiable or has discrete derivatives only where the mathematical functions of its corresponding constraint are non-differentiable or have discrete derivatives. Unlike satisfaction functions, approximative satisfaction functions return values nearly equal to 1 for input values that satisfy their related constraint. Similarly, they return values nearly equal to 0 for input values that do not satisfy their related constraint. Accordingly, the approximative satisfaction function of the constraint C is a function $D_C \to [0, 1]$. As an example, Fig. 2 shows the satisfaction function and an approximative satisfaction function of the constraint $\neg(f \cdot f < f) \wedge (\ln(f \cdot f + 1) \le 2)$.

The approximative satisfaction function presented in Fig. 2 is:

$$\frac{\exp(f^2 - f) \cdot \exp\bigl(2 - \ln(f^2 + 1)\bigr)}{\bigl(\exp\bigl(2 - \ln(f^2 + 1)\bigr) + 1\bigr) \cdot \bigl(\exp(f^2 - f) + 1\bigr)}$$

It can be inferred that an input value satisfies the given constraint if it is in $[-\sqrt{\exp(2) - 1},\, 0] \cup [1,\, \sqrt{\exp(2) - 1}]$. As can be seen in the figure, the output of the approximative satisfaction function is (nearly) equal to 1 in the same interval. We will explain how to create satisfaction functions and their corresponding approximative variants from constraints later in this paper.
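The relation between the two functions can be checked numerically with a short Python sketch. It assumes the constraint reads ¬(f*f < f) ∧ (ln(f*f+1) ≤ 2) and that the approximation is the product of two logistic terms, one per atomic constraint, which is one reading of the formula above. The steepness parameter `k` is a hypothetical addition for illustration: `k = 1` corresponds to the formula as printed, and larger values push the outputs closer to 0 and 1.

```python
import math

def sigmoid(t: float) -> float:
    """Numerically stable logistic function exp(t) / (exp(t) + 1)."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def satisfaction(f: float) -> int:
    """Exact satisfaction function of the assumed constraint."""
    holds = (not (f * f < f)) and (math.log(f * f + 1.0) <= 2.0)
    return 1 if holds else 0

def approx_satisfaction(f: float, k: float = 1.0) -> float:
    """Smooth approximation: one logistic factor per atomic constraint."""
    return sigmoid(k * (f * f - f)) * sigmoid(k * (2.0 - math.log(f * f + 1.0)))

for f in (-1.0, 0.5, 2.0, 3.0):
    print(f, satisfaction(f), round(approx_satisfaction(f, k=20.0), 4))
```

In this sketch the inputs -1.0 and 2.0 lie inside the satisfying intervals and 0.5 and 3.0 lie outside, so the exact function and the steep approximation should agree on which side each point falls.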

Although a little unusual, we can look at the satisfaction function of the constraint C as the probability distribution of its satisfaction over $D_C$. Having this in mind, we can say it is possible to maximize the satisfaction function to find values that satisfy the corresponding constraint.

Since, for any constraint C, every member of $D_C$ either satisfies the constraint or does not, the mentioned distribution is a piecewise, non-smooth function whose range is {0, 1}. Solving non-smooth optimization problems is difficult; instead, we maximize the approximative satisfaction functions, which are smooth. It is worth mentioning that optimizing smooth approximations of non-smooth objective functions is a known practice that has proven to be effective (Nesterov, 2005).

Accordingly, to generate test data for a specific execution path of the SUT using Smooth Modeling, we perform the constraint generation step of the symbolic execution first. This gives us the path constraint. Then, we generate the corresponding satisfaction function and its approximative version. Finally, we generate test data by maximizing the resulting approximative satisfaction function. Fig. 3 depicts the described process for a simple example.
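The final step, maximizing an approximative satisfaction function to obtain test data, can be sketched as follows. This is an illustration under stated assumptions, not the method of the paper: the constraint is assumed to read ¬(f*f < f) ∧ (ln(f*f+1) ≤ 2), the steepness `k`, the search range, and the restart count are hypothetical, and a simple step-halving hill climb with random restarts stands in for whatever optimizer Smooth Modeling actually uses.

```python
import math
import random

def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def constraint(f: float) -> bool:
    """Assumed path constraint over a single input f."""
    return (not (f * f < f)) and (math.log(f * f + 1.0) <= 2.0)

def approx_satisfaction(f: float, k: float = 20.0) -> float:
    """Smooth objective; k is a hypothetical steepness parameter."""
    return sigmoid(k * (f * f - f)) * sigmoid(k * (2.0 - math.log(f * f + 1.0)))

def hill_climb(start, objective, step=1.0, min_step=1e-6, max_iter=10_000):
    """Move to the better neighbour; halve the step when neither improves."""
    x, best = start, objective(start)
    for _ in range(max_iter):
        if step <= min_step:
            break
        moved = False
        for cand in (x - step, x + step):
            val = objective(cand)
            if val > best:
                x, best, moved = cand, val, True
        if not moved:
            step /= 2.0
    return x

def generate_test_data(seed=0, restarts=50, low=-10.0, high=10.0):
    """Return an input that satisfies the constraint, or None."""
    rng = random.Random(seed)
    for _ in range(restarts):
        x = hill_climb(rng.uniform(low, high), approx_satisfaction)
        if constraint(x):  # verify against the exact constraint
            return x
    return None

print(generate_test_data())
```

Checking each candidate against the exact constraint before returning it keeps the sketch honest about correctness: a maximizer of the smooth approximation is only accepted as test data if it truly satisfies the path constraint.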

Subsequently, the following research questions will be investigated in this paper:

  • RQ1: Is it possible to design an algorithm for automatically generating the satisfaction function (and its approximative version) from a constraint?

  • RQ2: What is the performance of Smooth Modeling in terms of speed, compared to other related constraint solvers?

  • RQ3: What is the performance of Smooth Modeling in terms of correctness, compared to other related constraint solvers?

To investigate these research questions, the following tasks were designed and performed:

  1. Designing the algorithm mentioned in RQ1.

  2. Choosing an optimizer to be used in maximizing approximative satisfaction functions.

  3. Preparing sets of constraints to evaluate Smooth Modeling's performance.

  4. Investigating other constraint solvers that are known to be able to deal with non-linear real arithmetic and mathematical functions. Three constraint solvers were chosen as competitors of Smooth Modeling: dReal, CORAL and CHOCO. The rationale behind this selection will be presented in Section 3.

  5. Comparing the performance of Smooth Modeling to that of the three other constraint solvers in terms of speed and correctness (RQ2 and RQ3).

The collected results demonstrate that Smooth Modeling outperforms all three other constraint solvers in terms of correctness while remaining comparable to them in terms of speed.

The rest of this paper is organized as follows: Section 2 explains Symbolic Execution in detail. Section 3 reviews work related to this research. Section 4 presents the Smooth Modeling approach in full detail. Section 5 addresses the evaluation of Smooth Modeling, including methodology, results and discussion. Finally, Section 6 discusses possible future work and concludes the paper.

Section snippets

Symbolic execution

Since our work depends on the symbolic execution of the source code, in this section, we review this technique.

The key idea behind symbolic execution (King, 1976) is to use symbolic values instead of concrete values for input parameters, and to represent values of program variables as symbolic expressions. Symbolic execution retains a symbolic state σ, which maps variables to symbolic expressions, and a path constraint PC, a first order quantifier free formula over symbolic expressions, for

Related works

In recent years, constraint solving has advanced significantly, which has made Symbolic Execution a hot topic in the field of test data generation three decades after its introduction. However, as mentioned before, the ability of constraint solvers to deal with complex constraints is still limited, which in turn affects the applicability of Symbolic Execution in general. In this section, we mention tools and techniques that are related to solving complex constraints, explicitly, those

Smooth Modeling

In this section, we explain Smooth Modeling and its application in test data generation. Initially, we present an example to give an overview of this method. Later, we will describe the algorithms used in Smooth Modeling. Lastly, we describe how this method can generate test data for programs with more complicated arithmetic constraints.

Evaluation

In this section, we evaluate Smooth Modeling. In Section 5.1, we describe the evaluation method. In Section 5.2, we investigate the performance of Smooth Modeling in solving different types of constraints. In this regard, we compare the performance of Smooth Modeling with that of three other constraint solvers.

As said before, we have to use an optimization algorithm to find maximizing values for satisfaction functions. These values satisfy the corresponding path constraints and can be used as

Conclusion and future works

Symbolic Execution generates path constraints for programs under test. These constraints are solved by constraint solvers and the solutions are used as the test data for the related execution paths. In this process, solving non-linear constraints is challenging and problematic. This paper proposes a new approach, called Smooth Modeling, for dealing with constraints that involve non-linear arithmetic and have mathematical functions.

The core idea of Smooth Modeling is to model every constraint as

References (20)

  • S. Anand et al. An orchestrated survey of methodologies for automated software test case generation. J. Syst. Softw. (2013)

  • P. Ammann et al. Introduction to Software Testing. (2008)

  • C. Barrett et al. CVC4.

  • C. Barrett et al. The SMT-LIB Standard: Version 2.6. Technical Report. (2017)

  • C. Barrett et al. The SMT-LIB Standard: Version 2.0.

  • P. Braione et al. Combining symbolic execution and search-based testing for programs with complex heap inputs. Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, Santa Barbara, CA, USA, July 10–14, 2017. (2017)

  • B. Dutertre. Yices 2.2.

  • S. Gao et al. dReal: an SMT solver for nonlinear theories over the reals.

  • L. Haller et al. Deciding floating-point logic with systematic abstraction. Formal Methods in Computer-Aided Design, FMCAD 2012, Cambridge, UK, October 22–25, 2012. (2012)

  • D. Jovanović et al. Solving non-linear arithmetic.

Saeed Amiri-Chimeh received his B.S. degree in computer science from the Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran, in 2014. He received his M.Sc. degree in software engineering from the Faculty of Computer Science and Engineering at Shahid Beheshti University in 2017. He is currently a Ph.D. student in the Faculty of Computer Science and Engineering at Shahid Beheshti University. His research interests are in the areas of procedural content generation, artificial intelligence and software testing.

Hassan Haghighi is an associate professor at the Faculty of Computer Science and Engineering, Shahid Beheshti University, Iran. He received his Ph.D. degree in software engineering from Sharif University of Technology, Iran, in 2009. His main research interests include using formal methods in the software development life cycle, and he has more than 50 papers in this area.
