Boosting input data sequences generation for testing EFSM-specified systems using deep reinforcement learning

https://doi.org/10.1016/j.infsof.2022.107114

Abstract

Context:

An input data sequence (IDS) is an important component of test sequences in testing based on the Extended Finite State Machine (EFSM) model. During test generation, IDS derivation occurs frequently and is time-consuming, which makes it one of the key factors limiting the efficiency of test sequence generation.

Objective:

To address this issue, this paper proposes IDSG-DRL, a novel approach that uses deep reinforcement learning (DRL) to accelerate IDS derivation.

Method:

Our method first formalizes the problem of generating IDS for EFSM-based testing as a Markov decision problem. It then uses a DRL algorithm to learn from previous input data generation and to train a decision-making model that significantly enhances the efficiency of subsequent data derivation for newly generated test sequences. To improve the convergence of the DRL algorithm and ensure the success rate of data generation, we design a state representation based on variable deviation and an action formulation using adaptive exploration. Finally, a DRL-based algorithm is presented that efficiently yields an IDS for any subject test sequence.

Results:

We evaluate the proposed approach against a random method, a genetic algorithm (GA) based method, and a particle swarm optimization (PSO) based method. Experimental statistics show that IDSG-DRL significantly outperforms the baselines in terms of iteration steps, runtime cost, and the success rate of input data derivation. Specifically, compared to the random, GA-based, and PSO-based methods, IDSG-DRL reduces the average number of iteration steps by up to 87.09%, 78.57%, and 56.35%, respectively. Regarding the average runtime, our approach is about 3.52 and 1.58 times faster than the GA-based and PSO-based methods, respectively. Additionally, given a larger input range, the performance of IDSG-DRL is more stable and its advantages are more significant.

Conclusion:

The experimental results suggest that our method is a promising way to speed up IDS generation for EFSM-based testing.

Introduction

Testing plays an indispensable role in guaranteeing the quality of software systems. Model-based testing (MBT) is one of the important methods for achieving automated testing [1]. Concretely, MBT relies on a formal system model to describe the core behaviors of the system under test (SUT) and then automatically derives test cases from it. The extended finite state machine (EFSM) model is an extension of the finite state machine (FSM). Compared to the FSM, the EFSM incorporates variables, parameters, and guards, which enrich its ability to describe both the data portion and the control portion of a SUT. Hence, the EFSM model has been widely used in MBT to represent reactive systems [2], [3], [4], [5], such as network protocols, embedded systems, and circuit systems.

In EFSM-based testing, a test sequence (test case) consists of a sequence of input/output pairs (a transition path) associated with suitable input data. The input data sequence (IDS) of a transition path is the sequence of values of all input parameters referenced by the transitions in the path. A test coverage criterion is usually described as a suite of transition paths. Different coverage criteria lead to different test paths, although paths may share some transitions. A corresponding input data sequence must be generated for each of these test paths. In addition, because of predicate conflicts or data dependence between transitions, some generated test paths may be infeasible [6], [7]. That is, executing the preceding transitions may leave the guards of subsequent transitions in the test path unsatisfied, so that the transitions in the path cannot be executed in sequence. A test path can be shown to be feasible by finding at least one IDS that triggers it. Consequently, feasible IDS generation has become a critical and frequently occurring step in test case derivation for EFSM-based testing. Efficiently generating input data sequences can significantly speed up the derivation of test cases and ultimately improve the efficiency of software development.
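To make the notions of guard, IDS, and path feasibility concrete, the following is a minimal Python sketch. The `Transition` class, the `run_path` helper, and the two-transition deposit/withdraw path are hypothetical illustrations and are not taken from the paper or its subject models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Context = Dict[str, int]  # valuation of context variables
Inputs = Dict[str, int]   # valuation of one transition's input parameters


@dataclass
class Transition:
    name: str
    guard: Callable[[Context, Inputs], bool]      # enabling condition
    update: Callable[[Context, Inputs], Context]  # effect on context variables


def run_path(path: List[Transition], ids: List[Inputs], ctx: Context) -> Optional[int]:
    """Execute the path with the given IDS.

    Returns None if every transition fires, otherwise the index of the
    first transition whose guard is not satisfied.
    """
    ctx = dict(ctx)
    for i, (t, inputs) in enumerate(zip(path, ids)):
        if not t.guard(ctx, inputs):
            return i
        ctx = t.update(ctx, inputs)
    return None


# Hypothetical path: t1 adds input x to the context variable "balance";
# t2 withdraws y and requires 0 < y <= balance, so its guard depends on
# the data written by t1.
t1 = Transition("deposit", lambda c, i: i["x"] > 0,
                lambda c, i: {**c, "balance": c["balance"] + i["x"]})
t2 = Transition("withdraw", lambda c, i: 0 < i["y"] <= c["balance"],
                lambda c, i: {**c, "balance": c["balance"] - i["y"]})

path = [t1, t2]
ok = run_path(path, [{"x": 5}, {"y": 3}], {"balance": 0})       # all guards hold
blocked = run_path(path, [{"x": 5}, {"y": 9}], {"balance": 0})  # t2's guard fails
```

In this sketch, one IDS that drives `run_path` to return `None` is enough to show that the path is feasible, while a non-`None` result only shows that this particular IDS does not trigger it.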

Over the past decades, many methods have been proposed to generate input data sequences for testing from EFSMs. Random data generation [7], [8] is the most straightforward approach. However, for large value ranges or conditional expressions that are difficult to satisfy, randomly generating input data is relatively time-consuming and its success rate is low. Compared with random methods, symbolic execution combined with constraint solving is a more efficient class of methods. It expresses IDS generation as the problem of solving the constraints contained in a symbolic expression. Unfortunately, this type of method faces several challenges in practical applications. For instance, some constraints involving nonlinear operations (e.g., multiplication) are undecidable [9]. Furthermore, it is difficult to handle interactions with untraceable libraries [10].

In contrast, search-based methods are more generally applicable and have therefore attracted widespread attention [11], [12], [13]. They provide an effective way to automatically generate test data using various meta-heuristic algorithms (e.g., simulated annealing, particle swarm optimization, and evolutionary search), and most proposed methods employ some form of evolutionary search. In essence, these algorithms search for a solution through repeated trial and error. In an evolutionary algorithm, a candidate is selected according to its fitness value. An improvement in fitness indicates that the search direction has been corrected, and can also be regarded as a reward for a decision made during evolutionary computation. If a new candidate has a larger fitness value, it is taken as the starting point for the next round of evolution. Otherwise, the move is regarded as an error and the meta-heuristic algorithm abandons this search branch. Some researchers [14] have argued that relying solely on genetic algorithms may limit results and stifle creativity. Additionally, these algorithms must be manually designed and tuned before searching for test data, and this design determines the quality and efficiency of test data generation. Most importantly, conventional methods must start a new iterative search for every IDS derivation, because they lack an effective mechanism for reusing experience from previous input data generation. This limits the opportunity to further improve the efficiency and success rate of input data generation, and prompts our research question: can we learn from historical search experience to construct a decision model that efficiently directs the generation of subsequent input data sequences?
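The trial-and-error loop described above can be illustrated with a minimal hill-climbing sketch in Python. The guard (`x * y == 36 and x < y`), the `fitness` function, and the perturbation rule are assumptions chosen for illustration; this is not the GA-based or PSO-based baseline evaluated in the paper.

```python
import random


def fitness(x: int, y: int) -> int:
    """Hypothetical fitness for the guard x * y == 36 and x < y.

    Higher is better; 0 means the guard is satisfied.
    """
    distance = abs(x * y - 36)
    if x >= y:
        distance += x - y + 1
    return -distance


def hill_climb(low: int, high: int, max_steps: int, seed: int = 0):
    """Search for inputs satisfying the guard by repeated trial and error."""
    rng = random.Random(seed)
    cand = (rng.randint(low, high), rng.randint(low, high))
    for step in range(max_steps):
        if fitness(*cand) == 0:
            return cand, step
        # Trial: perturb one input parameter by a small random amount.
        idx = rng.randrange(2)
        delta = rng.choice([-3, -2, -1, 1, 2, 3])
        new = list(cand)
        new[idx] = min(high, max(low, new[idx] + delta))
        new = tuple(new)
        # Keep the new candidate only if its fitness improved;
        # otherwise treat the move as an error and discard it.
        if fitness(*new) > fitness(*cand):
            cand = new
    return None, max_steps


found, steps = hill_climb(1, 50, 5000)
```

Every call to `hill_climb` starts from a fresh random candidate, and such a simple search may stall in a local optimum and return `None`. Nothing learned while solving one guard carries over to the next, which is the limitation the research question above targets.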

Deep reinforcement learning (DRL) [15] combines deep learning with reinforcement learning (RL). Its main objective is to train an agent, through trial-and-error interaction with an environment, to learn optimal behaviors for sequential decision-making problems. Relying on the powerful perception ability of deep learning, DRL overcomes the scale limitations of RL methods and can be applied to decision-making tasks with high-dimensional state and action spaces. For example, its success in playing games (e.g., Atari [16] and AlphaGo) demonstrates super-human performance. Search-based input data generation with meta-heuristic algorithms is, like reinforcement learning, essentially based on trial and error. The meta-heuristic algorithm acts like an agent interacting with an environment that models the test situation. Each search step taken by the meta-heuristic algorithm can be regarded as a one-step decision based on the trained model, and the feedback from the chosen action corresponds to the fitness value. If the decision is correct, a positive reward is given; otherwise, a penalty is applied. Inspired by this correspondence, it is interesting and promising to reformulate IDS generation as a DRL problem. In place of meta-heuristic algorithms, DRL offers a novel way to efficiently automate IDS generation.

In this paper, we present an input data sequence generation method based on deep reinforcement learning (IDSG-DRL) for EFSM-specified systems. Unlike traditional search-based methods, the proposed method does not require manually designing or choosing a particular meta-heuristic algorithm. Moreover, our approach automatically learns the underlying relation between the testing scenario and input data derivation behavior from historical input data generation experience, and constructs a decision model on this basis. Using this decision-making model, new input data sequences can be generated efficiently for various objective test paths. To the best of our knowledge, this is the first attempt to use DRL for input data generation in the context of EFSM-based testing. The main contributions of this paper can be summarized as follows:

  • A DRL-based input data sequence derivation approach is presented for testing from EFSM models. It uses DRL to learn from previous input data generation and to train a decision-making model that significantly speeds up IDS derivation for new objective test paths.

  • An environment state (observation) formulation is designed to accurately characterize the context in which input data is generated. It contains the current subject transition whose input data is to be generated and an estimate of the plausibility of all current values of input parameters involved in the transition.

  • An adaptive exploration algorithm for agent actions is proposed, which selects the appropriate way of updating input parameters according to the current environment observation. This accelerates the convergence of deep reinforcement learning while ensuring the success rate of IDS generation.

  • The effectiveness and efficiency of IDSG-DRL are verified by carrying out comparative experiments based on 3038 test paths from five classic EFSM models.
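As a rough illustration of the second and third contributions, the following Python sketch pairs a deviation-based observation with a deviation-dependent step size. Everything in it is an assumption made for illustration: the per-parameter target intervals, the signed `deviation` measure, and the hand-written `adaptive_step` rule, which stands in for a trained DRL agent. The paper's actual state encoding, action set, and learning algorithm are not shown in this excerpt.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[int, int]


def deviation(value: int, interval: Interval) -> int:
    """Signed distance from value to the interval; 0 means inside it."""
    lo, hi = interval
    if value < lo:
        return value - lo  # negative: value is too small
    if value > hi:
        return value - hi  # positive: value is too large
    return 0


def observe(t_index: int, values: Dict[str, int],
            targets: Dict[str, Interval]) -> Tuple[int, Tuple[int, ...]]:
    """Observation = (current transition, deviation of each input parameter)."""
    return t_index, tuple(deviation(values[p], targets[p]) for p in sorted(targets))


def adaptive_step(dev: int) -> int:
    """Large corrections when far from the target, unit steps when close."""
    if dev == 0:
        return 0
    magnitude = max(1, abs(dev) // 2)
    return -magnitude if dev > 0 else magnitude


def generate(values: Dict[str, int], targets: Dict[str, Interval],
             t_index: int = 0, max_steps: int = 100
             ) -> Tuple[Optional[Dict[str, int]], int]:
    """Update the input parameters until every deviation is zero."""
    values = dict(values)
    for step in range(max_steps):
        _, devs = observe(t_index, values, targets)
        if all(d == 0 for d in devs):
            return values, step
        for p, d in zip(sorted(targets), devs):
            values[p] += adaptive_step(d)
    return None, max_steps


# Hypothetical guard: 10 <= a <= 20 and 0 <= b <= 10.
targets = {"a": (10, 20), "b": (0, 10)}
result, steps = generate({"a": 5, "b": 30}, targets)
```

In a DRL setting, the observation returned by `observe` would be the input to the agent's policy, and the choice of which parameter to update and by how much would be learned from experience rather than fixed by a rule such as `adaptive_step`.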

The remainder of this paper is organized as follows. Section 2 briefly introduces related work. Section 3 provides background on the EFSM model, feasible transition paths, IDS, and DRL. Section 4 presents the new approach in detail. Experimental results and discussions are provided in Section 5. Section 6 draws concluding remarks.

Section snippets

Related work

A number of techniques [3] have been proposed for input data sequence generation for transition paths on an EFSM over the past decade. These approaches can be divided into two main camps: symbolic execution and search-based methods.

Symbolic execution [17], a powerful program analysis technique, is widely used in test data generation. In symbolic execution, every input parameter referenced by transitions in a test path can finally be expressed as a symbolic expression. A constraint

EFSM model

In general, an EFSM model $M$ is a 6-tuple $M = \langle S, s_0, V, I, O, T \rangle$. $S$ and $s_0 \in S$ denote a finite set of states and the initial state, respectively. $V = V_{ip} \cup V_{op} \cup V_c$ is a variable set where $V_{ip}$, $V_{op}$, and $V_c$ correspond to the set of input parameters, output parameters, and context variables, respectively. $V_{ip}$ and $V_{op}$ refer to those parameters respectively involved in parameterized input and output events, while $V_c$ denotes the internal variables of the model. A state associated with a valuation of the context

Problem formulation

During test generation, once a candidate test sequence is yielded, the search for a suitable input data sequence is triggered immediately. Given a test sequence $Path_k$ with an initial state configuration $sc_0$, the IDS generation problem involves efficiently finding at least one set of input parameter valuations for each transition in $Path_k$, such that these input data exercise the transitions one by one in order. Accordingly, this problem can be further decomposed into a series of

Experiment

In this section, we describe the experiments in which the proposed IDSG-DRL method was utilized to generate input data for feasible test paths from five EFSM models. Taking a random algorithm, a GA-based algorithm and a PSO-based algorithm as the benchmarks for comparison, the performance of our approach is evaluated in detail.

Conclusion

This paper proposed IDSG-DRL to accelerate input data sequence derivation for testing from EFSM models. First, we formalized the problem of yielding input data sequences in the context of EFSM-based testing as a DRL problem. Based on this, a state representation based on variable deviation and an action formulation using adaptive exploration were designed to accurately characterize this DRL problem and speed up the convergence of deep reinforcement learning.

CRediT authorship contribution statement

Ting Shu: Conceptualization, Methodology, Validation, Writing – original draft, Writing – review & editing. Cuiping Wu: Software, Formal analysis, Investigation. Zuohua Ding: Resources, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LY22F020019, the Zhejiang Science and Technology Plan Project under Grant No. 2022C01045, and the National Natural Science Foundation of China under Grant Nos. 61101111 and 62132014.

References (41)

  • Duale A.Y. et al. A method enabling feasible conformance test sequence generation for EFSM models. IEEE Trans. Comput. (2004)

  • Yang R. et al. EFSM-based test case generation: Sequence, data, and oracle. Int. J. Softw. Eng. Knowl. Eng. (2015)

  • Petrenko A. et al. Confirming configurations in EFSM testing. IEEE Trans. Softw. Eng. (2004)

  • Derderian K. et al. Estimating the feasibility of transition paths in extended finite state machines. Autom. Softw. Eng. (2010)

  • Bourhfir C. et al. Automatic executable test case generation for extended finite state machine protocols.

  • Huang C.-M. et al. Executable EFSM-based data flow and control flow protocol test sequence generation using reachability analysis. J. Chin. Inst. Eng. (1999)

  • Zhang J. et al. Path-oriented test data generation using symbolic execution and constraint solving techniques.

  • McMinn P. Search-based software testing: Past, present and future.

  • Zhao R. et al. Diversity-oriented test suite generation for EFSM model. IEEE Trans. Reliab. (2020)

  • Núñez A. et al. Using genetic algorithms to generate test sequences for complex timed systems. Soft Comput. (2013)