1 Introduction

Multiple efforts at automated static analysis of large software repositories exist, and have had a measurable impact on software quality. Notably, the experiment described in [9] has resulted in hundreds of bug reports, many of which have since been closed as fixed; and the Debile (footnote 1) effort has provided developers with the results of running several static analysers through a uniform interface.

We wished to have a tool with several features that would enable similar experiments to be run with dynamic program analyses. These features are:

  • Automation. We require the ability to automatically drive the user interfaces of computer programs.

  • Wide Applicability. We require that our tool can be used to drive a wide variety of software, regardless of the high-level user interface toolkit used.

  • Reproducibility. If a certain sequence of user inputs gives rise to an interesting behaviour (e.g. a crash), we require the ability to ‘play back’ that sequence in order to reproduce the behaviour.

  • Realism. We do not wish to spam target programs with random inputs, but would like to be able to exercise target programs with only those sequences of user inputs that a real user might issue.

  • Conciseness. On the other hand, we do not wish to define every possible use case as a set of user interaction ‘scripts’, but would rather have a concise definition of the desired user interactions with the target program.

1.1 Overview

We have implemented smid (the state machine interface driver), an open-source (footnote 2) tool that autonomously interacts with desktop computer programs, driving them with user input in the same way that a real user might. In this paper, we outline the aims and motivation of the tool in Sect. 1; describe smid's input language in Sect. 2; and describe the tool's usage and how we used it to reproduce a bug in Sect. 3. A video demo of the tool can be viewed on YouTube (footnote 3).

With smid, the space of valid interactions with a program is understood as a state machine—that is, a directed graph whose nodes are states and whose transitions are sequences of user interactions. A state is a point during the program’s execution where a certain set of transitions are possible; these are often referred to as ‘modes’ in the user interaction literature [12].

Example: In a graphical web browser such as Firefox or Chromium, the main browser window, as well as each of the auxiliary windows (download window, print dialog, etc.) can be understood as states. Transitions (key presses and mouse events) may cause the state to remain the same (e.g. when pressing Ctrl+R to reload a web page), or change the state (e.g. clicking on the ‘Options’ menu item changes the state to the Options window). Different views in the same window may also be understood as separate states.

To use smid, users describe such a state machine by writing a file in the smid language. The file is parsed by the smid tool, which outputs a single ‘run’ (a finite path through the state machine, from the initial to a final state). smid then autonomously ‘drives’ the program by ‘playing back’ this run—that is, by sending the sequence of user interactions to the target program. By playing back a large number of runs, the target program can be rigorously tested.

1.2 Comparison with Existing Tools

Tools that we have explored, both in academia and in the wider software development community, fell short of one or more of the requirements listed above.

In both academia and industry, the majority of black-box driving tools are only able to interact with programs using one particular user interface toolkit (UIT). The tools Selenium [15], Mozmill [14], Watir [11] and Canoo [8] can only be used to autonomously interact with web applications. EXSYST [7] is a Java GUI automator, although its search-based technique is not limited to Java programs. A tool that does not have the single-UIT limitation is Jubula [16], which is able to drive applications using one of several web and desktop UITs.

Jubula and the web application drivers mentioned above work by executing a ‘script’ of user interactions, which corresponds to a single use case. To test the target program on all use cases, one must write a large number of scripts; changes in the underlying program then require updates to all affected scripts. We find this method to be fragile, and mention how smid negates the need for redundant specification in Sect. 3. There has been some work in academia on automatically determining the behaviour of the target program. Amalfitano et al. [1, 2] describe techniques for determining the behaviour of Android applications and web applications, respectively. A formal approach to designing, specifying, and subsequently generating tests for graphical user interfaces is described in [3]. There has also been work to automatically understand user interface behaviour using techniques from artificial intelligence [10] or machine learning [5]. Nevertheless, these techniques have each been implemented for only a single UIT. One possibility for future work is to use the Linux Desktop Testing Project [4] (LDTP) to provide the highly semantic information that is needed for these ‘behaviour-learning’ tools, for a wide range of UITs and platforms.

In contrast, smid allows the specification of the set of user interactions that are sensible for the target program using the language described in Sect. 2, without burdening the user with writing interaction ‘scripts’ and repeatedly specifying common interaction sequences. The fact that smid sends events directly to the X Window System—which underlies all graphical user interface toolkits on desktop Linux and BSD operating systems—means that smid is able to drive all interactive desktop applications, as well as console-based applications (through interaction with a terminal emulator), that run on an X-based operating system. The smid language itself is UIT-agnostic, and we hope to use LDTP in the future in order to provide smid users with interactions that are more semantic than raw interactions with X—for example, by specifying graphical widgets by name rather than by coordinate.

1.3 Scope and Limitations

smid is aimed at running experiments on the large body of software supplied with a Linux distribution. Accordingly, while the language described in Sect. 2 is UIT-agnostic, our implementation of smid targets desktop applications; we did not attempt to implement driving of touch- or voice-based user interfaces. The back-end to our tool is xdotool [13], an API for sending events to an X server, but we expect that smid can be modified to use a different back-end (like LDTP) in order to make it usable on other systems. While smid is used to drive target programs toward error states, smid does not implement any error detection itself; we leave application-specific error-detection to the user.

2 The smid Language

The smid language is used to describe a user interface as a state machine. Files written in the smid language consist mostly of transition statements; in this section, we describe the various forms that transitions can take, as well as other features of the smid language. The most up-to-date guide to the language is the reference manual (footnote 4) found on the smid web site.

Figure 1 is a minimal smid file. It shows the five kinds of statements in the smid language: initial and final state declarations, a transition (containing actions), and region declarations. The states in the state machine are all states declared as the start or end state of a transition; smid checks that there is a path from the initial state to a final state through every state. In Fig. 1, the states are nice_state and boring_state.
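The path check described above can be illustrated with a short Python sketch. This is not smid's implementation: the function names and the representation of transitions as (start, end) pairs are assumptions made purely for illustration, as is the choice of nice_state as the initial state and boring_state as the final state in the example. The sketch reports every state that is either unreachable from the initial state or unable to reach a final state.

```python
from collections import defaultdict, deque


def reachable(start_nodes, edges):
    """Return every node reachable from start_nodes by following edges."""
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def unusable_states(transitions, initial, finals):
    """States that lie on no walk from `initial` to a state in `finals`.

    `transitions` is an iterable of hypothetical (start, end) pairs.
    """
    forward = defaultdict(set)   # state -> successor states
    backward = defaultdict(set)  # state -> predecessor states
    states = {initial, *finals}
    for start, end in transitions:
        forward[start].add(end)
        backward[end].add(start)
        states.update((start, end))
    from_initial = reachable([initial], forward)
    to_final = reachable(list(finals), backward)
    # A usable state is reachable from the initial state AND can reach a final state.
    return states - (from_initial & to_final)


example = [("nice_state", "boring_state"), ("orphan", "nice_state")]
print(unusable_states(example, "nice_state", {"boring_state"}))  # {'orphan'}
```

An empty result means that every declared state lies on some walk from the initial state to a final state.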

Fig. 1. A minimal smid file.

The indented lines from nice_state to boring_state in Fig. 1 describe a transition. The syntax for transitions is start_states – list of actions –> end_state. At any point during smid's execution, smid has a 'current state'. At each step of the program run, smid chooses a random transition enabled at the current state; it then performs the actions of that transition, and then changes the current state to be the end state of the transition. The possible actions include keys, text and line for sending keypresses to the target program; move, move-rel, click and scroll for mouse events; as well as actions for switching windows, executing shell code, and changing the probability of following a transition (smid performs a random walk over the state machine by default). Complex user interfaces beget large numbers of similar transitions, so the smid language includes syntactic sugar to make it possible to represent several transitions in a single statement, as shown in Figs. 2 and 3. This allows us to specify large state machines using fewer statements, as noted in Sect. 3.
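The stepping procedure described above can be sketched in Python as a weighted random walk. This is an illustration under stated assumptions rather than smid's own code: the tuple layout (start, actions, end, weight), the action strings, the step limit, and the decision to stop as soon as a final state is reached are all hypothetical. Equal weights give a uniform random walk.

```python
import random


def generate_run(transitions, initial, finals, rng, max_steps=1000):
    """Accumulate the actions along a random walk from `initial` to a final state.

    `transitions` is a list of hypothetical (start, actions, end, weight) tuples.
    """
    current = initial
    run = []
    for _ in range(max_steps):
        if current in finals:
            return run
        # Transitions enabled at the current state.
        enabled = [t for t in transitions if t[0] == current]
        if not enabled:
            raise ValueError(f"no transition enabled at state {current!r}")
        weights = [t[3] for t in enabled]
        _, actions, end, _ = rng.choices(enabled, weights=weights, k=1)[0]
        run.extend(actions)  # perform (here: record) the actions
        current = end        # move to the end state of the transition
    raise RuntimeError("step limit reached before a final state")


machine = [
    ("browser", ["keys ctrl+r"], "browser", 1.0),  # reload, stay in the same state
    ("browser", ["keys ctrl+q"], "quit", 1.0),     # leave for the final state
]
print(generate_run(machine, "browser", {"quit"}, random.Random(0)))
```

Passing an explicitly seeded random.Random makes a run repeatable, which is one simple way to obtain the reproducibility requirement from Sect. 1.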

Fig. 2. The first line is equivalent to the next three taken together, if foo, bar, baz and quit are the only states in the state machine and quit is a final state.

Fig. 3. The first line is equivalent to the next two taken together.
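The effect of such syntactic sugar can be sketched in Python as the expansion of one statement into several plain transitions. The concrete smid syntax is not reproduced here; the wildcard token "*" (standing for every non-final state), the tuple layout and the function name are hypothetical, chosen only to mirror the equivalences stated in the captions of Figs. 2 and 3.

```python
def expand(statement, all_states, finals, wildcard="*"):
    """Expand one (starts, actions, end) statement into single-start transitions.

    `starts` is either a list of state names or the hypothetical wildcard,
    which stands for every state that is not final.
    """
    starts, actions, end = statement
    if starts == wildcard:
        starts = [s for s in all_states if s not in finals]
    return [(start, actions, end) for start in starts]


states = ["foo", "bar", "baz", "quit"]
# One wildcard statement yields one transition per non-final state.
for transition in expand(("*", ["keys q"], "quit"), states, {"quit"}):
    print(transition)
```

Applying an expansion of this kind to every statement in a file is how a small number of statements can describe a much larger state machine.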

3 smid Usage and Case Study

Given a smid file with the syntax described in Sect. 2, one can use the smid tool for several functions:

  • Visualising the State Machine. smid can generate a diagram representing the state machine described by a smid file. Figure 5 shows a state machine diagram generated from the smid file in Fig. 4.

  • Generating a Run. smid can output a list of user actions, called a ‘run,’ by accumulating the actions along a finite-length walk of the state machine.

  • Playing Back a Run. smid can read a run and send the specified interactions to the target program. Thus, given a run, smid can autonomously interact with the target program in the same way a real user might.
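Playing back a run through an xdotool back-end can be sketched in Python as follows. This is a hypothetical illustration, not smid's implementation: the representation of a run as (action, arguments...) tuples and the mapping from a subset of smid action names to xdotool subcommands are assumptions. The xdotool subcommands used (key, type, mousemove, mousemove_relative, click) are standard ones. By default the sketch only builds the command lines, so it can be inspected without an X server.

```python
import subprocess


def xdotool_argv(action, *args):
    """Translate one run step into an xdotool command line (assumed mapping)."""
    args = [str(a) for a in args]
    if action == "keys":
        return ["xdotool", "key", *args]
    if action == "text":
        return ["xdotool", "type", *args]
    if action == "move":
        return ["xdotool", "mousemove", *args]
    if action == "move-rel":
        # "--" lets xdotool accept negative offsets as coordinates.
        return ["xdotool", "mousemove_relative", "--", *args]
    if action == "click":
        return ["xdotool", "click", *args]
    raise ValueError(f"unsupported action: {action!r}")


def play_back(run, dry_run=True):
    """Build the xdotool commands for a run; execute them unless dry_run is set."""
    commands = [xdotool_argv(action, *args) for action, *args in run]
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)
    return commands


run = [("keys", "ctrl+l"), ("text", "example.org"), ("move-rel", -10, 5), ("click", 1)]
for argv in play_back(run):
    print(" ".join(argv))
```

Calling play_back(run, dry_run=False) would send the events to whichever window currently has focus, so it should only be used once the target program is running and focused.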

Fig. 4. A smid file containing several states.

Fig. 5. State machine diagram corresponding to the smid file in Fig. 4.

In this section, we describe using smid to pinpoint crashes in the target program.

We wrote a smid file (available on GitHub; footnote 5) for the cmus (footnote 6) console music player. This file had 25 transitions, which smid expanded into a 103-transition state machine. Hence, we find that the smid language allows us to specify a wide range of behaviour in a concise format and with little redundancy.

We used this smid file to reproduce a reported (footnote 7) segmentation fault. The bug report suggested that the bug was triggered when playing MP4-encoded media files. We set up a large library of audio and video containers, and a smid specification designed to home in on the reported bug. smid caused cmus to browse to one of the media files, seek to a random point in the file, and play the data for several seconds. Using this setup, smid triggered the segfault in many of the several hundred runs that we ran.

By logging cmus using the SystemTap [6] instrumentation framework while running it with smid, we were able to discover several scenarios under which this bug was triggered, but which were not described in the original bug report. This demonstrates the utility of our approach, namely sending a diverse range of inputs to the target program, from the space of 'sensible' user interactions.

4 Conclusions

The smid language, described in Sect. 2, is UIT- and platform-agnostic. The current implementation of the smid tool uses this fact to drive programs by sending user interactions directly to the underlying window system, rather than to a specific UIT. This means that we are able to drive the large variety of applications that can render a window under the X Window System.

Existing approaches either try to learn the state machine, which ties them to particular UITs, or drive the interface using a script, an approach which is not scalable. The case study in Sect. 3 shows the value of our approach: by specifying all reasonable behaviours of the target program, we were able to quickly home in on a bug without spamming the target program with unrealistic inputs.