1 Introduction

Tools for controllably experimenting with synthetic failures are an essential element of resilience investigation. These tools generally employ some form of software-implemented fault injection (SWIFI) since it is highly adaptable, in contrast to hardware-based approaches [11]. However, low-level hardware approaches have some advantages for performing tests that can originate at the lowest layers of the system. System-level virtualization has been explored as a way to combine the advantages of SWIFI with the low-level hardware-oriented approaches using virtual machines (VMs) [14, 18, 21].

There are several advantages to using virtual machines with fault injection. The use of virtualization allows for strong isolation between the system under test (SUT) and control environment. The VMs provide a basis to customize the target environment and setup repeatable testing configurations. The strong isolation provided by the VMs can be beneficial for resilience experiments that might include tests that compromise the overall investigation environment, e.g., data corruption, high crash rates.

A major challenge of virtual machine based fault injection (VMFI) is providing adequate context about the target to inform site selection choices. Additionally, the target’s context must be sufficiently understood in order to monitor the target’s status and interpret the effects of injected errors. The lack of insight into the target (guest) context is a common issue with virtualization and emerges in many instances where information maintained within the guest’s context would be useful outside the guest VM, e.g., process monitoring. The technique of virtual machine introspection (VMI) was developed to overcome just these types of challenges and has been applied to performance monitoring and security.

We have used VMI methods with VM-based fault injection to bridge the gap between the target (in guest) and controller (outside guest). We describe the approach and demonstrate a proof-of-concept experiment where we can perform fault injection on a process in a VM using commands from the host (outside the VM). This approach maintains the strong isolation of VMFI and leverages VMI methods to gain target context.

The primary contributions of this paper are:

  • The presentation of tools for HPC resilience investigations that support experiments at both user and kernel levels, performed with strong separation between the control and system-under-test environments;

  • A description of a cooperative VM-based fault-injection (FI) mechanism, which includes a discussion of how VMI can benefit FI;

  • The demonstration of the proposed FI mechanism in a study of soft-error resilience for an iterative solver benchmark running in user space of a guest VM.

2 Background

2.1 Virtualization

The virtualization of physical hardware enables a privileged software layer to multiplex the underlying physical resources. This management layer is called a virtual machine monitor (VMM), or hypervisor, and is responsible for providing VMs with efficient, controlled access to the physical resources [20, 23]. The VMM runs on a host machine, and a VM runs on the VMM. The VM is often termed the guest and the operating system (OS) running in the VM is termed the guest operating system (or guest OS). There are two categories of VMMs, distinguished by their position in the software stack with respect to the physical hardware [20]: a type-I VMM executes directly on the hardware, while a type-II VMM executes atop or within a host OS.

There are several open-source and commercial offerings for virtualization. Palacios [13] is a VMM that has been developed specifically for use in high-performance computing (HPC) environments. It can be embedded within the Kitten lightweight kernel or the Linux OS. The implementation uses hardware extensions available in modern x86 processors to provide efficient virtualization. Palacios runs on standard x86 commodity clusters and Cray XT/XK supercomputers. Palacios is currently being used as part of the Hobbes OS research project [2].

2.2 Virtual Machine Introspection

Virtual machine introspection allows for a guest’s internal state to be exposed to an external viewer, commonly another VM [19], the VMM [1], or a process on the host [8]. Because the VM is executing on a software or hardware abstraction of physical resources, the amount of state exposed by VMI is extensive, ranging from device registers to the memory of the guest. This allows for the external software to both observe and modify the guest’s state. However, the view of the guest’s state is often difficult to understand because of the “semantic gap” [4]. To overcome this obstacle, researchers often create a bridge across the semantic gap by means of a memory map of a particular process or the guest OS. An example of this bridge with respect to Linux is the System.map file, which holds a significant amount of information including the virtual address of the various functions, data structures, and other data residing within the kernel.
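As a concrete illustration of this bridging technique, the following sketch parses a System.map-style symbol table to resolve the virtual address of a kernel symbol such as init_task. The file format (address, type, name per line) is standard for Linux, but the sample map contents here are purely illustrative, not from a real kernel build:

```python
# Sketch: resolving guest-kernel symbol addresses from a System.map file,
# one common way to bridge the semantic gap. Sample content is illustrative.
SAMPLE_SYSTEM_MAP = """\
ffffffff81c10480 D init_task
ffffffff810841e0 T do_fork
ffffffff81e3e000 B swapper_pg_dir
"""

def parse_system_map(text):
    """Map each symbol name to its (virtual address, symbol type)."""
    symbols = {}
    for line in text.splitlines():
        addr, sym_type, name = line.split()
        symbols[name] = (int(addr, 16), sym_type)
    return symbols

symbols = parse_system_map(SAMPLE_SYSTEM_MAP)
addr, _ = symbols["init_task"]
print(hex(addr))  # virtual address of the guest kernel's task list head
```

An external observer armed with such a map can translate raw guest memory into meaningful kernel objects.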

2.3 Fault Injection

Virtualization offers several useful mechanisms for implementing fault injection. Suesskraut et al. [24] used VMs to speed FI campaigns by taking a snapshot of the full execution state before an experiment and then rolling back to the pre-injection state. It also allows all software dependencies to be fully contained within the guest VM so that tests can be spread across multiple physical machines. This encapsulation of the experimental environment was noted by Clark et al. [6] as a benefit for reproducing results and performing repeatable research.

DeBardeleben et al. [7, 9] have used virtualization to develop a platform for vulnerability assessments. Their approach is based on the widely used QEMU emulator, which supports a dynamic translation layer for evaluating the instructions executed by the guest VM. Their tool, F-SEFI, can be used to study the effect of soft errors on applications. They have used the tools to simulate soft errors that affect instruction operands (e.g., corruption of operands to an FMUL instruction), which can be done randomly or on a per-function basis for an application. They model soft errors as single or multi-bit corruptions and can inject the errors on a deterministic or probabilistic basis. This work uses a different virtualization environment (QEMU) from our type-II virtualization software (Palacios). Also, they introduce errors at the instruction level via the dynamic translation layer of QEMU, whereas our approach introduces errors via a character device that exposes the guest's memory, using VM introspection techniques to identify the full process and memory layout of the target environment.

Le and Tamir [14] highlight advantages and challenges associated with using virtualization for FI based on their experiences developing and using the Gigan tool. They studied the fidelity of software-implemented fault injection (SWIFI) by running injection campaigns in a virtualized context versus without virtualization (i.e., on bare hardware), and found the environments comparable, with some clear benefits for SWIFI-based studies, i.e., isolation, logging, fast boot, and crash detection. Their Xen-based tool, Gigan, employed fault injectors at the (a) VMM level for injecting from outside the guest VM, and (b) kernel level for targeting kernel-space data structures and user-space processes within the VM. Lastly, they used the Gigan FI tool to develop a more robust hypervisor (ReHype) [14].

Note, others [5, 22] have investigated the fidelity of SWIFI in comparison to other FI approaches, showing that in some instances the software-based approach may be susceptible to an overestimation of errors in contrast to non-SWIFI approaches. The lesson is that single-bit failures introduced via SWIFI at the program level (in contrast to RTL or environment/hardware) may overestimate the effects of bit-flips. This has a bearing on vulnerability analysis that is derived from synthetic injection campaigns. Koopman [12] cited similar concerns for avoiding pitfalls when using fault injection as a basis for dependability benchmarking. Therefore, the mechanisms employed in our work may not accurately mirror true hardware vulnerabilities, but they are useful for application testing and controlled experimentation where the user is mindful of the potential overestimations associated with SWIFI.

Li et al. [15] developed a binary-instrumentation fault injection tool for studying soft errors in HPC applications. Their tool, BIFIT, is based on the PIN instrumentation tool and characterizes failures by injecting into specific symbols/data structures of the target HPC application, selected using profiling information. This work did not employ virtualization, but it studied the effects of simulated “soft errors” on three HPC applications (Nek5000, S3D, and GTC) by injecting bit-flips into global, heap, and stack data objects. They limited the injections to application-specific data, i.e., excluding middleware libraries, and observed that injections into global data significantly influenced all three applications’ output and execution state. They also observed that the time and location of an injection matter for each application, with injections at later stages of execution appearing to have a greater influence on the application’s output and execution state. These soft-error injections also affected the execution duration (walltime) of the applications, often with a 2x or greater increase in execution time. In our experiments, we targeted a different HPC application but similarly focused on application-specific data that is algorithmically important.

3 Cooperative Approach to Fault Injection

The placement of the SUT in a virtual machine enables FI campaigns to maintain a separation between the target and controlling system, regardless of whether the victim resides in user or kernel space. The separation of virtual/physical resources allows the resilience tests to operate within a guest virtual machine. The FI tests can be run from within the VM or from the host level entirely outside of the guest context. This division permits the host to control the guest, and can be an opportunity to modify the state within the guest (e.g., inject virtual device errors, inject data corruption into guest memory). This separation does increase the complexity involved in the experimental environment and requires additional steps to overcome the semantic gap between the host and guest contexts.

3.1 Fault Injection Mechanism

The FI mechanism is implemented using a modification to the Palacios VMM that exports the guest VM’s memory as a character device in the host OS. This device file enables host-level access to the memory of the guest OS and user-space tasks. A VM FI utility (VMFI) that runs on the host is configured with details about primary data-structures of the guest OS, e.g., address of the task structure symbol init_task. This provides details about the context of the kernel running in the guest VM and is similar to techniques used for VMI [17].
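A minimal sketch of the host-side access pattern through such a device file follows. The device path /dev/v3-mem0 is a hypothetical name for illustration, and in practice offsets would be addresses within the guest's memory region:

```python
import os

GUEST_MEM_DEV = "/dev/v3-mem0"  # hypothetical device path exported by the VMM

def read_guest_memory(offset, length, dev_path=GUEST_MEM_DEV):
    """Read `length` bytes at `offset` from the exported guest-memory device."""
    fd = os.open(dev_path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, length)
    finally:
        os.close(fd)

def write_guest_memory(offset, data, dev_path=GUEST_MEM_DEV):
    """Write `data` at `offset` in the exported guest-memory device."""
    fd = os.open(dev_path, os.O_WRONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        os.write(fd, data)
    finally:
        os.close(fd)
```

Because the device behaves like an ordinary seekable file, standard file I/O is sufficient; the hard part is knowing *which* offsets are meaningful, which is where the introspection context comes in.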

In the guest OS, another utility is used to provide a well-known marker to search for within the list of tasks. This is a small launch utility called wrapper that simply starts a command, i.e., fork()/exec(). This wrapper command is used to identify the process to target within the guest context.
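The wrapper's behavior can be sketched as follows; this is a Python stand-in for the actual utility (which the paper does not list), showing only the marker-and-launch idea:

```python
import os
import sys

def wrapper(argv):
    """Start the victim command; print our PID as a well-known marker
    that the host-side injector can search for in the guest's task list."""
    print("wrapper pid:", os.getpid())
    pid = os.fork()
    if pid == 0:
        os.execvp(argv[0], argv)    # child: becomes the victim process
    _, status = os.waitpid(pid, 0)  # parent: wait for the victim to exit
    return os.WEXITSTATUS(status)

if __name__ == "__main__":
    sys.exit(wrapper(sys.argv[1:]))
```

The parent stays alive for the lifetime of the victim, so its entry in the task list remains a stable anchor from which the child can be located.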

On startup the wrapper utility prints its process identifier (PID). This PID can be passed as input to the VMFI utility running outside of the VM, or the VMFI utility can scan for the wrapper process in the VM. When scanning for the wrapper process, the list of tasks within the guest OS is traversed (from outside the guest OS) to find all instances where the process name matches, and the associated PID is displayed. This information (PID) then provides the necessary pointers to obtain the child tasks started by the wrapper and details about the memory associated with those children. This lookup procedure results in the VMFI utility knowing the location of the memory associated with the wrapper’s child process, which is the target (victim) application running in the guest OS.
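The task-list scan can be sketched as below. The task_struct field offsets are hypothetical placeholders; in practice they must match the exact guest kernel build (e.g., derived from its headers or debug information), and `read_mem` abstracts reads of guest memory such as those through the exported device:

```python
import struct

# Hypothetical task_struct offsets; real values depend on the guest kernel build.
OFF_TASKS_NEXT = 0x238   # offset of tasks.next (list node) in task_struct
OFF_PID        = 0x2dc   # offset of pid in task_struct
OFF_COMM       = 0x4b0   # offset of comm (process name) in task_struct

def find_process(read_mem, init_task_addr, name):
    """Walk the guest's circular task list (from outside the guest) and
    return the PIDs of all tasks whose comm field matches `name`."""
    pids = []
    addr = init_task_addr
    while True:
        comm = read_mem(addr + OFF_COMM, 16).split(b"\0")[0].decode()
        if comm == name:
            pids.append(struct.unpack("<i", read_mem(addr + OFF_PID, 4))[0])
        # tasks.next points at the *next* task's embedded list node; subtract
        # the field offset to recover that task_struct's base address
        next_node = struct.unpack("<Q", read_mem(addr + OFF_TASKS_NEXT, 8))[0]
        addr = next_node - OFF_TASKS_NEXT
        if addr == init_task_addr:
            return pids
```

The same container-of arithmetic (list node address minus field offset) is how the kernel's own list traversal macros work, which is why a correct offset map is enough to reproduce the traversal externally.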

The startup of the wrapper and vmfi are currently manual steps. The other critical data that is necessary for the VMFI utility to function correctly is: the symbol names and addresses for the target application (that will run within the guest OS), and the value to write to the victim’s target address. These target addresses are limited to symbol names in order to simplify the lookup process. The value to inject is provided as input to the VMFI utility. A brief description with example usage information for the VMFI and wrapper utilities is given in Figs. 1 and 2.

Fig. 1. Usage information for wrapper utility that runs within the guest VM context.

Fig. 2. Usage information for VMFI utility that runs on the host (outside VM).

4 Evaluation

When performing fault injection experiments the integrity of the target environment can be corrupted, leading to unexpected behavior. The use of virtualization provides a software layer that strengthens the isolation between the guest (target) and host (control). The following tests were performed to demonstrate the cooperative approach to VM-based experiments, which uses guest system and application context from within the VM to perform fault injection from the host environment (outside the VM). While not tested here, the VM-based FI approach can be used for tests targeting system software in the VM that operates in a privileged mode and could crash or misbehave, without affecting the controller on the host.

4.1 Setup

The experiment used the Palacios VMM running within a Linux v3.5.0 host OS. The guest OS is a Linux v2.6.33.7 kernel using Busybox v1.20 to create a very small system installation. The guest VM configuration included shadow memory paging. The guest used bridged networking, whereby a Linux virtio network interface in the guest was connected to the host’s network interface. The HPCCG: Simple Conjugate Gradient Benchmark [16] was used as the target application. All tests were performed on a Linux cluster testbed (SAL9000) at ORNL. The machines in the cluster have 1 AMD64 CPU with 24 cores, 64 GB of memory, and dual-bonded 1 Gbps Ethernet. The host operating system was Ubuntu Linux 12.04 LTS.

4.2 Guest Application Errors

To investigate the feasibility of performing host-level injections into a guest-level context, the FI mechanism for Palacios described in Sect. 3 was leveraged. The HPCCG benchmark was used to test this FI functionality. The benchmark performs an iterative refinement until reaching a solution within a given threshold, or until a maximum number of iterations is performed. Previous studies have found iterative algorithms to be resilient to some errors [3], possibly at the cost of taking longer to converge on an appropriate value. The HPCCG benchmark has also been identified as more representative of current scientific applications and was proposed by Heroux and Dongarra as an alternate metric for future Top 500 rankings [10]. The HPCCG benchmark was slightly modified to expose the rtrans variable in the HPCCG function as a global symbol. This was necessary in order for the vmfi utility to locate a target address within the guest OS. The rtrans variable was selected through manual code inspection; the variable is used throughout the life of the iterative application. The only other change to HPCCG was to vary the value of tolerance to adjust the solution threshold. For example, tolerance=0.0 results in the algorithm always running to the maximum number of iterations [16], in contrast to tolerance=0.0000001, which allows for a slight margin that can satisfy the threshold and (possibly) terminate before the maximum number of iterations. The binary was statically linked and run in serial mode (i.e., no use of MPI or OpenMP).
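To make the role of rtrans concrete, the following is a minimal pure-Python sketch of a serial conjugate gradient loop in the spirit of HPCCG (not the benchmark's actual code). rtrans holds the squared residual norm, recomputed each iteration, so an injected overwrite of it perturbs both the convergence test and the search-direction update:

```python
def cg(A, b, tolerance, max_iter=150):
    """Minimal serial conjugate gradient on a dense SPD matrix.
    rtrans = r.r is the squared residual norm; the loop exits early only
    when sqrt(rtrans) drops below `tolerance`, so tolerance=0.0 always
    runs to max_iter."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                 # residual (x starts at zero)
    p = r[:]                                 # initial search direction
    rtrans = sum(ri * ri for ri in r)
    for k in range(1, max_iter + 1):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rtrans / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        old_rtrans, rtrans = rtrans, sum(ri * ri for ri in r)
        if rtrans ** 0.5 < tolerance:        # normr = sqrt(rtrans)
            break
        beta = rtrans / old_rtrans
        p = [r[i] + beta * p[i] for i in range(n)]
    return x, rtrans ** 0.5, k
```

Since each iteration rebuilds rtrans from the current residual, a corrupted value is eventually washed out, which is consistent with the perturbed-but-convergent behavior observed in the experiments below.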

Fig. 3. Diagram showing the VM+FI setup with an application (e.g., HPCCG) target running in the Guest VM.

The overall layout is shown in Fig. 3. The host-level vmfi utility injects a value into a specified memory address within the context of an application running within the VM. The application used in our tests, HPCCG, is reflected by the orange App (HPCCG) box that resides in the space of the VM (green box). Figure 3 also illustrates the vmfi utility running outside the VM context and injecting an error into the target running within the VM.

Table 1. The effects on the Final residual (normr). These statistics show the results for the serial HPCCG test without (a) and with (b) random data errors. In the error case, random values in [1..100] were injected into the rtrans variable at 1-second intervals. The statistics are based on the Final residual at the end of the benchmark. The parameters for the benchmark were \(nx=100\), \(ny=200\), \(nz=100\), and \(tolerance=0.0000001\).

4.3 Discussion and Observations

The guest application error testing confirmed that the host-level injector functioned correctly and caused non-fatal errors in the target application, HPCCG. The intent was to simulate, at a very coarse grain, data corruption of a key variable in the HPCCG program. The application was run 30 times both with and without injected errors. The same input parameters were used for all runs, \(nx=100\), \(ny=200\), \(nz=100\), which are the blocks of the matrix in the x/y/z dimensions [16], e.g., wrapper ./test_HPCCG_tol0.0 100 200 100. These values were selected to fit the available memory and keep the benchmark’s execution time within the VM short, to speed testing. The default maximum of max_iter=150 iterations was used, and the tolerance was set to tolerance=0.0000001. All non-error cases resulted in identical output for the value of the Residual (rtrans) on each iteration, and for the Final residual printed at the end (normr), as shown in Table 1(a). The same tests were re-run with errors injected into the rtrans variable during execution. The fault injections took place at 1-second intervals and injected a random value between 1..100. This value was written as 4 bytes into the target variable (rtrans) to emulate multiple bit-flips in a single data value. As expected, there were no fatal errors, since the changes were confined to the specific data value of rtrans, but there were slight perturbations due to the data errors, as shown in Table 1(b), which did not occur in the non-error case of the benchmark. This experiment verified the ability to inject silent data corruption into an application running in a guest OS context from the host OS. All tests (with and without errors) completed in 74 iterations.
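The host-side injection loop of such a campaign can be sketched as follows. Here `write_mem` stands in for writes through the guest-memory device, the interval is a parameter (1 second in our runs), and the 4-byte overwrite mirrors the multi-bit corruption described above:

```python
import random
import struct
import time

def inject_loop(write_mem, target_addr, duration_s, interval_s=1.0, seed=None):
    """Periodically overwrite a 4-byte span of the target variable (rtrans
    in our runs) with a random integer in [1..100], emulating multiple
    bit-flips in a single data value. Returns the number of injections."""
    rng = random.Random(seed)
    deadline = time.monotonic() + duration_s
    count = 0
    while time.monotonic() < deadline:
        value = rng.randint(1, 100)
        write_mem(target_addr, struct.pack("<i", value))
        count += 1
        time.sleep(interval_s)
    return count
```

Seeding the generator makes a campaign repeatable, which pairs naturally with the reusable VM execution context discussed in the conclusion.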

5 Conclusion

The use of VMs offers the ability to strongly separate the target from the hosting environment, which is useful when conducting fault injection experiments. The hosting platform has full access to the virtual guest context, but the details within the guest VM are opaque from outside the guest’s context. To overcome this issue a cooperative approach was explored where details about the guest OS were made available to tools in the host context. In the guest context, an additional wrapper command was added that provides information which host-level tools can leverage to look up details within the guest context. Additionally, the symbol maps for the guest kernel and application were made available to the host-level VMFI tools. This cooperative approach helps to reduce the semantic gap between the VM/host contexts.

The VM also provides a reusable execution context to support repeatable test configurations. This is very useful when creating a cooperative testing environment because the guest configuration is well known and can be customized as appropriate. Therefore, assumptions can be made for the purposes of the FI experiments. For example, pre-compiled binaries can be placed in the VM that are also available on the host, so symbol information (name/address) can be used for the FI experiments. This holds for the guest OS too, which can be made available at the host level for performing experiments on guest kernel data structures (e.g., via embedded VMM debuggers) or for accessing information about processes within the guest OS. The key insight is that the VM offers a customizable container that can be adapted as needed to simplify and aid FI experiments. The VM also offers full access to the guest context that would otherwise be difficult to achieve from a purely software approach.

A disadvantage of this low-level VMFI approach is an increased level of complexity and an increased semantic gap. This gap emerges because the higher-level contextual information about the application (target) is divorced from the lower-level VM vantage point. To overcome this challenge additional capabilities may need to be put in place, i.e., cooperative services, that provide additional information about the application context. For example, while the memory region for a guest OS is known by the VMM, the guest-OS-specific data structures within the VM are opaque. Therefore, a cooperative exchange of data is necessary to inform the host about details associated with the guest OS, for example, by providing the VMM with a system map containing the symbol names and addresses of the functions and data structures of the guest OS running within the VM.

The prototype VMFI approach that we discussed in this paper was greatly influenced by VMI techniques. As demonstrated in the experiment, we were able to use these techniques to inject errors from outside the VM into specific data structures of a real benchmark (HPCCG) running within the guest VM. The iterative solver (HPCCG) reached the correct result, as expected, but the effects of our silent data corruption were detectable in an increased variance in the final residual (normr). While this experiment is very simplistic, it does show that the VMFI tool is working correctly and is usable for studies on applications running within a VM.

This work used the strong isolation of VMs to separate the FI controller from the FI target. Another approach that would be interesting to explore is the use of container-based virtualization to provide the isolation between the FI controller and target. The failure isolation properties of VMs and containers are not identical, and the container-based environments are restricted to a single OS kernel. Therefore, if the intent was to pursue FI campaigns against low-level system software (e.g., guest OS targets), the VMFI approach would be a better option than a container-based approach. However, if the target is an entirely user-space application, the isolation between containers may be sufficient for the FI experiments. A container-based approach would not suffer the semantic gap problem associated with VMs because there is a single OS kernel and the FI controller (outside container) could have full visibility of all running processes. In general, further investigation is required to better understand the failure isolation properties of these single and multiple kernel approaches to virtualization.