Suitability of the time controlled environment for race detection in distributed applications

https://doi.org/10.1016/S0167-739X(99)00075-8

Abstract

The paper considers the problem of detecting time-dependent errors in distributed applications by testing. We determine the time conditions under which races occur and show how to obtain them in a distributed environment. Race detection testing strategies (RDS) that follow from the concept of a time controlled environment (TCE) are then presented. Finally, we report experiments that evaluate the suitability of the TCE and the capability of the RDS to improve the reliability of distributed software applications.

Introduction

Parallel and distributed software concepts make it possible to increase application reliability, performance and fault tolerance, but they also raise many problems [15]. One of them concerns the development of programming platforms that support the design of user applications; in many cases graphical approaches are the most promising [3]. Other problems relate to the efficient execution of user applications in a given computing environment, where reducing communication and processing delays among processes is often recommended; hence various assignment strategies are widely considered [13]. Besides, many papers concentrate on the dependable behaviour of the application at runtime, where testing and debugging play a crucial role [1], [6]. The purpose of testing is to find bugs introduced in any part of the program life cycle [1]. Many strategies and tools help developers find bugs; the most popular are structural testing (white box) and functional testing (black box) [1]. Both are based on input data selection: the data is chosen to cover paths in the data-flow graph or in the transaction graph representing the analysed application.

In general we can consider three main approaches to program testing:

  1. Execution with specially selected input data and comparison of the obtained results with the expected ones.

  2. Execution with specially instrumented application code and its supporting environment.

  3. Execution in a specially tuned environment with regular input data and observation of program behaviour.

The general framework for these three approaches is shown in Fig. 1. Approach 1 is the least intrusive, since the application code is neither affected nor modified. The only concern here is to select appropriate input data and observe program behaviour at its standard input. Approach 2 is the most intrusive, since every experiment requires code modification (setting traps, controlling messages, time-stamps). Moreover, a specialized tool is needed for collecting and processing that data. Such a tool is strongly tied to the supporting environment and is not portable. While in Approach 1 test quality depends on the data set, its size and adequacy, in Approach 2 it depends strongly on the scope and depth of the observed program parameters. Approach 3 is less intrusive in the sense that it does not directly interfere with code execution; instead it modifies environment parameters such as communication delays, process rescheduling, mapping or dynamic load balancing. Owing to this, the system can establish various extreme conditions for program execution, which is especially important for distributed software applications, without damaging or interfering with the program code. Effective implementation of Approaches 2 and 3 requires various monitoring and visualisation techniques, which are described elsewhere.

In this paper we concentrate on Approach 3 and show how to modify a typical environment to create conditions for detecting time-dependent errors.

The paper describes the time controlled environment (TCE) and evaluates its suitability for the race detection strategy (RDS), which requires suitable environment parameters to be set. The next section determines how to obtain the extremal time conditions in the TCE; Section 4 contains the RDS procedures and shows how to create a TCE for race detection. Section 5 presents the implementation of the TCE on the PVM [4] platform. The experiments evaluating the TCE and the RDS are described in Section 6, a general comparison of the RDS with other testing strategies is presented in Section 7, and the last section contains final remarks.


Related work

Testability of distributed applications relies on specific environment and application features, especially on the way processes are created in the application. A process creation tree (PCT) represents process inheritance (see an example in Fig. 2(a)). Process $p_1$ is a parent of process $p_2$ iff process $p_1$ created process $p_2$. The root of the tree is the system process, which serves the user interactions. Two behaviours of a program can be compared iff their PCTs are the same for
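The PCT notion can be sketched as a small tree type. This is a minimal sketch, not the authors' tool: it assumes that processes are identified by stable names across runs and that children are kept in creation order, and `PCTNode`, `spawn`, `parent_of` and `same_pct` are hypothetical names. Because the exact comparability condition is cut off in the text, `same_pct` only checks that two PCTs are structurally identical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PCTNode:
    """One process in a process creation tree (PCT) and the processes it created."""
    name: str
    children: list[PCTNode] = field(default_factory=list)

    def spawn(self, name: str) -> PCTNode:
        """Record that this process created `name`; this process becomes its parent."""
        child = PCTNode(name)
        self.children.append(child)
        return child


def parent_of(root: PCTNode, name: str) -> Optional[str]:
    """Return the name of the process that created `name`, or None if it is not in the tree."""
    for child in root.children:
        if child.name == name:
            return root.name
        found = parent_of(child, name)
        if found is not None:
            return found
    return None


def same_pct(a: PCTNode, b: PCTNode) -> bool:
    """True iff both trees have the same labels and the same children in creation order."""
    if a.name != b.name or len(a.children) != len(b.children):
        return False
    return all(same_pct(x, y) for x, y in zip(a.children, b.children))


# Two hypothetical runs: the system process creates p1, which creates p2 and p3.
run1 = PCTNode("system")
p1 = run1.spawn("p1")
p1.spawn("p2")
p1.spawn("p3")

run2 = PCTNode("system")
q1 = run2.spawn("p1")
q1.spawn("p3")  # same processes, but created in a different order
q1.spawn("p2")
```

Treating creation order as significant is a deliberate (assumed) choice here: with it, `same_pct(run1, run2)` is false even though both runs contain the same processes.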

Extremal time conditions for race detection

Let us assume a basic model of a distributed application. The application is defined as a process graph $G_A = (P, C)$ with vertices $P = \{p_1, p_2, \dots\}$ representing single-threaded processes, and a set $C \subseteq \{c_{ij} \mid 1 \le i, j \le |P|\}$ of directed edges representing possible data transfers (messages) between connected processes (see Fig. 2(b)). The cardinality of $P$ is $|P| = n$. An edge $c_{ij} \neq 0$ indicates that one or more messages are sent from $p_i$ to $p_j$ during the lifetime of the application and that the communication time required to deliver each
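The process graph $G_A = (P, C)$ can be built from a log of sent messages. This is a hypothetical sketch: the log format of (sender, receiver) index pairs is an assumption, and so is storing $c_{ij}$ as the number of messages from $p_i$ to $p_j$; the text only states that $c_{ij} \neq 0$ marks an existing edge.

```python
from __future__ import annotations

from collections import Counter


def process_graph(n: int, messages: list[tuple[int, int]]) -> tuple[set[int], dict[tuple[int, int], int]]:
    """Build G_A = (P, C) from a log of (sender, receiver) index pairs.

    Processes are p_1..p_n, so P is represented by the indices 1..n.
    C maps each pair (i, j) to the number of messages sent from p_i to p_j
    (an assumed interpretation of c_ij); only pairs with c_ij != 0 appear,
    i.e. only edges that actually exist.
    """
    P = set(range(1, n + 1))
    counts: Counter = Counter()
    for i, j in messages:
        if i not in P or j not in P:
            raise ValueError(f"message {i}->{j} refers to a process outside P")
        counts[(i, j)] += 1
    return P, dict(counts)


P, C = process_graph(3, [(1, 2), (1, 2), (3, 2)])
print(sorted(C.items()))  # [((1, 2), 2), ((3, 2), 1)]
```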

Race detection strategies

Three different RDSs are described below. The first algorithm is optimal: it guarantees to detect all races in the application as long as certain conditions (described further) are satisfied. A few definitions are needed to describe the algorithm.

Let us define the receive set of process $p_k$ as the set $R(p_k)$ containing all messages received by this process. For example, in Fig. 4(a), $R(p_2) = \{m_1, m_4\}$. For simplicity we consider Class C applications which meet additional
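The receive set definition translates directly into a filter over a message log. A minimal sketch, assuming each logged message carries an identifier, a sender and a receiver; `Message`, `receive_set` and the log itself are hypothetical, and the log is invented (Fig. 4(a) is not reproduced here) only so that $R(p_2) = \{m_1, m_4\}$ as in the text.

```python
from __future__ import annotations

from typing import NamedTuple


class Message(NamedTuple):
    mid: str       # message identifier, e.g. "m1"
    sender: str    # sending process
    receiver: str  # receiving process


def receive_set(log: list[Message], pk: str) -> set[str]:
    """R(p_k): identifiers of all messages in the log that were received by process pk."""
    return {m.mid for m in log if m.receiver == pk}


# Hypothetical log, chosen only so that R(p2) = {m1, m4}.
log = [
    Message("m1", "p1", "p2"),
    Message("m2", "p2", "p3"),
    Message("m3", "p3", "p1"),
    Message("m4", "p3", "p2"),
]
print(sorted(receive_set(log, "p2")))  # ['m1', 'm4']
```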

Implementation of TCE

The TCE is implemented for the PVM [4] platform. Fig. 5 shows the layered architecture of the proposed implementation. The base layer is the PVM (Parallel Virtual Machine) platform, which provides process creation and message passing services. The next layer consists of the PVM recorder and TCE simulator packages. The former supports the registration of communication events; the latter makes it possible to influence communication and computation times. The panel tool is a front-end application used for coordinating the test
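The role of the simulator layer, influencing communication times, can be illustrated with a toy delivery model. This is a hypothetical sketch, not the PVM API and not the authors' TCE simulator: `Msg`, `arrival_order` and all time values are assumptions. It only shows how lengthening one message's delay can reverse the order in which a receiver gets two messages, the kind of extreme condition that Approach 3 aims to create without touching the application code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    mid: str          # message identifier
    receiver: str     # destination process
    send_time: float  # simulated time at which the message is sent
    delay: float      # nominal communication time


def arrival_order(msgs: list[Msg], receiver: str,
                  extra_delay: dict[str, float] | None = None) -> list[str]:
    """Order in which `receiver` gets its messages when the simulated environment
    adds extra_delay[mid] to the nominal delay of selected messages.
    Ties are broken by message id so the result is deterministic."""
    extra = extra_delay or {}
    incoming = [m for m in msgs if m.receiver == receiver]
    incoming.sort(key=lambda m: (m.send_time + m.delay + extra.get(m.mid, 0.0), m.mid))
    return [m.mid for m in incoming]


# Two hypothetical messages addressed to p2 (senders omitted for brevity).
msgs = [Msg("a", "p2", send_time=0.0, delay=1.0),
        Msg("b", "p2", send_time=0.5, delay=1.0)]
print(arrival_order(msgs, "p2"))              # ['a', 'b']
print(arrival_order(msgs, "p2", {"a": 1.0}))  # ['b', 'a']
```

If the receiving process behaves differently depending on which of the two messages it gets first, the second run exposes a timing-dependent path that the nominal delays would never exercise.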

Experimental results

To evaluate the proposed testing strategies for distributed applications and the prepared TCE, we carried out a number of experiments based on the fault-injection approach. The testing algorithms were verified using a simple PVM stub application. The applications were created with randomly generated communication scenarios, into each of which a number of independent races were injected. The resulting applications were then tested using the different testing algorithms. The applications had various combinations of

General quality of testing strategies

We define several general quality attributes that describe testing strategies and can be used to compare their properties. The following attributes are considered:

Test data generation: the test data generator required to deliver the values of input data.

Needs of tools: the kind of tools supporting the considered testing strategy.

Testing complexity: the number of application executions during the testing process, depending on the number of processes (n), the number of paths (k), and

Conclusions

TCE is a modified environment for testing time-dependent errors in distributed applications. The general strategy (RDS) is based on changing environment parameters, in contrast to other strategies that change or check the input data and control flow of the application (structural testing [14], otot strategy [2]). It can be used for testing Class D applications.

We described three algorithms for setting time delays in the TCE. Consequently, we proposed three RDS algorithms where the opposite


References (15)

  • B. Beizer, Software Testing Techniques. Van Nostrand Reinhold, New York,...
  • S.K. Damodaran-Kamal, J.M. Francioni, Testing races in parallel programs with an otot Strategy, Proceedings of...
  • G. Dózsa, P. Kacsuk, T. Fadgyas, Development of graphical parallel programs in PVM environments, Proceedings of DAPSYS,...
  • A. Geist, J. Beguelin, J. Dongarra, W. Jiang, R. Manchek, V. Sunderam, Parallel Virtual Machine, PVM 3 Users Guide and...
  • C.A.R. Hoare, Communicating Sequential Processes, Prentice-Hall, Englewood Cliffs, NJ,...
  • P. Kacsuk, J. Cunha, G. Dózsa, J. Lourenco, T. Fadgyas, T. Antão, A Graphical Development and Debugging Environment for...
  • M. Neyman, H. Krawczyk, P. Kuzora, J. Proficz, B. Wiszniewski, STEPS —A Tool for Testing PVM Programs, SEIHPC-3...
There are more references available in the full text version of this article.


Henryk Krawczyk was born in 1946, received M.Sc. and Ph.D. from the Technical University of Gdańsk in 1969 and 1976, respectively. A Professor in Computer Science and a Chair of the Department of Computer Architectures at the Faculty of Electronics, Telecommunications and Informatics. Research interests include dependability, quality assurance and distributed and parallel processing. Member of IEEE since 1996.

Bartosz Krysztop was born in 1974, received M.Sc. from the Technical University of Gdańsk in 1998. Currently a Ph.D. student in distributed systems.

Jerzy Proficz was born in 1974, received M.Sc. from the Technical University of Gdańsk in 1998. Currently a Ph.D. student in distributed systems.

Supported in part by EU under INCO-Esprit KIT project 977100, and by the State Committee for Scientific Research (KBN) under grant 8 T11C 043 12.
