Theory and algorithms for slicing unstructured programs

https://doi.org/10.1016/j.infsof.2005.06.001

Abstract

Program slicing identifies parts of a program that potentially affect a chosen computation. It has many applications in software engineering, including maintenance, evolution and re-engineering of legacy systems. However, these systems typically contain programs with unstructured control-flow, produced using goto statements; thus, effective slicing of unstructured programs remains an important topic of study.

This paper shows that slicing unstructured programs inherently requires making trade-offs between three slice attributes: termination behaviour, size, and syntactic structure. It is shown how different applications of slicing require different trade-offs. The three attributes are used as the basis of a three-dimensional theoretical framework, which classifies slicing algorithms for unstructured programs. The paper proves that no algorithm exists for two combinations of these dimensions, and presents algorithms for the remaining six combinations.

Introduction

Mark Weiser first defined a program slice in the context of debugging [35]. Since then, program slicing has found many applications besides debugging [17], [24], [27], [31], including program integration [23], comprehension [14], [18], and reuse [3], [12]. There has also been an active body of work on computing various types of slices, resulting in a rich nomenclature for classifying slicing algorithms: intraprocedural vs. interprocedural; static vs. dynamic; backward vs. forward; executable vs. non-executable; and syntax-preserving vs. amorphous [6], [7], [13], [21], [33].

In short, intraprocedural slices consider a single procedure in isolation, while interprocedural slices consider multiple procedures with procedure calls. Static slices are computed from a program using static analysis while dynamic slices are computed from a program and an input and thus take into account a single execution of the program. A backward slice identifies program components that might affect a given computation. Its dual, a forward slice, identifies program components affected by a given component. An executable slice is an executable program that captures a subset of the original program's computation, while a non-executable (or closure) slice simply identifies the elements that affect (or are affected by) a given computation. These are often the same, but not always [4]. Finally, a syntax-preserving slice contains only portions of the original program's text, while an amorphous slice allows semantics-preserving transformations [18].
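The distinction between backward and forward slices can be illustrated as reachability over dependence edges. The following is a minimal sketch, not the paper's algorithm: the five-statement program, its hand-written dependence edges, and the helper names `reachable`, `backward_slice`, and `forward_slice` are all assumptions made for illustration. The result is a closure (non-executable) slice, i.e. a set of statements rather than a program.

```python
from collections import defaultdict

# Hypothetical straight-line program (node number: statement):
#   1: x = 1
#   2: y = 2
#   3: z = x + 1
#   4: w = y + z
#   5: print(z)
# Each pair (a, b) means "statement b depends on statement a".
# The edges are written by hand for this example.
DEPENDENCES = [(1, 3), (2, 4), (3, 4), (3, 5)]


def reachable(start, neighbours):
    """Return every node reachable from start, including start itself."""
    seen = {start}
    worklist = [start]
    while worklist:
        node = worklist.pop()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                worklist.append(nxt)
    return seen


def backward_slice(criterion, dependences):
    """Statements that might affect the criterion (follow edges backwards)."""
    preds = defaultdict(list)
    for src, dst in dependences:
        preds[dst].append(src)
    return reachable(criterion, preds)


def forward_slice(component, dependences):
    """Statements that might be affected by the component (follow edges forwards)."""
    succs = defaultdict(list)
    for src, dst in dependences:
        succs[src].append(dst)
    return reachable(component, succs)


print(sorted(backward_slice(5, DEPENDENCES)))  # [1, 3, 5]
print(sorted(forward_slice(2, DEPENDENCES)))   # [2, 4]
```

In this sketch the backward slice on `print(z)` omits statements 2 and 4, which cannot influence `z`, while the forward slice from `y = 2` contains only the statements that `y` can influence.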

Slicing has found many applications because it allows the programmer to focus on a sub-computation, extracting it in the form of an executable subprogram: the slice. The sub-computation of interest may be one that the original author of the program had not considered, and so the code that denotes it may be arbitrarily scattered throughout the source code of the original program. The task of constructing the slice is thus the task of locating these scattered components and the supporting computations upon which they depend. It is a demanding problem because it requires a deep semantic analysis to ensure that the extracted slice preserves the behaviour of the original program with respect to the computation of interest.

The problem of slicing unstructured programs is important because many of the applications of slicing involve maintenance, evolution, and re-engineering of legacy systems, often written in programming styles that make heavy use of unstructured control flow [3], [9], [10], [26]. Even recent systems contain a significant proportion of goto statements. For example, an inspection of Linux kernel version 2.6.8.1 revealed that approximately 0.86% of the statements are goto statements. Finally, some programming languages (e.g. C) require the use of break statements in common constructions (such as the switch statement). The break statement denotes a limited form of unstructured control flow.

Ottenstein's Program Dependence Graph (PDG) based algorithm is currently the best known algorithm for intraprocedural slicing of structured programs [15], [28]. This algorithm was not designed to compute slices of unstructured programs. Consequently, it fails to include any goto statements in a slice, because a goto statement is the source of neither a data dependence nor a control dependence. The literature contains several algorithms that extend Ottenstein's algorithm to compute slices of unstructured programs [1], [2], [11], [20], [25]. These algorithms are discussed in Section 6.
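The omission of goto statements can be seen on a small example. The sketch below is an illustration under stated assumptions, not the paper's or Ottenstein's implementation: the C fragment is hypothetical, its data and control dependence edges were derived by hand, and `backward_slice` is a plain backward reachability over those edges.

```python
# Hypothetical C fragment (not taken from the paper); node number: statement.
#   1: sum = 0;
#   2: i = 1;
#   3: L: if (i > n)
#   4:        goto E;
#   5:    sum = sum + i;
#   6:    i = i + 1;
#   7:    goto L;
#   8: E: print(sum);
GOTO_NODES = {4, 7}

# Hand-derived dependence edges (source, target): target depends on source.
DATA_DEPENDENCES = [
    (1, 5), (1, 8), (5, 5), (5, 8),  # definitions of sum reaching uses of sum
    (2, 3), (2, 5), (2, 6),          # initial value of i
    (6, 3), (6, 5), (6, 6),          # incremented value of i
]
CONTROL_DEPENDENCES = [
    (3, 3), (3, 4), (3, 5), (3, 6), (3, 7),  # all governed by predicate 3
]


def backward_slice(criterion, edges):
    """Collect every node from which the criterion is reachable via edges."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
    seen = {criterion}
    worklist = [criterion]
    while worklist:
        node = worklist.pop()
        for src in preds.get(node, ()):
            if src not in seen:
                seen.add(src)
                worklist.append(src)
    return seen


pdg_slice = backward_slice(8, DATA_DEPENDENCES + CONTROL_DEPENDENCES)
print(sorted(pdg_slice))               # [1, 2, 3, 5, 6, 8]
print(sorted(GOTO_NODES & pdg_slice))  # []
```

No edge has node 4 or node 7 as its source, so neither goto is ever reached by the backward traversal. Deleting both from the fragment would remove the loop entirely, which is why the resulting statement set is not a correct executable slice on its own.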

This paper focuses on the computation of static, backward, intraprocedural, executable slices of unstructured programs, henceforth simply referred to as slices. It makes the following contributions:

  • (1) Framework: The paper introduces the framework shown in Fig. 1 for classifying slicers of unstructured programs along three independent dimensions: termination behaviour, syntactic structure, and size. It is shown that slicing algorithms for two of the eight combinations within the framework, though desirable, simply do not exist. This non-existence result is not due to the familiar non-computability results relating to slice minimality [35]. Rather, it is a direct result of the particular properties of unstructured programs and their slices.

  • (2) Slicing algorithms: The paper presents slicing algorithms for the remaining six combinations. These algorithms are built using common data structures, thereby facilitating examination of the trade-offs present between the different possibilities. Finally, existing algorithms for slicing unstructured programs are placed into the framework. Interestingly, this reveals that, of the six possibilities, only three have been considered in previous slicing literature.
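The eight combinations of the framework can be enumerated mechanically. The sketch below assumes the three-letter abbreviations used in the later sections of the paper (W/S for weak or strong termination behaviour, P/A for syntax-preserving or amorphous, E/M for Ottenstein-equal or Ottenstein-more). The names WPM and SPM do not appear in the text available here and are inferred from that naming scheme.

```python
from itertools import product

# One letter per dimension value; the letter codes are an assumption
# based on the abbreviations WPE, SPE, WAE, SAE, WAM and SAM in the paper.
TERMINATION = {"W": "weak", "S": "strong"}
SYNTAX = {"P": "syntax-preserving", "A": "amorphous"}
SIZE = {"E": "Ottenstein-equal", "M": "Ottenstein-more"}

# Classes for which the paper proves that no slicer exists.
NON_EXISTENT = {"WPE", "SPE"}

# Cartesian product of the three dimensions: 2 * 2 * 2 = 8 classes.
all_classes = ["".join(code) for code in product(TERMINATION, SYNTAX, SIZE)]
remaining = [c for c in all_classes if c not in NON_EXISTENT]

print(all_classes)  # ['WPE', 'WPM', 'WAE', 'WAM', 'SPE', 'SPM', 'SAE', 'SAM']
print(remaining)    # ['WPM', 'WAE', 'WAM', 'SPM', 'SAE', 'SAM']
```

Both excluded classes combine syntax preservation with Ottenstein-equal size, which matches the paper's claim that a slice cannot have both of these properties at once.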

The rest of the paper is organized as follows. Section 2 contains some necessary definitions. Section 3 presents the three dimensions of the framework. Section 4 proves that two classes within the framework do not exist. Section 5 presents the new slicing algorithms. Section 6 discusses related work, places it into the framework, and compares it to the algorithms from Section 5. Finally, conclusions are presented in Section 7.

Section snippets

Definitions

This section defines the properties of the abstract syntax trees, control-flow graphs, and program dependence graphs required in subsequent sections. The language considered is essentially C; however, the focus of the paper is on intraprocedural control issues. Thus, the definitions and examples consider primarily assignment, if-then-else, while, do-while, sequence, goto, and ‘special’ statements. Interprocedural control issues (e.g. those introduced by exit(), longjmp(), or exceptions), and

The framework for slices and slicers

This section introduces the framework for classifying slices and slicing algorithms for unstructured programs. The framework has three orthogonal dimensions: termination behaviour, syntax, and size. In short, these dimensions are described as follows:

Termination Behaviour. A slice is strong iff it terminates when the original program does; it is weak otherwise.

Syntax. A slice is syntax preserving, or syntactic, iff it is obtained solely by deleting statements from the original program. It is

Non-existence of SPE and WPE slicers

This section demonstrates that two classifications within the framework simply do not exist. These two are WPE (Weak, syntax-Preserving, Ottenstein-Equal) slicers and SPE (Strong, syntax-Preserving, Ottenstein-Equal) slicers. This section also shows that there exist programs and slicing criteria for which WPE slices exist, but SPE slices do not. These results show that it is necessary to choose between these three desirable properties: one simply cannot have a slice which is both syntax and

Slicing algorithms

As shown in the previous section, a slicer has to produce either amorphous or Ottenstein-more slices because Proposition 1 demonstrates that WPE and SPE slicers do not exist. Furthermore, the existence of WAE and SAE algorithms, presented in this section, obviates the need to consider WAM and SAM algorithms. The latter can be produced by simply adding the Ottenstein slice taken with respect to any node not in the WAE or SAE slice, respectively, which has the effect of unnecessarily increasing

Classification and comparison with prior work

The problem of slicing unstructured programs is detailed and subtle. There has been a steady study of this problem in the literature [1], [2], [11], [20], [25]. In this section existing algorithms are described, placed in the framework introduced in Section 3, and compared to the algorithms presented in Section 5. At this point it is helpful to introduce the following four definitions.

Definition 22 (Language Types). A language is block-structured if it has complex statements that are built from other

Conclusion and future work

To better understand and classify slicing algorithms for programs with unstructured control-flow, this paper introduces a three-dimensional space, a framework that facilitates exploration of the trade-offs made when slicing unstructured programs. The choice of the most suitable slicing algorithm depends on the target application. For example, weak, amorphous, Ottenstein-equal slices may be most acceptable for debugging; strong, amorphous, Ottenstein-equal slices for re-engineering; and

References (35)

  • David Wendell Binkley et al.
  • David Wendell Binkley et al., A survey of empirical results on program slicing, Advances in Computers (2004)
  • Gerardo Canfora et al.
  • Mark Harman et al., Amorphous program slicing, Journal of Systems and Software (2003)
  • Arun Lakhotia et al.
  • Hiralal Agrawal, On slicing programs with jump statements, in: ACM SIGPLAN Conference on Programming Language Design...
  • Thomas Ball et al., Slicing programs with arbitrary control-flow
  • Jon Beck, David Eichmann, Program and interface slicing for reverse engineering, in: IEEE/ACM 15th Conference on...
  • David Wendell Binkley, Precise executable interprocedural slices, ACM Letters on Programming Languages and Systems (1993)
  • David Wendell Binkley, Computing amorphous program slices using dependence graphs and a data-flow model, in: ACM...
  • David Wendell Binkley et al., Program integration for languages with procedure calls, ACM Transactions on Software Engineering and Methodology (1995)
  • Gerardo Canfora, Aniello Cimitile, Andrea De Lucia, G.A. Di Lucca, Software salvaging based on conditions, in:...
  • Jong-Deok Choi et al., Static slicing in the presence of goto statements, ACM Transactions on Programming Languages and Systems (1994)
  • Aniello Cimitile et al., A specification driven slicing process for identifying reusable functions, Software Maintenance: Research and Practice (1996)
  • Andrea De Lucia, Program slicing: methods and applications, in: 1st IEEE International Workshop on Source Code Analysis...
  • Andrea De Lucia, Anna Rita Fasolino, Malcolm Munro, Understanding function behaviours through program slicing, in: 4th...
  • Jeanne Ferrante et al., The program dependence graph and its use in optimization, ACM Transactions on Programming Languages and Systems (1987)