Elsevier

Journal of Systems and Software

Volume 122, December 2016, Pages 311-326

Method-level program dependence abstraction and its application to impact analysis

https://doi.org/10.1016/j.jss.2016.09.048

Highlights

  • We develop a new program abstraction that directly models dependencies among methods.

  • We assess the accuracy of the new abstraction model for forward dependence analysis.

  • The proposed approach achieves much greater cost-effectiveness than peer options.

  • The new dependence abstraction improves both static and dynamic impact analysis.

  • The new abstraction is much more cost-effective than the traditional model at method level.

Abstract

The traditional software dependence (TSD) model based on the system dependence graph enables precise fine-grained program dependence analysis that supports a range of software analysis and testing tasks. However, this model often faces scalability challenges that hinder its applications as it can be unnecessarily expensive, especially for client analyses where coarser results suffice.

This paper revisits the static-execute-after (SEA), the most recent TSD abstraction approach, for its accuracy in approximating method-level forward dependencies relative to the TSD model. It also presents an alternative approach called the method dependence graph (MDG), compares its accuracy against the SEA, and explores applications of the dependence abstraction in the context of dependence-based impact analysis.

Unlike the SEA approach which roughly approximates dependencies via method-level control flows only, the MDG incorporates more fine-grained analyses of control and data dependencies to avoid being overly conservative. Meanwhile, the MDG avoids being overly expensive by ignoring context sensitivity in transitive interprocedural dependence computation and flow sensitivity in computing data dependencies induced by heap objects.

Our empirical studies revealed that (1) the MDG can approximate the TSD model safely, at least for method-level forward dependence, at much lower cost yet with little loss of precision, (2) for the same purpose, while both are safe and more efficient than the TSD model, the MDG achieves significantly higher precision than the SEA with significantly better efficiency, and (3) as example applications, the MDG can greatly enhance the cost-effectiveness of both static and dynamic impact analysis techniques that are based on program dependence analysis.

More generally, as a program dependence representation, the MDG provides a viable solution to many challenges that can be reduced to balancing cost and effectiveness faced by dependence-based tasks other than impact analysis.

Introduction

Program dependence analysis has long underpinned a wide range of software analysis and testing techniques (e.g., Podgurski, Clarke, 1990, Bates, Horwitz, 1993, Santelices, Harrold, 2010, Baah, Podgurski, Harrold, 2010). While traditional approaches to dependence analysis offer fine-grained results (at statement or even instruction level) (Ferrante, Ottenstein, Warren, 1987, Horwitz, Reps, Binkley, 1990), they can face severe scalability and/or usability challenges, especially with modern software of growing size and increasing complexity (Jász, Beszédes, Gyimóthy, Rajlich, 2008, Acharya, Robinson, 2011), even more so when high precision is demanded with a safety guarantee (Jackson, Rinard, 2000, Binkley, 2007).

On the other hand, for many software-engineering tasks where results of coarser granularity suffice, computing the finest-grained dependencies tends to be superfluous and ends up with low cost-effectiveness in particular application contexts—in this work, a (dependence) analysis is considered cost-effective (measured by the ratio of effectiveness to cost) if it produces effective (measured by accuracy, or precision alone if with constantly perfect recall) results relative to the total overhead it incurs (including analysis cost and human cost inspecting the analysis results) (Cai et al., 2016). One example is impact analysis (Bohner and Arnold, 1996), which analyzes the effects of specific program components, or changes to them, on the rest of the program to support software evolution and many other client analyses, including regression testing (Jász, Schrettner, Beszédes, Osztrogonác, Gyimóthy, 2012, Schrettner, Jász, Gergely, Beszédes, Gyimóthy, 2014) and fault localization (Ren et al., 2006). For such tasks as impact analysis, results are commonly given at method level (Law, Rothermel, 2003, Apiwattanapong, Orso, Harrold, 2005, Jász, 2010), where fine (e.g., statement-level) results can be too large to fully utilize (Acharya and Robinson, 2011a). In other contexts such as program understanding, method-level results are also more practical to explore than those of the finest granularity.

Driven by varying needs, different approaches have been explored to abstract program dependencies to coarser levels, including the program summary graph (Callahan, 1988) used to speed up interprocedural data-flow analysis; the object-oriented class-member dependence graph (Sun et al., 2010), the lattice of class and method dependence (Sun et al., 2011), and the influence graph (Breech et al., 2006), all used for impact analysis; and the module dependence graph (Mancoridis et al., 1999) used for understanding and improving software structure. While these abstractions have been shown useful for their particular client analyses, they either capture only partial dependencies among methods (Breech, Tegtmeyer, Pollock, 2006, Sun, Li, Tao, Wen, Zhang, 2010) or dependencies at the level of classes (Sun et al., 2011) or even files (Mancoridis et al., 1999), which can be overly coarse for many dependence-based tasks. More critically, most such approaches were not designed or fully evaluated as a general program dependence abstraction with respect to their accuracy (both precision and recall) against the original full model they approximate as ground truth.

Initially intended to replace traditional software dependencies (TSD) based on the system dependence graph (SDG) (Horwitz, Reps, Binkley, 1990, Jász, Beszédes, Gyimóthy, Rajlich, 2008), a method-level dependence abstraction called the static-execute-after/before (SEA/SEB) (Jász et al., 2008) was proposed recently. It abstracts dependencies among methods based on the interprocedural control flow graph (ICFG) and was reported to have little loss of precision with no loss of (100%) recall relative to static slicing based on the TSD model (i.e., the SDG). Later, the SEA was applied to static impact analysis, shown to be more accurate than peer techniques (Tóth et al., 2010) and capable of improving regression test selection and prioritization (Schrettner et al., 2014).
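For intuition, the SEA relation can be viewed as reachability over method-level "may-execute-after" edges derived from call-site ordering. The minimal Python sketch below uses hypothetical names and a plain ordered call map instead of a real ICFG; it is an illustration of the idea, not the algorithm of Jász et al. Note how the RET edges quickly make the relation very conservative, one source of the imprecision this paper studies.

```python
from collections import defaultdict, deque

def sea_edges(call_sites):
    """Build 'may-execute-after' edges from an ordered call map.

    call_sites maps each method to the methods it calls, in call-site
    order.  The edges crudely encode the three SEA ingredients:
    CALL (a callee runs after the call), SEQ (a later call site runs
    after an earlier callee returns), and RET (the caller resumes
    after the callee returns)."""
    edges = defaultdict(set)
    for caller, callees in call_sites.items():
        for i, callee in enumerate(callees):
            edges[caller].add(callee)        # CALL
            edges[callee].add(caller)        # RET
            for later in callees[i + 1:]:
                edges[callee].add(later)     # SEQ
    return edges

def sea(edges, start):
    """Methods that may execute after `start`: plain reachability."""
    seen, work = set(), deque([start])
    while work:
        for nxt in edges[work.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen
```

With `call_sites = {"main": ["init", "compute", "report"], "compute": ["load", "solve"]}`, the set `sea(sea_edges(call_sites), "init")` already contains every method in the program, illustrating how coarse control-flow-only abstraction can be.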

However, previous studies on the accuracy of the SEA/SEB either exclusively targeted procedural programs (Jász et al., 2008) or focused only on backward dependencies based on the SEB (against backward slicing on top of the SDG) (Jász, 2010). The remaining relevant studies addressed the accuracy of SEA-based forward dependencies, some indeed using object-oriented programs and comparing to forward slicing on the TSD model; yet the accuracy of such dependencies was assessed either not at the method level but at class level only (Beszédes et al., 2007), or not relative to TSD-based ground truth but to ground truth derived from repository changes (Jász, Schrettner, Beszédes, Osztrogonác, Gyimóthy, 2012, Schrettner, Jász, Gergely, Beszédes, Gyimóthy, 2014) or programmer opinions (Tóth et al., 2010), and only in the specific application context of impact analysis.

While forward dependence analysis is required by many dependence-based applications, including static impact analysis that the SEA/SEB has been mainly applied to, the accuracy of this abstraction with respect to the TSD model, for forward dependencies and object-oriented programs in particular, remains unknown. In addition, it has not yet been explored whether and, if possible, how such program dependence abstractions would improve dynamic analysis, especially hybrid ones that utilize both static dependence and execution data of programs, such as hybrid dynamic impact analysis (Maia, Bittencourt, de Figueiredo, Guerrero, 2010, Cai, Santelices, 2014, Cai, Santelices, 2015).

In this paper, we present and study an alternative method-level dependence abstraction using a program representation called the method dependence graph (MDG). In comparison to the SDG-based TSD models which represent a program in terms of the data and control dependencies among all of its statements, an MDG serves also as a general graphical program representation, but models those dependencies at method level instead. The method-level dependencies could be simply obtained from a TSD model by lifting statements in the SDG up to corresponding (enclosing) methods. Yet, our MDG model represents these dependencies directly with statement-level details within methods (i.e. intraprocedural dependencies) abstracted away and, more importantly, does so with much less computation than constructing the SDG would require. The MDG computes transitive interprocedural dependencies in a context-insensitive manner with flow sensitivity dismissed for heap-object-induced data dependencies too. Thus, it is more efficient than TSD models (Horwitz, Reps, Binkley, 1990, Yu, Rajlich, 2001). On the other hand, this abstraction captures whole-program control and data dependencies, including those due to exception-driven control flows (Sinha and Harrold, 2000), thus it is more informative than coarser models like call graphs or ICFG. With the MDG, we attempt to not only address the above questions concerning the latest peer approach SEA/SEB, but also to attain a more cost-effective dependence abstraction over existing alternative options in general.
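To illustrate the core abstraction (not the paper's actual construction algorithm, which builds the MDG directly rather than by first building an SDG), the following Python sketch lifts hypothetical statement-level dependence edges to method level and computes a forward dependence set by context-insensitive transitive closure:

```python
from collections import defaultdict, deque

def lift_to_methods(stmt_deps, enclosing):
    """Lift statement-level dependence edges to method level.

    stmt_deps: iterable of (s, t) pairs meaning 'statement t depends on s';
    enclosing: maps each statement to its enclosing method.
    Intraprocedural details are abstracted away: only cross-method
    dependencies are kept, and no calling contexts are tracked
    (context-insensitive)."""
    mdg = defaultdict(set)
    for s, t in stmt_deps:
        src, dst = enclosing[s], enclosing[t]
        if src != dst:
            mdg[src].add(dst)
    return mdg

def forward_dependence_set(mdg, method):
    """All methods transitively dependent on `method` -- e.g., the
    candidate impact set of a change to `method`."""
    seen, work = set(), deque([method])
    while work:
        for dep in mdg[work.popleft()]:
            if dep not in seen:
                seen.add(dep)
                work.append(dep)
    return seen
```

For example, with statements s1 in method A, s2 and s3 in B, and s4 in C, and dependencies s1→s2→s3→s4, the lifted MDG has edges A→B and B→C, so a change to A yields the forward dependence set {B, C}.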

We implemented the MDG and applied it to both static and dynamic impact analysis for Java, all evaluated on seven non-trivial Java subject programs. We computed the accuracy of the MDG for approximating forward dependencies in general and the cost-effectiveness of its specific application in static impact analysis; we also compared the accuracy and efficiency of the MDG, with the TSD as ground truth, against the SEA approach. To explore how the MDG abstraction can be applied to and benefit dynamic analysis, we developed on top of the MDG a variant of Diver, the most cost-effective hybrid dynamic impact analysis in the literature (Cai and Santelices, 2014), and compared its cost and effectiveness against the original Diver.

Our results show that the MDG can approximate the TSD model with perfect recall (100%) and generally high precision (85–90% mostly) with great efficiency, at least for forward dependencies at the method level. We also found that the MDG appears to be a more cost-effective option than the SEA for the same purpose, given its significantly higher precision with better overall efficiency. The study also reveals that, at least for the object-oriented programs we used, the SEA can be much less precise for approximating forward dependencies at method level than previously reported at class level for object-oriented programs (Beszédes et al., 2007) and at method level for procedural programs (Jász, Beszédes, Gyimóthy, Rajlich, 2008, Jász, 2010). The study also demonstrates that the MDG as a dependence abstraction model can significantly enhance the cost-effectiveness of both dependence-based static and dynamic impact analysis techniques over the respective existing best alternatives. More broadly, the MDG as a general program abstraction could benefit any application that is based on program dependencies at method level (e.g., testing and debugging) or that utilizes dependencies at this or even higher levels (e.g., refactoring and performance optimization).
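The accuracy measures reported above are the standard ones, computed against a TSD-based forward static slice as ground truth; the sets in this small Python illustration are hypothetical. A safe abstraction keeps recall at 100%, while precision reflects how much it over-approximates:

```python
def precision_recall(approx, truth):
    """Accuracy of an abstraction-derived dependence set against a
    ground-truth set (e.g., a TSD-based forward static slice)."""
    tp = len(approx & truth)                     # true positives
    precision = tp / len(approx) if approx else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall
```

For instance, an abstraction-derived set {B, C, D, E} measured against a slice {B, C, D} gives precision 0.75 and recall 1.0: safe, but slightly conservative.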

In summary, the contributions of this paper are as follows:

  • An approach to abstracting program dependencies to method level, called the MDG, that can approximate traditional software dependencies more accurately than existing options, including dependencies due to exception-driven control flows (Section 3).

  • An implementation of the MDG and two application analyses based on it, a static impact analysis and a hybrid dynamic impact analysis (Section 4.1).

  • An extensive evaluation of the MDG that assesses its accuracy relative to TSD-based static slicing in approximating method-level forward dependencies, and that demonstrates its application and benefits for both static and dynamic analyses (Section 4).

  • The first substantial empirical evidence on the accuracy of the SEA with respect to TSD-based static slices on object-oriented software, and on the performance contrast between such dependence abstractions and the full TSD model they approximate (Section 5).

Section snippets

Motivation and background

Our work was primarily motivated by improving the cost-effectiveness of forward dependence analysis that directly supports dependence-based impact analysis (Bohner, Arnold, 1996, Li, Sun, Leung, Zhang, 2013), among many other software-evolution tasks (Rajlich, 2014). The need for a better cost-effectiveness of impact analysis has been extensively investigated in previous studies (e.g., Apiwattanapong, Orso, Harrold, 2005, Rovegard, Angelis, Wohlin, 2008; de Souza and Redmiles, 2008; Acharya,

The method dependence graph

We first give a high-level description, followed by the definition, of the MDG, and then present in detail the algorithm for constructing the MDG for a given input program. We use both graph and code examples for illustration. This section focuses on presenting the MDG technique itself as a generic program dependence model, with details on its use in impact analysis as an example application (mainly the implementation of two impact-analysis techniques based on this model) deferred to Section 4.1

Empirical evaluation

We evaluated our technique as a dependence abstraction in general and its applications to both static and dynamic impact analysis techniques in particular. For that purpose, we performed two empirical studies. In the first study, we computed the precision and recall of forward dependence sets derived from the MDG against forward static slices, both at method level, and compared the same measures and efficiency against the SEA. In the second study, we built a dynamic impact analysis based on the

Study I: approximating method-level forward static dependencies

This section presents the main study, which addresses the accuracy of the MDG against the SEA relative to the TSD model. Since impact sets computed by the static impact analysis based on the MDG and SEA are also the method-level forward dependence sets used by the accuracy study, we simultaneously evaluate the accuracy of the two abstraction models and the static impact analysis techniques based on them. We also study the efficiency of all these approaches. This study aims to answer the first

Study II: improving hybrid dynamic impact analysis

This section presents our secondary study, which addresses the application of static dependence abstraction in hybrid dynamic analysis using the dynamic impact analysis as an example. This study concerns the efficiency benefits of the MDG abstraction and seeks to answer the last research question (RQ4). We also examine the hypothesis that the MDG-based dynamic impact analysis (Madger) gives as accurate impact sets as the Diver technique. That is, we expect that Madger improves querying

Related work

We mainly discuss three categories of previous work related to ours: program dependence abstraction, static impact analysis, and dynamic impact analysis.

Conclusions and future work

Despite a number of dependence abstractions proposed to approximate the fine-grained and heavyweight TSD model, only a few of them are intended as a safe and efficient general approximation. One recent such abstraction, the SEA/SEB, has been developed, yet it remains unclear how accurately this approach can approximate forward dependencies for object-oriented software. Also, our intuition and initial application of the SEA/SEB suggest that it may not be sufficiently accurate for that

Acknowledgments

This work was partially supported by ONR Award N000141410037 to the University of Notre Dame and faculty startup fund from Washington State University to the first author.

Haipeng Cai received his Ph.D. from the University of Notre Dame, Notre Dame, IN. His research interests are in software engineering and systems with a focus on program analysis and its applications to the reliability and security of evolving software. He is currently a faculty member in the School of Electrical Engineering and Computer Science at Washington State University, Pullman, WA.

References (63)

  • H. Cai et al., A comprehensive study of the predictive accuracy of dynamic change-impact analysis, J. Syst. Software (2015)
  • B. Li et al., Combining concept lattice with call graph for impact analysis, Adv. Eng. Software (2012)
  • X. Sun et al., Static change impact analysis techniques: a comparative study, J. Syst. Software (2015)
  • M. Acharya et al., Practical change impact analysis based on static program slicing for industrial software systems, Proceedings of IEEE/ACM International Conference on Software Engineering, Software Engineering in Practice Track (2011)
  • M. Acharya et al., Practical change impact analysis based on static program slicing for industrial software systems, Proceedings of the 33rd International Conference on Software Engineering (2011)
  • A.V. Aho et al., Compilers: Principles, Techniques and Tools (2006)
  • T. Apiwattanapong et al., Efficient and precise dynamic impact analysis using execute-after sequences, Proc. of Intl. Conf. Softw. Eng. (2005)
  • G.K. Baah et al., The probabilistic program dependence graph and its application to fault diagnosis, IEEE Trans. Software Eng. (2010)
  • S. Bates et al., Incremental program testing using program dependence graphs, Proc. Symp. Principles of Program Lang. (1993)
  • A. Beszédes et al., Computation of static execute after relation with applications to software maintenance, IEEE International Conference on Software Maintenance (ICSM) (2007)
  • D. Binkley, Source code analysis: a road map, Future of Software Engineering (2007)
  • S.A. Bohner et al., An Introduction to Software Change Impact Analysis (1996)
  • B. Breech et al., Integrating influence mechanisms into impact analysis for increased precision, Intl. Conf. Softw. Maint. (2006)
  • H. Cai et al., Diver: precise dynamic impact analysis using dependence-based trace pruning, Proceedings of International Conference on Automated Software Engineering (2014)
  • H. Cai et al., A framework for cost-effective dependence-based dynamic impact analysis, IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER) (2015)
  • H. Cai et al., TracerJD: generic trace-based dynamic dependence analysis with fine-grained logging, IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER) (2015)
  • H. Cai et al., DiaPro: unifying dynamic impact analyses for improved and variable cost-effectiveness, ACM Trans. Software Eng. Methodol. (TOSEM) (2016)
  • H. Cai et al., Estimating the accuracy of dynamic change-impact analysis using sensitivity analysis, Proceedings of International Conference on Software Security and Reliability (2014)
  • D. Callahan, The program summary graph and flow-sensitive interprocedural data flow analysis, Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation (1988)
  • N. Cliff, Ordinal Methods for Behavioral Data Analysis (1996)
  • H. Do et al., Supporting controlled experimentation with testing techniques: an infrastructure and its potential impact, Empirical Software Eng. (2005)
  • M. Emami, A Practical Interprocedural Alias Analysis for an Optimizing/Parallelizing C Compiler (1993)
  • J. Ferrante et al., The program dependence graph and its use in optimization, ACM Trans. Prog. Lang. Syst. (1987)
  • A. R. Group, Java Architecture for Bytecode Analysis (2005). http://gamma.cc.gatech.edu/jaba.html. [Online; accessed...
  • S. Horwitz et al., Interprocedural slicing using dependence graphs, ACM Trans. Prog. Lang. Syst. (1990)
  • L. Huang et al., A dynamic impact analysis approach for object-oriented programs, Adv. Software Eng. Its Appl. (2008)
  • D. Jackson et al., Software analysis: a roadmap, Proceedings of the Conference on The Future of Software Engineering (2000)
  • J. Jász, Static execute after algorithms as alternatives for impact analysis, Electrical Eng. (2010)
  • J. Jász et al., Static execute after/before as a replacement of traditional software dependencies, IEEE International Conference on Software Maintenance (ICSM) (2008)
  • J. Jász et al., Impact analysis using static execute after in WebKit, 16th European Conference on Software Maintenance and Reengineering (CSMR) (2012)
  • P. Lam et al., Soot – a Java bytecode optimization framework, Cetus Users and Compiler Infrastructure Workshop (2011)


    Raul Santelices received his Ph.D. from Georgia Tech and was a faculty member at Notre Dame. His research interests are program analyses for software testing, debugging, and evolution. He currently works at Delphix on database virtualization technologies and distributed systems.
