Method-level program dependence abstraction and its application to impact analysis
Introduction
Program dependence analysis has long underpinned a wide range of software analysis and testing techniques (e.g., Podgurski, Clarke, 1990, Bates, Horwitz, 1993, Santelices, Harrold, 2010, Baah, Podgurski, Harrold, 2010). While traditional approaches to dependence analysis offer fine-grained results (at the statement or even instruction level) (Ferrante, Ottenstein, Warren, 1987, Horwitz, Reps, Binkley, 1990), they can face severe scalability and usability challenges, especially for modern software of growing size and complexity (Jász, Beszédes, Gyimóthy, Rajlich, 2008, Acharya, Robinson, 2011), and even more so when high precision is demanded with safety guarantees (Jackson, Rinard, 2000, Binkley, 2007).
On the other hand, for many software-engineering tasks where results of coarser granularity suffice, computing the finest-grained dependencies tends to be superfluous and yields low cost-effectiveness in the particular application context. In this work, a (dependence) analysis is considered cost-effective (measured by the ratio of effectiveness to cost) if it produces effective results (measured by accuracy, or by precision alone when recall is constantly perfect) relative to the total overhead it incurs, including both the analysis cost and the human cost of inspecting the results (Cai et al., 2016). One example is impact analysis (Bohner and Arnold, 1996), which analyzes the effects of specific program components, or of changes to them, on the rest of the program to support software evolution and many other client analyses, including regression testing (Jász, Schrettner, Beszédes, Osztrogonác, Gyimóthy, 2012, Schrettner, Jász, Gergely, Beszédes, Gyimóthy, 2014) and fault localization (Ren et al., 2006). For tasks such as impact analysis, results are commonly given at the method level (Law, Rothermel, 2003, Apiwattanapong, Orso, Harrold, 2005, Jász, 2010), where finer (e.g., statement-level) results can be too large to utilize fully (Acharya and Robinson, 2011a). In other contexts, such as program understanding, method-level results are also more practical to explore than those of the finest granularity.
Driven by varying needs, different approaches have been explored to abstract program dependencies to coarser levels, including the program summary graph (Callahan, 1988) used to speed up interprocedural data-flow analysis; the object-oriented class-member dependence graph (Sun et al., 2010), the lattice of class and method dependence (Sun et al., 2011), and the influence graph (Breech et al., 2006), all used for impact analysis; and the module dependence graph (Mancoridis et al., 1999) used for understanding and improving software structure. While these abstractions have proven useful for their particular client analyses, they capture either only partial dependencies among methods (Breech, Tegtmeyer, Pollock, 2006, Sun, Li, Tao, Wen, Zhang, 2010) or dependencies at the level of classes (Sun et al., 2011) or even files (Mancoridis et al., 1999), which can be overly coarse for many dependence-based tasks. More critically, most such approaches were not designed or fully evaluated as a general program dependence abstraction with respect to their accuracy (both precision and recall) against the original full model they approximate as ground truth.
A method-level dependence abstraction, called static-execute-after/before (SEA/SEB) (Jász et al., 2008), was recently proposed, initially intended to replace traditional software dependencies (TSD) based on the system dependence graph (SDG) (Horwitz, Reps, Binkley, 1990, Jász, Beszédes, Gyimóthy, Rajlich, 2008). It abstracts dependencies among methods based on the interprocedural control-flow graph (ICFG) and was reported to lose little precision and no recall (i.e., 100% recall) relative to static slicing based on the TSD model (i.e., the SDG). Later, the SEA was applied to static impact analysis, shown to be more accurate than peer techniques (Tóth et al., 2010), and capable of improving regression test selection and prioritization (Schrettner et al., 2014).
However, previous studies on the accuracy of SEA/SEB either exclusively targeted procedural programs (Jász et al., 2008) or focused only on backward dependencies based on the SEB (against backward slicing on top of the SDG) (Jász, 2010). The remaining relevant studies addressed the accuracy of SEA-based forward dependencies, some indeed using object-oriented programs and comparing to forward slicing on the TSD model; yet the accuracy of such dependencies was assessed either not at the method level but at the class level only (Beszédes et al., 2007), or not relative to ground truth based on the TSD model but to ground truth based on repository changes (Jász, Schrettner, Beszédes, Osztrogonác, Gyimóthy, 2012, Schrettner, Jász, Gergely, Beszédes, Gyimóthy, 2014) or programmer opinions (Tóth et al., 2010), and only in the specific application context of impact analysis.
While forward dependence analysis is required by many dependence-based applications, including the static impact analysis to which the SEA/SEB has mainly been applied, the accuracy of this abstraction with respect to the TSD model, for forward dependencies and object-oriented programs in particular, remains unknown. In addition, it has not yet been explored whether and, if so, how such program dependence abstractions could improve dynamic analysis, especially hybrid analyses that utilize both static dependencies and execution data of programs, such as hybrid dynamic impact analysis (Maia, Bittencourt, de Figueiredo, Guerrero, 2010, Cai, Santelices, 2014, Cai, Santelices, 2015).
In this paper, we present and study an alternative method-level dependence abstraction using a program representation called the method dependence graph (MDG). In contrast to SDG-based TSD models, which represent a program in terms of the data and control dependencies among all of its statements, the MDG also serves as a general graphical program representation but models those dependencies at the method level instead. Method-level dependencies could simply be obtained from a TSD model by lifting statements in the SDG to their corresponding (enclosing) methods. Our MDG model, however, represents these dependencies directly, abstracting away statement-level details within methods (i.e., intraprocedural dependencies), and, more importantly, does so with much less computation than constructing the SDG would require. The MDG computes transitive interprocedural dependencies in a context-insensitive manner and also forgoes flow sensitivity for data dependencies induced by heap objects; thus, it is more efficient to build than TSD models (Horwitz, Reps, Binkley, 1990, Yu, Rajlich, 2001). On the other hand, this abstraction captures whole-program control and data dependencies, including those due to exception-driven control flows (Sinha and Harrold, 2000), so it is more informative than coarser models such as call graphs or the ICFG. With the MDG, we attempt not only to address the above questions concerning the latest peer approach, the SEA/SEB, but also to attain a more cost-effective dependence abstraction than existing alternatives in general.
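To make the lifting view concrete, the following sketch (illustrative only; the names Stmt, Edge, and liftToMethodLevel are ours, not from the MDG implementation) shows how statement-level dependence edges would map onto their enclosing methods, discarding intraprocedural edges and method-level duplicates:

```java
import java.util.*;

// Sketch: lifting statement-level dependence edges to method level.
// All names here are illustrative, not from the paper's actual tooling.
public class LiftDependencies {
    // A statement is identified by its enclosing method and a position.
    record Stmt(String method, int index) {}
    // A statement-level dependence edge: src depends on tgt (or vice versa,
    // depending on direction convention; the lifting is the same either way).
    record Edge(Stmt src, Stmt tgt) {}

    // Map each statement-level edge to an edge between enclosing methods,
    // dropping intraprocedural (same-method) edges and duplicates.
    static Map<String, Set<String>> liftToMethodLevel(List<Edge> stmtEdges) {
        Map<String, Set<String>> mdg = new HashMap<>();
        for (Edge e : stmtEdges) {
            String m1 = e.src().method(), m2 = e.tgt().method();
            if (!m1.equals(m2)) { // keep only interprocedural edges
                mdg.computeIfAbsent(m1, k -> new TreeSet<>()).add(m2);
            }
        }
        return mdg;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
            new Edge(new Stmt("A.main", 3), new Stmt("B.compute", 1)),
            new Edge(new Stmt("A.main", 4), new Stmt("B.compute", 2)), // duplicate at method level
            new Edge(new Stmt("B.compute", 2), new Stmt("B.compute", 5)) // intraprocedural, dropped
        );
        System.out.println(liftToMethodLevel(edges)); // {A.main=[B.compute]}
    }
}
```

Note that the actual MDG avoids computing the statement-level edges in the first place; this sketch only illustrates the abstraction relation between the two levels.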
We implemented the MDG and applied it to both static and dynamic impact analysis for Java,1 all evaluated on seven non-trivial Java subject programs. We computed the accuracy of the MDG for approximating forward dependencies in general and the cost-effectiveness of its specific application in static impact analysis; we also compared the MDG against the SEA in both accuracy (with the TSD as ground truth) and efficiency. To explore how the MDG abstraction can be applied to and benefit dynamic analysis, we developed on top of the MDG a variant of Diver, the most cost-effective hybrid dynamic impact analysis in the literature (Cai and Santelices, 2014), and compared its cost and effectiveness against the original Diver.
Our results show that the MDG can approximate the TSD model with perfect (100%) recall and generally high precision (mostly 85–90%) at great efficiency, at least for forward dependencies at the method level. We also found the MDG to be a more cost-effective option than the SEA for the same purpose, given its significantly higher precision and better overall efficiency. The study further reveals that, at least for the object-oriented programs we used, the SEA can be much less precise at approximating forward dependencies at the method level than previously reported at the class level for object-oriented programs (Beszédes et al., 2007) and at the method level for procedural programs (Jász, Beszédes, Gyimóthy, Rajlich, 2008, Jász, 2010). The study also demonstrated that the MDG as a dependence abstraction model can significantly enhance the cost-effectiveness of both dependence-based static and dynamic impact analysis over the respective best existing alternatives. More broadly, the MDG as a general program abstraction could benefit any application that is based on program dependencies at the method level (e.g., testing and debugging) or that utilizes dependencies at this or even higher levels (e.g., refactoring and performance optimization).
In summary, the contributions of this paper are as follows:
- An approach, called the MDG, to abstracting program dependencies to the method level that approximates traditional software dependencies more accurately than existing options, including dependencies due to exception-driven control flows (Section 3).
- An implementation of the MDG and two application analyses based on it: a static impact analysis and a hybrid dynamic impact analysis (Section 4.1).
- An extensive evaluation of the MDG that assesses its accuracy relative to TSD-based static slicing in approximating method-level forward dependencies and demonstrates its application and benefits for both static and dynamic analyses (Section 4).
- The first substantial empirical evidence on the accuracy of the SEA with respect to TSD-based static slices on object-oriented software, and on the performance contrast between such dependence abstractions and the full TSD model they approximate (Section 5).
Motivation and background
Our work was primarily motivated by the need to improve the cost-effectiveness of forward dependence analysis, which directly supports dependence-based impact analysis (Bohner, Arnold, 1996, Li, Sun, Leung, Zhang, 2013), among many other software-evolution tasks (Rajlich, 2014). The need for better cost-effectiveness in impact analysis has been extensively investigated in previous studies (e.g., Apiwattanapong, Orso, Harrold, 2005, Rovegard, Angelis, Wohlin, 2008; de Souza and Redmiles, 2008; Acharya,
The method dependence graph
We first give a high-level description, followed by the definition, of the MDG, and then present in detail the algorithm for constructing the MDG for a given input program. We use both graph and code examples for illustration. This section focuses on presenting the MDG technique itself as a generic program dependence model, with details of its use in impact analysis as an example application (mainly the implementation of two impact-analysis techniques based on this model) deferred to Section 4.1
Empirical evaluation
We evaluated our technique as a dependence abstraction in general and its applications to static and dynamic impact analysis in particular. For that purpose, we performed two empirical studies. In the first study, we computed the precision and recall of forward dependence sets derived from the MDG against forward static slices, both at the method level, and compared the same measures, along with efficiency, against the SEA. In the second study, we built a dynamic impact analysis based on the
Study I: approximating method-level forward static dependencies
This section presents the main study, which addresses the accuracy of the MDG against the SEA relative to the TSD model. Since impact sets computed by the static impact analysis based on the MDG and SEA are also the method-level forward dependence sets used by the accuracy study, we simultaneously evaluate the accuracy of the two abstraction models and the static impact analysis techniques based on them. We also study the efficiency of all these approaches. This study aims to answer the first
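As a concrete reading of the accuracy measures used in this study, the precision and recall of a method-level dependence set D against a ground-truth set G (here, methods in the corresponding TSD-based forward slice) are |D ∩ G| / |D| and |D ∩ G| / |G|, respectively. A minimal sketch (helper names are ours, not from the paper's tooling):

```java
import java.util.*;

// Sketch: accuracy of an abstracted dependence set against a ground-truth slice.
// A safe over-approximation yields 100% recall; precision then measures how
// much the abstraction over-approximates.
public class Accuracy {
    static double precision(Set<String> dep, Set<String> truth) {
        if (dep.isEmpty()) return 1.0;           // degenerate case: nothing reported
        Set<String> inter = new HashSet<>(dep);
        inter.retainAll(truth);                  // D ∩ G
        return (double) inter.size() / dep.size();
    }

    static double recall(Set<String> dep, Set<String> truth) {
        if (truth.isEmpty()) return 1.0;         // degenerate case: empty ground truth
        Set<String> inter = new HashSet<>(truth);
        inter.retainAll(dep);                    // D ∩ G
        return (double) inter.size() / truth.size();
    }

    public static void main(String[] args) {
        // Ground truth: methods in the forward static slice of a query method.
        Set<String> truth = Set.of("m1", "m2", "m3");
        // Abstraction result: a superset of the truth (safe but imprecise).
        Set<String> dep = Set.of("m1", "m2", "m3", "m4");
        System.out.println(precision(dep, truth)); // 0.75
        System.out.println(recall(dep, truth));    // 1.0
    }
}
```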
Study II: improving hybrid dynamic impact analysis
This section presents our secondary study, which addresses the application of static dependence abstraction in hybrid dynamic analysis, using dynamic impact analysis as an example. This study concerns the efficiency benefits of the MDG abstraction and seeks to answer the last research question (RQ4). We also examine the hypothesis that the MDG-based dynamic impact analysis (Madger) gives impact sets as accurate as those of the Diver technique. That is, we expect that Madger improves querying
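In spirit (the actual Diver and Madger algorithms are more involved, pruning execution traces via static dependencies), a hybrid dynamic impact analysis confines a static, dependence-based impact set to what actually executed. A minimal sketch of that idea under a strong simplification, with hypothetical names:

```java
import java.util.*;

// Sketch of the hybrid idea (NOT the actual Diver/Madger algorithms): prune a
// static, dependence-based impact set using an execution trace, keeping only
// methods that ran at or after the queried method in that execution.
public class HybridPrune {
    static Set<String> prune(Set<String> staticImpact, List<String> trace, String query) {
        Set<String> result = new LinkedHashSet<>();
        int first = trace.indexOf(query);
        if (first < 0) return result; // query never executed: empty dynamic impact set
        for (int i = first; i < trace.size(); i++) {
            String m = trace.get(i);
            if (staticImpact.contains(m)) result.add(m); // statically dependent AND executed after
        }
        return result;
    }

    public static void main(String[] args) {
        // Static impact set of query method "q" (from the dependence abstraction).
        Set<String> staticImpact = Set.of("q", "a", "b", "c");
        // One recorded execution, as a sequence of method-entry events.
        List<String> trace = List.of("main", "a", "q", "b", "a", "d");
        System.out.println(prune(staticImpact, trace, "q")); // [q, b, a]
    }
}
```

The design point this illustrates is that the static abstraction bounds the dynamic analysis: the cheaper and more precise the method-level dependence model, the less trace data the hybrid analysis must retain and inspect per query.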
Related work
We mainly discuss three categories of previous work related to ours: program dependence abstraction, static impact analysis, and dynamic impact analysis.
Conclusions and future work
Despite a number of dependence abstractions proposed to approximate the fine-grained and heavyweight TSD model, only a few of them were intended as a safe and efficient general approximation. One recent such abstraction, the SEA/SEB, has been developed, yet it remains unclear how accurately this approach can approximate forward dependencies for object-oriented software. Also, our intuition and initial application of the SEA/SEB suggest that it may not be sufficiently accurate for that
Acknowledgments
This work was partially supported by ONR Award N000141410037 to the University of Notre Dame and faculty startup fund from Washington State University to the first author.
Haipeng Cai received his Ph.D. from the University of Notre Dame, Notre Dame, IN. His research interests are in software engineering and systems with a focus on program analysis and its applications to the reliability and security of evolving software. He is currently a faculty member in the School of Electrical Engineering and Computer Science at Washington State University, Pullman, WA.
References (63)
- et al., A comprehensive study of the predictive accuracy of dynamic change-impact analysis, J. Syst. Software (2015)
- et al., Combining concept lattice with call graph for impact analysis, Adv. Eng. Software (2012)
- et al., Static change impact analysis techniques: a comparative study, J. Syst. Software (2015)
- et al., Practical change impact analysis based on static program slicing for industrial software systems, Proceedings of the 33rd International Conference on Software Engineering, Software Engineering in Practice Track (2011)
- et al., Compilers: Principles, Techniques and Tools (2006)
- et al., Efficient and precise dynamic impact analysis using execute-after sequences, Proc. of Intl. Conf. Softw. Eng. (2005)
- et al., The probabilistic program dependence graph and its application to fault diagnosis, IEEE Trans. Software Eng. (2010)
- et al., Incremental program testing using program dependence graphs, Proc. Symp. Principles of Program Lang. (1993)
- et al., Computation of static execute after relation with applications to software maintenance, IEEE International Conference on Software Maintenance (ICSM) (2007)
- Source code analysis: a road map, 2007 Future of Software Engineering
- An Introduction to Software Change Impact Analysis
- Integrating influence mechanisms into impact analysis for increased precision, Intl. Conf. Softw. Maint.
- Diver: precise dynamic impact analysis using dependence-based trace pruning, Proceedings of International Conference on Automated Software Engineering
- A framework for cost-effective dependence-based dynamic impact analysis, 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)
- TracerJD: generic trace-based dynamic dependence analysis with fine-grained logging, 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)
- DiaPro: unifying dynamic impact analyses for improved and variable cost-effectiveness, ACM Trans. Software Eng. Methodol. (TOSEM)
- Estimating the accuracy of dynamic change-impact analysis using sensitivity analysis, Proceedings of International Conference on Software Security and Reliability
- The program summary graph and flow-sensitive interprocedural data flow analysis, Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation
- Ordinal Methods for Behavioral Data Analysis
- Supporting controlled experimentation with testing techniques: an infrastructure and its potential impact, Empirical Software Eng.
- A practical interprocedural alias analysis for an optimizing/parallelizing C compiler
- The program dependence graph and its use in optimization, ACM Trans. Prog. Lang. Syst.
- Interprocedural slicing using dependence graphs, ACM Trans. Prog. Lang. Syst.
- A dynamic impact analysis approach for object-oriented programs, Adv. Software Eng. Its Appl.
- Software analysis: a roadmap, Proceedings of the Conference on The Future of Software Engineering
- Static execute after algorithms as alternatives for impact analysis, Electrical Eng.
- Static execute after/before as a replacement of traditional software dependencies, IEEE International Conference on Software Maintenance (ICSM) (2008)
- Impact analysis using static execute after in WebKit, 2012 16th European Conference on Software Maintenance and Reengineering (CSMR)
- Soot – a Java bytecode optimization framework, Cetus Users and Compiler Infrastructure Workshop
Raul Santelices received his Ph.D. from Georgia Tech and was a faculty member at Notre Dame. His research interests are program analyses for software testing, debugging, and evolution. He currently works at Delphix on database virtualization technologies and distributed systems.