Debugging and maintaining pragmatically reused test suites

https://doi.org/10.1016/j.infsof.2018.05.001

Abstract

Context

Pragmatic software reuse is a common activity in industry, involving the reuse of software artifacts not designed to anticipate that reuse.

Objective

There are two key issues in such tasks that have not been previously explored. (1) Subtle bugs can be inserted due to mistakes on the part of a developer performing the pragmatic reuse. The reused code, integrated in the target system, should be (re-)validated there. But it is not clear what validation strategies professional developers would employ, and which of these strategies would be most effective at detecting and repairing these inserted bugs. (2) Although semi-automated reuse of the associated test suite has been previously proposed as a strategy to detect such inserted bugs, it is unknown whether the reused test suite would be maintainable in practice and how its maintenance characteristics would compare against alternative strategies.

Method

We present two empirical studies with industrial developers to address these open issues.

Results

We find that industrial developers use a few strategies, including test suite reuse, but that test suite reuse is more reliably effective than the alternatives at discovering and repairing bugs inserted during pragmatic reuse. We also find that, in general, semi-automatically reused test suites are slightly more maintainable than manually reused test suites in pragmatic reuse scenarios; however, specific situations can vary widely. Participants suggested specific extensions to tool support for semi-automated reuse of test suites.

Conclusions

While various validation strategies are employed by industrial developers in the context of pragmatic reuse, none is as reliable and effective as test case reuse at discovering and repairing bugs inserted during pragmatic reuse. Despite the fact that semi-automatically reused test cases contain non-trivial adaptive code, their maintainability is equivalent to or exceeds that of manually reused test suites. The approach could be improved, however, by adopting the suggestions of our participants to increase usability.

Introduction

Software reuse encourages the development of new software systems by leveraging existing artifacts. Reuse has long been promoted for its potential to increase the productivity of software developers, to reduce development time, and to decrease defect density [5], [7], [55], [72]. Most research into software reuse has focused on pre-planned approaches, such as object-oriented inheritance [13], [39], software components [55], [74], and software product lines [45], [63]. Unfortunately, pre-planned reuse has drawbacks: (1) prediction is difficult as to what artifacts should be built for reuse [77]; (2) it is too expensive to build all artifacts for reuse [4], [11]; and (3) artifacts cannot be reused intact in arbitrary contexts because of their embedded assumptions [25], [49].

Instead, software developers sometimes find themselves in situations where existing artifacts do not quite meet their needs. Rather than reimplementing the functionality of interest or refactoring the software where the functionality exists, developers often perform an ad hoc but pragmatic process of copy-and-modify on portions of the existing source code [31].1 Pragmatic reuse is known to be an industrially common practice [6], [11], [33], [36], [40], [44], [68], [78], [80], and it can be the disciplined action of a developer who has carefully weighed the risks involved [31], [80]. Nevertheless, pragmatic reuse could cause subtle bugs when constraints that are met within the originating system are violated in the target system [33].

To prevent silent introduction of bugs during pragmatic reuse, the developer has three known options: (1) use automated test generation techniques [e.g., 12, [22], [28], [61], 83]; (2) create new automated test suites (e.g., via JUnit or other xUnit family members2 [56]); or (3) pragmatically reuse and adapt test suites from the originating system. Note that automated test generation techniques require detailed knowledge of the system’s expected behavior (as opposed to its actual behavior), by means of: detailed formal specifications (which are unlikely to exist in industrial settings); manually supplied program invariants (which require expertise with the reused code that developers performing pragmatic reuse tasks lack [8], [31], [43], [50]); or feedback about whether actual behavior is correct (which again requires expertise with the reused code). Furthermore, new automated test suites are expensive to create manually [58], [75], and the developer’s superficial understanding of the reused code would make such new test suites of questionable value. Thus, reusing and adapting the original test suite appears to be the best option. However, we currently have no evidence regarding two questions: (RQ1) What strategies would developers use to discover and repair errors inserted during pragmatic reuse? (RQ2) What strategies are most effective to discover and repair errors inserted during pragmatic reuse?
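To illustrate why capture-based test generation cannot expose such inserted bugs, consider the following toy sketch; the `median` method and its bug are our own invention, not drawn from the study:

```java
import java.util.Arrays;

// Hypothetical illustration (not from the paper): a capture-style generated
// test asserts the code's *actual* behavior, so it silently encodes a bug
// inserted during pragmatic reuse.
public class GeneratedTestPitfall {

    // Reused method with a subtle inserted bug: while adapting the code, the
    // developer accidentally dropped the sort step, so median() is wrong for
    // unsorted input.
    public static int median(int[] xs) {
        int[] copy = xs.clone();
        // Arrays.sort(copy);  // <-- removed by mistake during reuse
        return copy[copy.length / 2];
    }

    public static void main(String[] args) {
        // A generated regression test captures and re-asserts the actual
        // (buggy) output, so it passes and the bug goes unnoticed:
        System.out.println(median(new int[]{5, 1, 3}));  // prints 1
        // Only a test encoding *expected* behavior -- such as one reused from
        // the originating system's suite -- would demand 3 and fail.
    }
}
```

The generated test "succeeds" precisely because it has no source of expected behavior beyond the integrated code itself, which is the point the paragraph above makes about specifications, invariants, and correctness feedback.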

We conduct a semi-controlled experiment to address these two questions. We discover a set of alternative approaches to validate pragmatic reuse tasks, and we compare the merits of developers’ chosen approaches against reuse and adaptation of the original test suite. Our results show that, while developers do attempt a variety of other strategies, identification and repair of errors is more successful when test suite adaptation and reuse is pursued.

Given the value of reusing and adapting the original test suite, we previously considered how this process can be automated [51], [52], addressing the problem of leveraging the original test suite to validate pragmatically reused functionality. We semi-automatically reuse the portions of a test suite that are associated with the reused functionality. This permits detection of faults introduced during pragmatic reuse while minimizing false alarms. The approach is reified in a tool called Skipper that uses a record-and-replay (R&R) technique to partially transform the originating system’s unit tests to exercise only the reused code, placing them in the target system: runtime information is serialized during an execution of the test suite on the originating system, and deserialized for use within the execution of the transformed test suite on the target system.
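The serialize-then-deserialize idea behind this R&R transformation can be sketched as follows; the class and method names here are our own, and Skipper's actual mechanism is considerably more involved:

```java
import java.io.*;

// Minimal sketch of a record-and-replay check in the spirit of Skipper
// (structure and names invented for illustration).
public class RecordReplaySketch {

    // The reused unit under test; in a real task this lives first in the
    // originating system and later, possibly altered, in the target system.
    public static String normalize(String artist) {
        return artist.trim().toLowerCase();
    }

    // Record: serialize the input/output pair observed while the original
    // test suite runs in the originating system.
    public static byte[] record(String input) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(input);
            out.writeObject(normalize(input));  // recorded expected output
        }
        return buf.toByteArray();
    }

    // Replay: deserialize the recorded pair in the target system and check
    // that the reused code still produces the recorded output.
    public static boolean replay(byte[] recording)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(recording))) {
            String input = (String) in.readObject();
            String expected = (String) in.readObject();
            return expected.equals(normalize(input));
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] recording = record("  Daft Punk ");
        System.out.println(replay(recording) ? "replay OK" : "behavior changed");
    }
}
```

If the reused `normalize` were altered during integration, `replay` would flag the divergence without requiring the developer to articulate the expected behavior by hand.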

Although we demonstrated that Skipper was more effective than an alternative manual reuse approach [52], two additional open questions remain. (RQ3) Are Skipper’s R&R tests harder to maintain than manually written tests? (RQ4) How would developers validate pragmatically reused code in the absence of R&R tests?

To address these questions, we performed a case study into the maintainability of Skipper’s R&R test suites in the presence of various disruptive changes caused by the evolution of reused source code. The developers maintained manually-written or Skipper R&R tests after we had applied various test-breaking changes to different portions of some reused source code; the changes were mostly behavior-modifying, requiring non-trivial understanding of the reused code to repair broken tests. We then interviewed the developers about the pros and cons of both approaches, and how they would test such reused code in the absence of Skipper. Our results indicate that developers can successfully maintain R&R tests; the difficulties they face stem from the same kinds of missing information that hamper manually created tests. Furthermore, developers see R&R tests as appropriate where creating manual tests is too difficult, particularly for complex or unfamiliar reused functionality.

The paper is structured as follows. Section 2 describes a running example in which a developer must validate pragmatically reused code. Section 3 details a semi-controlled experiment into validation strategies employed by professional developers during pragmatic reuse, addressing RQ1 and RQ2. Section 4 provides background on our previous work on Skipper’s R&R test suites. Section 5 continues our running example to demonstrate potential problems in maintaining Skipper’s R&R test suites. Section 6 describes our case study and interviews into the maintainability of Skipper R&R test suites in the presence of various disruptive changes, addressing RQ3 and RQ4. Section 7 discusses remaining issues. Section 8 describes related work.

Section snippets

Motivation: validating pragmatically reused features

Consider a scenario in which a developer is building a new application, which we refer to as “YouTube Recommender”, to recommend music videos to users based on their musical taste. For instance, if a user prefers pop songs, the application would recommend YouTube videos of songs from that genre. The developer happens to know of aTunes [18], [19], a system for managing and playing audio files. Among its various features, aTunes provides the related

Error insertion study

Pragmatically reused code may contain subtle bugs due to faulty modifications or incompatible constraints between the originating and target systems. In our first study, we address two research questions related to this concern.

  • RQ1: What strategies would developers use to discover and repair errors inserted during pragmatic reuse?

  • RQ2: What strategies are most effective to discover and repair errors inserted during pragmatic reuse?

As a preliminary step (Section 3.1), the first author explored a set

Background: the Skipper approach to reusing test cases

Pragmatic source code reuse involves three main steps: selecting high quality source code, adapting it for reuse, and integrating it into a target system [44]. During pragmatic reuse tasks, the reused source code is edited twice: first within the adaptation phase, and next within the integration phase. During the adaptation step, the developer may modify or discard portions of the reused code that are irrelevant to the functionality of interest. Within the integration phase, the developer

Motivation: maintaining manual and R&R test suites

Long after the related artists feature has been integrated into YouTube Recommender, updated requirements demand changes to the reused code, which in turn force its corresponding reused tests to be changed as well. Fig. 7 shows the JUnit manually-written test that would need to be changed, had the developer relied on reusing manually-written JUnit tests. Fig. 6 shows the equivalent Skipper R&R test that would need to be changed to match changes to the reused source code.
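Since Figs. 6 and 7 are not reproduced in this excerpt, the following invented sketch (plain Java, written in a JUnit-like style but without the framework dependency) hints at what maintaining such a reused test involves when a behavior-modifying change lands; `getRelatedArtists` and its data are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a manually-written reused test like the one in
// Fig. 7; the method, artists, and ordering rule are invented.
public class RelatedArtistsTestSketch {

    // Reused functionality after the updated requirement: results are now
    // returned in alphabetical order instead of relevance order.
    public static List<String> getRelatedArtists(String artist) {
        return Arrays.asList("Air", "Justice", "Kavinsky");
    }

    public static void main(String[] args) {
        // The pre-change test asserted relevance order and would now fail:
        //   assertEquals(Arrays.asList("Justice", "Kavinsky", "Air"),
        //                getRelatedArtists("Daft Punk"));
        // The maintained test must encode the new expected ordering, which
        // requires understanding the behavior change, not just the API:
        List<String> result = getRelatedArtists("Daft Punk");
        if (!result.equals(Arrays.asList("Air", "Justice", "Kavinsky")))
            throw new AssertionError("unexpected related artists: " + result);
        System.out.println("test passed");
    }
}
```

The point of the sketch is that repairing either kind of test (manual or R&R) forces the maintainer to recover the intended behavior of code they did not write.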

Maintainability study

All R&R techniques record execution traces in which desired behavior was exhibited, comparing these traces against new ones to detect differences. When the system under test undergoes evolution, its execution traces (and consequently any R&R tests) become obsolete. Whether developers will be able to effectively deal with the needed changes—by modifying the R&R tests, replacing the obsolete test cases with manually-created tests, or some other strategy—remains an open question.
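A minimal sketch of this obsolescence problem, with invented trace contents rather than anything recorded by an actual R&R tool:

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration (our own, not Skipper's implementation) of how an R&R
// test becomes obsolete: the recorded trace no longer matches the trace of
// the evolved code, and the developer must decide how to repair the test.
public class TraceObsolescence {

    // Trace recorded before evolution: calls observed for one test input.
    static final List<String> RECORDED =
        Arrays.asList("lookup(Daft Punk)", "rank()", "top(5)");

    // Trace produced by the evolved code: ranking was replaced by filtering.
    public static List<String> currentTrace() {
        return Arrays.asList("lookup(Daft Punk)", "filter(genre=pop)", "top(5)");
    }

    // An R&R check fails as soon as the traces diverge.
    public static boolean replayMatches() {
        return RECORDED.equals(currentTrace());
    }

    public static void main(String[] args) {
        System.out.println(replayMatches()
            ? "R&R test still valid"
            : "R&R test obsolete: re-record or repair the recording");
    }
}
```

Whether developers re-record, hand-edit the recording, or fall back to manually-created tests when this check fires is precisely the maintenance question the study investigates.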

We investigate

Revisiting our research questions

RQ1: “What strategies would developers use to discover and repair errors inserted during pragmatic reuse?” The participants used five strategies when they were free to choose: (1) reading the reused code’s starting point; (2) reading the reused code’s dependencies from the starting point; (3) comparing the reused code to the original code; (4) writing new test cases; and (5) reusing test cases. Four developers attempted to read the starting point to reason about the code. Three attempted to

Related work

Software reuse can be classified into two categories: preplanned and pragmatic reuse. Preplanned reuse predicts future needs, typically providing functionality hidden behind some interface; object-oriented frameworks, software components, and product lines are all examples of this genre. The pros and cons of preplanned versus pragmatic reuse have been discussed at length [e.g., 33, 52] so we do not repeat such arguments here; neither approach is best suited to all situations. Cases of pragmatic

Conclusion

Pragmatic software reuse is a common activity in industry, involving the reuse of software artifacts not designed to anticipate that reuse. Unfortunately, pragmatic reuse tasks can introduce subtle bugs due to errors on the part of a developer performing the pragmatic reuse. We have performed a study to investigate what strategies industrial developers would employ in order to validate the reused code. We found that developers do try a number of different strategies: reading the reused code;

Acknowledgments

This work was supported in part by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada.

References (83)

  • R. Prieto-Díaz, Status report: software reusability, IEEE Softw. (1993)
  • F. Ricca et al., Web testware evolution, Proceedings of the IEEE International Symposium on Web Systems Evolution (2013)
  • W.W. Agresti, Software reuse: developers’ experiences and perceptions, J. Softw. Eng. Appl. (2011)
  • E. Alégroth et al., JAutomate: a tool for system- and acceptance-test automation, Proceedings of the IEEE International Conference on Software Testing, Verification and Validation (2013)
  • S. Artzi, S. Kim, M.D. Ernst, ReCrash: making software failures reproducible by preserving object states, in:...
  • T.J. Biggerstaff, The library scaling problem and the limits of concrete component reuse, Proceedings of the International Conference on Software Reuse (1994)
  • B. Boehm, Managing software productivity and reuse, Computer (Long Beach Calif) (1999)
  • J. Brandt et al., Two studies of opportunistic programming: interleaving web foraging, learning, and writing code, Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (2009)
  • F.P. Brooks Jr., No silver bullet: essence and accidents of software engineering, Computer (Long Beach Calif) (1987)
  • J.-M. Burkhardt et al., An empirical study of software reuse by experts in object-oriented design, Proceedings of the IFIP TC13 International Conference on Human–Computer Interaction (1995)
  • G. Canfora et al., Software salvaging based on conditions, Proceedings of the IEEE International Conference on Software Maintenance (1994)
  • M. Ceccato et al., Do automatically generated test cases make debugging easier? An experimental assessment of debugging effectiveness and efficiency, ACM Trans. Softw. Eng. Method. (2015)
  • J.R. Cordy, Comprehending reality: practical barriers to industrial adoption of software maintenance automation, Proceedings of the IEEE International Workshop on Program Comprehension (2003)
  • C. Csallner et al., JCrasher: an automatic robustness tester for Java, Softw.: Pract. Exp. (2004)
  • O.-J. Dahl et al., SIMULA: an ALGOL-based simulation language, Commun. ACM (1966)
  • E. Duala-Ekoko et al., Clone region descriptors: representing and tracking duplication in source code, ACM Trans. Softw. Eng. Method. (2010)
  • S. Elbaum et al., Carving and replaying differential unit test cases from system test cases, IEEE Trans. Softw. Eng. (2009)
  • H. Evans, Why Object Serialization is Inappropriate for Providing Persistence in Java (2000)
  • S. Fischer et al., Enhancing clone-and-own with systematic reuse for developing software variants, Proceedings of the IEEE International Conference on Software Maintenance and Evolution (2014)
  • fleax, aTunes: cross-platform player and audio manager, http://www.atunes.sourceforge.net/ (v1.9.0, v1.11.2, v2.0.0,...
  • fleax, aTunes: cross-platform player and audio manager, 2010, http://www.sourceforge.net/projects/atunes...
  • B. Fluri et al., Fine-grained analysis of change couplings, Proceedings of the IEEE International Workshop on Source Code Analysis and Manipulation (2005)
  • M. Fowler, Refactoring: Improving the Design of Existing Code (1999)
  • G. Fraser et al., Whole test suite generation, IEEE Trans. Softw. Eng. (2013)
  • G. Fraser et al., Does automated white-box test generation really help software testers?, Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (2013)
  • G. Fraser et al., Does automated unit test generation really help software testers? A controlled empirical study, ACM Trans. Softw. Eng. Method. (2015)
  • D. Garlan et al., Architectural mismatch: why reuse is so hard, IEEE Softw. (1995)
  • A. Gupta et al., A case study comparing defect profiles of a reused framework and of applications reusing it, Empir. Softw. Eng. (2009)
  • C. Gupta et al., A dynamic approach to estimate change impact using type of change propagation, J. Inf. Process. Syst. (2010)
  • M. Harman et al., A theoretical and empirical study of search based testing: local, global and hybrid search, IEEE Trans. Softw. Eng. (2010)
  • M. Harman et al., Genetic programming for reverse engineering, Proceedings of the Working Conference on Reverse Engineering (2013)
  • R. Holmes, Unanticipated reuse of large-scale software features, Proceedings of the ACM/IEEE International Conference on Software Engineering (2006)
  • R. Holmes et al., Supporting the investigation and planning of pragmatic reuse tasks, Proceedings of the ACM/IEEE International Conference on Software Engineering (2007)
  • R. Holmes et al., Lightweight, semi-automated enactment of pragmatic-reuse plans, Proceedings of the International Conference on Software Reuse (2008)
  • R. Holmes et al., Systematizing pragmatic software reuse, ACM Trans. Softw. Eng. Method. (2012)
  • A. Hunt et al., The Pragmatic Programmer: From Journeyman to Master (1999)
  • P. Jack, N. Levitt, Heritrix, http://crawler.archive.org/...
  • S. Jansen et al., Pragmatic and opportunistic reuse in innovative start-up companies, IEEE Softw. (2008)
  • Java Object Serialization Specification, 2010,...
  • H. Jaygarl et al., OCAT: object capture-based automated testing, Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (2010)
  • R.E. Johnson et al., Designing reusable classes, J. Object-Orient. Program. (1988)