Investigations about replication of empirical studies in software engineering: A systematic mapping study

https://doi.org/10.1016/j.infsof.2015.02.001

Abstract

Context

Two recent mapping studies, which were intended to assess the current state of replication of empirical studies in Software Engineering (SE), identified two sets of studies: empirical studies actually reporting replications (published between 1994 and 2012), and a second group of studies concerned with definitions, classifications, processes, guidelines, and other research topics or themes about replication work in empirical software engineering research (published between 1996 and 2012).

Objective

In this current article, our goal is to analyze and discuss the contents of the second set of studies about replications to increase our understanding of the current state of the work on replication in empirical software engineering research.

Method

We applied the systematic literature review method to build a systematic mapping study, in which the primary studies were collected by two previous mapping studies covering the period 1996–2012, complemented by manual and automatic search procedures that collected articles published in 2013.

Results

We analyzed 37 papers reporting studies about replication published in the last 17 years. These papers explore different topics related to concepts and classifications, present guidelines, and discuss theoretical issues that are relevant for our understanding of replication in our field. We also investigated how these 37 papers have been cited in the 135 replication papers published between 1994 and 2012.

Conclusions

Replication in SE still lacks a set of standardized concepts and terminology, which has a negative impact on replication work in our field. To improve this situation, it is important that the SE research community engage in an effort to create and evaluate taxonomies, frameworks, guidelines, and methodologies to fully support the development of replications.

Introduction

Replications of empirical studies play important roles in the construction of knowledge. According to Schmidt, a replication that demonstrates the same findings obtained by other experiment “… is the proof that the experiment reflects knowledge that can be separated from the specific circumstances (such as time, place, or persons) under which it was gained” [2]. Replications are also important to identify the range of conditions under which findings from one experiment hold and the possible exceptions [3].

Considering the importance of replications to the advance of science in general, Schmidt [2] expected that one would find a body of knowledge that provides clear and unambiguous definitions for central questions like ‘what exactly is a replication experiment?’, ‘what exactly is a successful replication?’, and ‘what are all the types of replication and their corresponding roles?’. Furthermore, one would expect to find empirically evaluated guidelines on how to perform and report replications, complementing existing guidelines for performing experiments and other empirical studies.

However, Schmidt argues that this is not true for most scientific disciplines [2]. The published replications and the theoretical works about replication research have not used clear-cut definitions of terms and concepts, and there is no generally accepted taxonomy to distinguish between types of replications and their roles in generating scientific knowledge. According to Schmidt, “the word replication is used as a collective term to describe various meanings in different contexts” [2]. Carver et al. [4] report that a similar situation is also found in empirical software engineering research. Our findings reinforce the need to address these issues in software engineering.

The goal of this article is to contribute to the advancement of replication work in empirical software engineering. We expect that the results presented in our study will stimulate and support a debate in the scientific community on central questions related to replications. Although we do not expect to fully answer these questions in this article, we believe our work will contribute to some of the answers:

  • What should be considered a replication?

  • What should be considered a successful replication?

  • What are the types of replications and their functions?

  • How should replications be performed?

  • How should replications be reported?

In a recent mapping study, da Silva et al. [5] studied the current state of published replications of empirical studies in software engineering research. The mapping study selected and analyzed papers reporting replications of empirical studies published until 2010, and also found a second set of studies addressing several topics about replication work. These papers about replication were not analyzed further by da Silva et al. [5]. More recently, the same research group performed an update of the previously published mapping study, covering material published in 2011 and 2012 [6]. In this update as well, the same type of papers about replication was collected and saved for future analysis.

In this current article, we analyze and discuss the content of the papers about replications (hereafter referred to as ABO papers) published in the Software Engineering literature to increase our understanding of the current state of the work on replication in empirical software engineering research. We expect that this analysis will shed some light on the issues related to the five questions raised above.

Our goal is twofold. First, to classify the set of ABO studies in Software Engineering into categories related to the topics on which the articles focus (recommendations, frameworks, guidelines, among others). Second, to analyze how the replications performed between 1994 and 2012 have cited and used the ABO studies, in order to verify the impact of these studies on recent replication work.

The set of papers analyzed in this article is composed of those selected by da Silva et al. [5], those found in the update of the mapping study [6], and papers found through a search process performed to cover work published in 2013. We systematically structured and analyzed data extracted from these articles to answer the following six research questions:

  • RQ1: What was the evolution in the number of ABO studies over the years?

  • RQ2: Which individuals and organizations are most active in publishing ABO studies?

  • RQ3: How do the ABO studies define replication?

  • RQ4: What topics or themes have been addressed by the ABO studies?

  • RQ5: Which ABO studies are cited by the papers that reported replications?

  • RQ6: How have the results or propositions presented in the cited ABO studies been used in papers that report replications?

This article is organized as follows. In Section 2, we present background with a discussion of concepts and related work. In Section 3, we present the method used in this study. In Section 4, we present a comprehensive set of results of our review, and in Section 5 we discuss these results. Finally, in Section 6, we present some conclusions and proposals for future work.

Section snippets

Background and related work

As briefly discussed in the Introduction, there is little agreement about nomenclature and definition of concepts about replication in many empirical sciences and also in empirical software engineering. In this article, we expect to shed some light on the debate about some theoretical and practical issues related to performing, classifying, and reporting replications in SE research. In this section, we start by providing some preliminary definitions, we then briefly describe the two mapping

Method

The scientific literature differentiates at least two types of systematic reviews: conventional systematic reviews and mapping studies [13]. The former aims to aggregate results about the effectiveness of a treatment, intervention, or technology, and therefore seeks answers to causal or relational research questions (e.g., Is intervention I on population P more effective for obtaining outcome O in context C than comparison treatment C?). The latter aims to identify all research related to a

Results

Our results naturally fall into two groups. The first group of research questions (RQ1–RQ4) deals with the descriptive nature of ABO studies, and the second group of research questions (RQ5 and RQ6) describes how the papers reporting replications use the results or propositions presented in the ABO studies.

Discussions

Our goal in this review of research about replications in empirical software engineering is to plot the general landscape of the body of work about replications and to complement the reviews of replications produced by da Silva et al. [5] and Bezerra and da Silva [6]. In this section, we discuss our results, their implications for software engineering research, and the limitations of our work. We also briefly discuss the results of the RESER workshop with respect to the published studies about

Conclusions

In this article, we presented a review of 37 papers reporting studies on concepts, classifications, guidelines, frameworks, and other topics about replication in Software Engineering published between 1996 and 2013. We used the papers selected from two mapping studies that covered the period between 1996 and 2012, and from a search procedure performed by the authors to cover the year 2013. Over 67% (25/37) of the papers were published in conferences and workshops (19 full and 6 short papers).

Acknowledgments

Professor Fabio Q. B. da Silva holds a research grant from the Brazilian National Research Council (CNPq), process #314523/2009-0. Cleyton V. C. de Magalhães and Ronnie E. S. Santos are both master's students at the Center of Informatics of UFPE, where they receive scholarships from CAPES and FACEPE (process #IBPG-0651-1.03/12), respectively. We would like to thank the anonymous reviewers of this article for their comments and constructive criticisms that led to important improvements in the

References (25)

  • O.S. Gómez et al.

    Understanding replication of experiments in software engineering: a classification

    Inf. Softw. Technol.

    (2014)
  • C.V. de Magalhães et al.

    Investigations about replication of empirical studies in software engineering: preliminary findings from a mapping study

  • S. Schmidt

    Shall we really do it again? The powerful concept of replication is neglected in the social sciences

    Rev. Gen. Psychol.

    (2009)
  • R.M. Lindsay et al.

    The design of replicated studies

    Am. Stat.

    (1993)
  • J.C. Carver et al.

    Replications of software engineering experiments

    Empirical Softw. Eng.

    (2014)
  • F.Q. da Silva et al.

    Replication of empirical studies in software engineering research: a systematic mapping study

    Empirical Softw. Eng.

    (2012)
  • R. Bezerra et al.

    Replication of empirical studies in software engineering: a systematic mapping study...
  • M.A. La Sorte

    Replication as a verification technique in survey research: a paradigm

    Sociol. Quart.

    (1972)
  • J. Gould et al.

    Dictionary of the Social Sciences

    (1964)
  • J. Daly et al.

Verification of results in software maintenance through external replication

  • C.D. Knutson et al.

    Report from the 1st international workshop on replication in empirical software engineering research (RESER 2010)

    ACM SIGSOFT Softw. Eng. Notes

    (2010)
  • J.L. Krein et al.

    Report from the 2nd international workshop on replication in empirical software engineering research (RESER 2011)

    ACM SIGSOFT Softw. Eng. Notes

    (2012)

Article Notes: Preliminary and partial results of this study have been presented at the 18th International Conference on Evaluation and Assessment in Software Engineering (EASE’2014) and published in the Conference Proceedings [1].
