Towards an operationalization of test-driven development skills: An industrial empirical study

https://doi.org/10.1016/j.infsof.2015.08.004

Abstract

Context: The majority of the empirical studies on Test-driven development (TDD) are concerned with verifying or refuting the effectiveness of the technique over a traditional approach, and they tend to neglect whether the subjects possess the skills needed to apply TDD, even while arguing that such skills are necessary.

Objective: We evaluate a set of minimal, a priori and in-process skills necessary to apply TDD. We determine whether variations in external quality (i.e., number of defects) and productivity (i.e., number of features implemented) can be associated with different clusters of the TDD skill set.

Method: We executed a quasi-experiment involving 30 practitioners from industry. We first grouped the participants according to their TDD skill set (consisting of a priori experience in programming and testing, as well as in-process TDD conformance) into three levels (Low-Medium-High) using k-means clustering. We then applied ANOVA to compare the clusters in terms of external quality and productivity, and conducted post-hoc pairwise analysis.

Results: We did not observe a statistically significant difference between the clusters, either for external software quality (F(2,27)=1.44, p=.260) or for productivity (F(2,27)=3.02, p=.065). However, the analysis of the effect sizes and their confidence intervals shows that the TDD skill set is a factor that could account for up to 28% of the variance in external quality, and up to 38% in productivity.

Conclusion: We have reason to conclude that focusing on improving the TDD skill set investigated in this study could help software developers improve their baseline productivity and the external quality of the code they produce. However, replications are needed to overcome the issues related to the statistical power of this study. We suggest practical insights for future work to investigate the phenomenon further.

Introduction

Test-driven development (TDD) is a software development technique in which the development is guided by writing unit tests. It was popularized in the late 1990s as part of Extreme Programming [1]. A developer using TDD follows four steps:

  1. Write a unit test for the functionality she wants to add.

  2. Run the unit test to make sure it fails.

  3. Write only enough production code to make the test pass.

  4. Refactor both production and test code, and re-run the tests.
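The four steps above form a single red-green-refactor iteration. The following is an illustrative Python sketch using the standard-library unittest module (the study itself had subjects work in Java with JUnit); the `add` function is a hypothetical piece of functionality, not one from the paper:

```python
import unittest

# Steps 1-2: write a unit test for the functionality we want to add,
# then run it and watch it fail (at this point `add` would not exist yet).
class TestAdd(unittest.TestCase):
    def test_add_two_numbers(self):
        self.assertEqual(add(2, 3), 5)

# Step 3: write only enough production code to make the test pass.
def add(a, b):
    return a + b

if __name__ == "__main__":
    # Step 4: refactor both test and production code, then re-run;
    # a green run closes the cycle and the next test starts a new one.
    unittest.main()
```

The discipline lies in the ordering: the failing run in step 2 is what confirms the new test actually exercises behavior that does not exist yet.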

TDD is claimed to yield better results than traditional approaches to software development (e.g., when unit tests are written after the intended functionality is considered completed by the development team) in terms of developers’ productivity, external quality (e.g., reduced number of defects), maintainability, and extensibility [2], [3]. However, empirical investigations of the effects of TDD report conflicting results [4], [5], suggesting that the outcomes are influenced by several variables (e.g., academic vs. industrial settings), including the skills of the developers.

Literature reviews on TDD conclude that applying the technique, and subsequently obtaining its postulated benefits, requires certain skills [5], [6]; however, these studies do not indicate what those skills are. We began our investigation of skills with students in a previous study [7]. In that context, we looked at their pre-existing knowledge regarding two practical skills: proficiency with the programming language and with unit testing (UT). When the subjects tackled a small programming task using TDD, we found that such skills had little impact on their productivity, defined as the output (e.g., parts of the task completed) per unit of effort (e.g., time to complete the task). No significant relationship was observed regarding the quality of the software they produced (e.g., the defects found in the parts of the task that the subjects completed). In the same study, we acknowledged that other skills must be present in order for developers to achieve the benefits advocated by TDD supporters.

With these motivations, grounded in the existing literature and our previous work, we incorporate in this study another practical skill, which we call TDD process conformance, alongside programming-language and unit-testing skills. TDD process conformance represents the ability of a developer to follow the TDD cycle. Together, these three skills constitute our TDD skill set. Further, we used a more realistic task to overcome the limitations of small programming tasks, and recruited professional developers for the study. Consequently, the research goal of this work is the following:

In our previous studies [7], [8], [9] we have investigated the role that each skill plays individually with student subjects working on toy tasks. We now focus on the impact the skills have, when taken together, on the outcomes of interest, by performing a quasi-experiment involving 43 professional software developers (30 after mortality) without prior working experience in TDD. The developers were trained during a week-long workshop and then asked to implement new features of a legacy system using TDD. Finally, we evaluated the composite effect of their skills on their performance in terms of external quality and productivity. Hence, we contribute to the existing knowledge by:

  • Empirically investigating, with professional developers, the anecdotal claim that TDD requires skills for its benefits to manifest.

  • Building a model for quality and productivity that takes into account a set of practical skills (Section 3).

  • Providing initial empirical evidence that further investigation of the proposed TDD skill set is worth pursuing (Section 5).

The strong points of our study lie in the settings (Section 4) in which it was conducted. In particular, we:

  • Analyze data collected from professional software developers.

  • Utilize a near real-world, brown-field task rather than a toy, green-field task (see Section 4.2 and Appendix B).

  • Quantify process conformance analytically, rather than relying on self-reports.
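As a rough illustration of what analytical process-conformance measurement involves, the sketch below classifies recorded development events with a simplified, hypothetical heuristic; the event names and rules are assumptions for illustration, whereas the tools cited in this work (e.g., Besouro, Zorro) infer conformance from IDE events using far richer rule sets:

```python
def is_tdd_conformant(cycle):
    """Classify one development cycle (an ordered list of events such as
    'test-edit', 'prod-edit', 'test-run-fail', 'test-run-pass') as
    TDD-conformant: the test is edited before production code, the new
    test is seen to fail, and the cycle ends with a green test run."""
    try:
        first_test_edit = cycle.index('test-edit')
        first_prod_edit = cycle.index('prod-edit')
    except ValueError:
        return False  # a TDD cycle needs both a test edit and a prod edit
    return (first_test_edit < first_prod_edit                              # test-first
            and 'test-run-fail' in cycle[first_test_edit:first_prod_edit]  # saw it fail
            and cycle[-1] == 'test-run-pass')                              # ended green

def conformance(cycles):
    """Fraction of recorded cycles classified as conformant."""
    return sum(map(is_tdd_conformant, cycles)) / len(cycles)

cycles = [
    ['test-edit', 'test-run-fail', 'prod-edit', 'test-run-pass'],  # conformant
    ['prod-edit', 'test-edit', 'test-run-pass'],                   # test-last
]
print(conformance(cycles))  # 0.5
```

Measuring conformance from logged events, rather than asking developers whether they followed the cycle, removes the self-report bias the bullet above refers to.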

The rest of the paper is organized as follows. Section 2 presents the existing literature related to our research; Section 3 defines the TDD skill set used in our study; Section 4 explains the details of our empirical study design; Sections 5 and 6 report the results and associated discussion. We address the threats to the validity of our study in Section 7 and conclude the paper in Section 8.

Section snippets

Related work

Test-driven development has been the subject of several secondary studies. The systematic literature review by Turhan et al. [5], covering 32 empirical studies, found positive effects on external quality, whereas the productivity results were inconclusive, when TDD was used across different settings. The meta-analysis by Rafique and Misic [4] is of interest when looking at how experience works with the postulated TDD effects. The work covers 10 years of TDD publications, from 2000 to 2011, in 25…

A skill set for TDD

Our goal in this paper is to make a holistic analysis of the skills rather than focusing on them individually. Therefore, we include three skills, i.e., programming and testing skills as well as TDD process conformance, to define a TDD skill set.

Although existing literature acknowledges that skills matter when applying TDD, none indicates the necessary ones. For example, Causevic et al. [6] identified the lack of developers’ skills as one of the main impediments to the adoption of TDD by…

Study definition

An overview of the study is presented in Fig. 1. The study seeks the answers to the research questions presented in Section 4.1. We recruited subjects from two companies, in the context of a workshop about UT and TDD (Section 4.2). We assessed the subjects’ skills in Java development and UT at the beginning of the workshop. During the workshop the subjects carried out a brown-field, real-world task (Section 4.3). Subsequently, we collected the necessary data to extract TDD process…

Results

In this section, we first report the descriptive statistics of the data, and provide a sanity check in order to proceed with clustering and ANOVA. All the statistical tests use α=0.05.
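For readers unfamiliar with the analysis pipeline, the following minimal Python sketch shows how a one-way ANOVA F statistic and an eta-squared effect size (the share of variance attributable to cluster membership, the kind of figure behind the 28% and 38% estimates) can be computed; the three groups and their scores are invented for illustration and are not the study’s data:

```python
# Hypothetical quality scores for three skill clusters (illustrative only).
groups = {
    'Low':    [55.0, 60.0, 58.0, 52.0],
    'Medium': [62.0, 66.0, 64.0, 61.0],
    'High':   [70.0, 68.0, 72.0, 74.0],
}

def one_way_anova(groups):
    """Return (F, eta_squared) for a one-way ANOVA.
    eta^2 = SS_between / SS_total is the fraction of total variance
    explained by group membership."""
    all_values = [v for g in groups.values() for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group sizes times squared mean offsets.
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2
                     for g in groups.values())
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, ss_between / (ss_between + ss_within)

f_stat, eta_sq = one_way_anova(groups)
print(round(f_stat, 2), round(eta_sq, 2))
```

A p-value for F would then be read off the F(df_between, df_within) distribution and compared against the study’s α=0.05; the effect size, unlike the p-value, is informative even when significance is not reached.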

Discussion

We investigated two research hypotheses in which we argue that a difference in terms of external quality (HQLTY) and productivity (HPROD) exists among three TDD skill set groups. Our TDD skill set includes two different kinds of skills: a priori knowledge of concepts necessary to apply TDD (i.e., the Java programming language and UT), and an in-process skill, i.e., the level of conformance to the TDD process. We first clustered the subjects according to their skill set, then we applied statistical…

Threats to validity

In this section, we explain the main threats to the validity of our study following Wohlin et al. [41], along with the countermeasures we took when possible. Moreover, we suggest some actions that researchers willing to replicate this study could take to limit some of the threats. The types of validity threats are prioritized, in increasing order, following Cook and Campbell’s [42] guidelines. In particular, since this study is part of an effort to apply research in industry, we give more…

Conclusions

In this work, we studied 30 professional software developers applying TDD to add new features to a legacy system close to real-world complexity. We contributed to the existing knowledge by operationalizing developers’ test-driven development skills, not only according to their a priori abilities (i.e., Java programming and UT), but also including their capacity to follow the test-driven development cycle. We clustered the subjects according to this skill set and compared them in terms of…

Acknowledgements

This research is partially supported by the Academy of Finland with decision no.: 278354, and by Finnish Distinguished Professor (Fi.Di.Pro.) programme, ESEIL. The first author would like to acknowledge the Nokia Foundation and ISACA Finland chapter for the support provided in completing this work. We would like to acknowledge Dr. Lucas Layman who significantly contributed to the design of the task used in the study. We would like to acknowledge the anonymous reviewers for their helpful…

References (50)

  • H. Munir et al.

    Considering rigor and relevance when evaluating test driven development: A systematic review

    Inform. Softw. Technol.

    (2014)
  • K. Becker et al.

    Besouro: A framework for exploring compliance rules in automatic TDD behavior assessment

    Inform. Softw. Technol.

    (2015)
  • K. Beck

    Test-driven Development: By Example

    (2002)
  • D. Astels

    Test Driven Development: A Practical Guide

    (2003)
  • K. Beck

    Aim, fire

    IEEE Softw.

    (2001)
  • Y. Rafique et al.

    The effects of test-driven development on external quality and productivity: A meta-analysis

    IEEE Trans. Softw. Eng.

    (2013)
  • B. Turhan et al.

    How effective is test driven development?

  • A. Causevic et al.

    Factors limiting industrial adoption of test driven development: A systematic review

    2011 IEEE Fourth International Conference on Software Testing, Verification and Validation (ICST)

    (2011)
  • D. Fucci et al.

    On the effects of programming and testing skills on external quality and productivity in a test-driven development context

    Proceedings of the 19th International Conference on Evaluation and Assessment in Software Engineering (EASE’15)

    (2015)
  • D. Fucci et al.

    On the role of tests in test-driven development: A differentiated and partial replication

    Empir. Softw. Eng.

    (2013)
  • D. Fucci et al.

    Impact of process conformance on the effects of test-driven development

    8th ACM/IEEE International Symposium (ESEM’14)

    (2014)
  • R. Latorre

    Effects of developer experience on learning and applying unit test-driven development

    IEEE Trans. Softw. Eng.

    (2014)
  • M.M. Müller et al.

    The effect of experience on the test-driven development process

    Empir. Softw. Eng.

    (2007)
  • A.T. Misirli et al.

    Topic selection in industry experiments

    Proceedings of the 2nd International Workshop on Conducting Empirical Studies in Industry (CESI’14)

    (2014)
  • I. Salman et al.

    Are students representatives of professionals in software engineering experiments?

    Proceedings of the 37th International Conference on Software Engineering (ICSE’15)

    (2015)
  • L.A. Meyerovich et al.

    Empirical analysis of programming language adoption

    SIGPLAN Not.

    (2013)
  • D. Fucci et al.

    Conformance factor in test-driven development: Initial results from an enhanced replication

    Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering (EASE’14)

    (2014)
  • W.R. Shadish et al.

    Experimental and Quasi-Experimental Designs for Generalized Causal Inference

    (2003)
  • M. Weinberger et al.

    Multisite randomized controlled trials in health services research: Scientific challenges and operational issues

    Med. Care

    (2001)
  • S.W. Raudenbush et al.

    Statistical power and optimal design for multisite randomized trials.

    Psychol. Methods

    (2000)
  • B. Vodde et al.

    Learning test-driven development by counting lines

    IEEE Softw.

    (2007)
  • D.T. Sato et al.

    Coding Dojo: An environment for learning and sharing Agile practices

    Agile Conference (AGILE’08)

    (2008)
  • H. Kou et al.

    Operational definition and automated inference of test-driven development with Zorro

    Autom. Softw. Eng.

    (2010)
  • Y. Wang et al.

    The role of process measurement in test-driven development

  • N. Zazworka et al.

    Tool supported detection and judgment of nonconformance in process execution

    3rd International Symposium on Empirical Software Engineering and Measurement (ESEM’09)

    (2009)