An empirical study of the long duration of continuous integration builds

Ghaleb, Taher Ahmed; da Costa, Daniel Alencar; Zou, Ying

doi:10.1007/s10664-019-09695-9

Published: 01 March 2019

Volume 24, pages 2102–2139, (2019)

Empirical Software Engineering

Abstract

Continuous Integration (CI) is a set of software development practices that allow software development teams to generate software builds more quickly and periodically (e.g., daily or even hourly). CI brings many advantages, such as the early identification of errors when integrating code. When builds are generated frequently, a long build duration may keep developers from performing other important tasks. Recent research has shown that a considerable amount of development time is invested in optimizing the generation of builds. However, the reasons behind long build durations remain unclear and call for an in-depth study. Our initial investigation shows that many projects have build durations that far exceed the acceptable build duration (i.e., 10 minutes) reported by recent studies. In this paper, we study several characteristics of CI builds that may be associated with long build durations. We perform an empirical study on 104,442 CI builds from 67 GitHub projects. We use mixed-effects logistic models to model long build durations across projects. Our results reveal that, in addition to common-wisdom factors (e.g., project size, team size, build configuration size, and test density), other highly important factors help explain long build durations. We observe that rerunning failed commands multiple times is most likely to be associated with long build durations. We also find that builds may run faster if they are configured (a) to cache content that does not change often or (b) to finish as soon as all the required jobs finish. However, we observe that about 40% of the studied projects either do not use or misuse such configurations in their builds. In addition, we observe that triggering builds on weekdays or at daytime is most likely to have a direct relationship with long build durations. Our results suggest that developers should use proper CI build configurations to maintain successful builds and to avoid long build durations. Tool builders should supply development teams with tools to identify cacheable spots of the project in order to accelerate the generation of CI builds.
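
As a rough illustration of the 10-minute threshold mentioned in the abstract, the sketch below flags builds that exceed it and reports the share of such builds per project. The function name `long_build_share`, the `(project, duration_in_seconds)` record layout, and the sample records are assumptions made for this example only; the study's own definition of a "long" build and its dataset may differ.

```python
from collections import defaultdict

# Assumption: 10 minutes is the "acceptable" build duration quoted in the abstract.
ACCEPTABLE_SECONDS = 10 * 60


def long_build_share(builds, threshold=ACCEPTABLE_SECONDS):
    """Return {project: fraction of its builds whose duration exceeds threshold}.

    `builds` is an iterable of (project, duration_in_seconds) pairs
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(int)
    long_counts = defaultdict(int)
    for project, duration in builds:
        totals[project] += 1
        if duration > threshold:
            long_counts[project] += 1
    return {project: long_counts[project] / totals[project] for project in totals}


# Made-up records for illustration only (not data from the study).
sample = [("proj-a", 300), ("proj-a", 900), ("proj-b", 1200), ("proj-b", 1500)]
shares = long_build_share(sample)
```

A per-project share like this only describes how common long builds are; it does not reproduce the mixed-effects logistic models the paper uses to explain them.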

Author information

Authors and Affiliations

School of Computing, Queen’s University, Kingston, ON, K7L 3N6, Canada
Taher Ahmed Ghaleb
Department of Information Science, University of Otago, Otago, New Zealand
Daniel Alencar da Costa
Department of Electrical and Computer Engineering, Queen’s University, Kingston, ON, K7L 3N6, Canada
Ying Zou

Corresponding author

Correspondence to Taher Ahmed Ghaleb.

Additional information

Communicated by: Denys Poshyvanyk

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ghaleb, T.A., da Costa, D.A. & Zou, Y. An empirical study of the long duration of continuous integration builds. Empir Software Eng 24, 2102–2139 (2019). https://doi.org/10.1007/s10664-019-09695-9

Issue Date: 15 August 2019
DOI: https://doi.org/10.1007/s10664-019-09695-9
