An empirical study of the long duration of continuous integration builds

Empirical Software Engineering

Abstract

Continuous Integration (CI) is a set of software development practices that allow software development teams to generate software builds more quickly and periodically (e.g., daily or even hourly). CI brings many advantages, such as the early identification of errors when integrating code. When builds are generated frequently, a long build duration may keep developers from performing other important tasks. Recent research has shown that a considerable amount of development time is invested in optimizing the generation of builds. However, the reasons behind long build durations remain unclear and call for an in-depth study. Our initial investigation shows that many projects have build durations that far exceed the acceptable build duration (i.e., 10 minutes) reported by recent studies. In this paper, we study several characteristics of CI builds that may be associated with long build durations. We perform an empirical study on 104,442 CI builds from 67 GitHub projects. We use mixed-effects logistic models to model long build durations across projects. Our results reveal that, in addition to commonly assumed factors (e.g., project size, team size, build configuration size, and test density), other highly important factors help explain long build durations. We observe that rerunning failed commands multiple times is most likely to be associated with long build durations. We also find that builds may run faster if they are configured (a) to cache content that does not change often or (b) to finish as soon as all the required jobs finish. However, we observe that about 40% of the studied projects either do not use or misuse such configurations in their builds. In addition, we observe that triggering builds on weekdays or during the daytime is most likely to have a direct relationship with long build durations. Our results suggest that developers should use proper CI build configurations to maintain successful builds and to avoid long build durations. Tool builders should supply development teams with tools to identify cacheable spots of the project in order to accelerate the generation of CI builds.


Author information

Corresponding author

Correspondence to Taher Ahmed Ghaleb.

Additional information

Communicated by: Denys Poshyvanyk

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ghaleb, T.A., da Costa, D.A. & Zou, Y. An empirical study of the long duration of continuous integration builds. Empir Software Eng 24, 2102–2139 (2019). https://doi.org/10.1007/s10664-019-09695-9

  • DOI: https://doi.org/10.1007/s10664-019-09695-9
