Abstract
Continuous Integration (CI) is a set of software development practices that allow software development teams to generate software builds more quickly and periodically (e.g., daily or even hourly). CI brings many advantages, such as the early identification of errors when integrating code. When builds are generated frequently, a long build duration may keep developers from performing other important tasks. Recent research has shown that a considerable amount of development time is invested in optimizing the generation of builds. However, the reasons behind long build durations remain unclear and warrant an in-depth study. Our initial investigation shows that many projects have build durations that far exceed the acceptable build duration (i.e., 10 minutes) reported by recent studies. In this paper, we study several characteristics of CI builds that may be associated with long CI build durations. We perform an empirical study on 104,442 CI builds from 67 GitHub projects, using mixed-effects logistic models to model long build durations across projects. Our results reveal that, in addition to common-wisdom factors (e.g., project size, team size, build configuration size, and test density), other highly important factors explain long build durations. We observe that rerunning failed commands multiple times is most likely to be associated with long build durations. We also find that builds may run faster if they are configured (a) to cache content that does not change often or (b) to finish as soon as all the required jobs finish. However, we observe that about 40% of the studied projects do not use, or misuse, such configurations in their builds. In addition, we observe that triggering builds on weekdays or during the daytime is most likely to have a direct relationship with long build durations. Our results suggest that developers should use proper CI build configurations to maintain successful builds and to avoid long build durations. Tool builders should supply development teams with tools that identify cacheable spots of a project in order to accelerate the generation of CI builds.
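The abstract states that long build durations are modelled across projects with mixed-effects logistic models. As a minimal sketch of what such a model expresses (not how the study specified or fitted it), assume a single random intercept per project: the log-odds of a build being long are a global intercept, plus a project-specific offset, plus a weighted sum of build characteristics. All predictor names, coefficient values, and project names below are hypothetical and made up for illustration; they are not estimates from the study. Fitting such a model is normally done with a statistical package, so this sketch only evaluates a model whose parameters are already given.

```python
import math
from typing import Dict


def inverse_logit(eta: float) -> float:
    """Map a linear predictor (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def prob_long_build(
    features: Dict[str, float],
    fixed_effects: Dict[str, float],
    intercept: float,
    project_intercepts: Dict[str, float],
    project: str,
) -> float:
    """P(long build) under a random-intercept logistic model (sketch).

    log-odds = intercept + u_project + sum_k beta_k * x_k
    A project missing from project_intercepts gets u = 0, i.e. the
    population-level prediction.
    """
    eta = intercept + project_intercepts.get(project, 0.0)
    for name, beta in fixed_effects.items():
        # Missing predictors are treated as 0 (hypothetical convention).
        eta += beta * features.get(name, 0.0)
    return inverse_logit(eta)


# Hypothetical, made-up coefficients for illustration only
# (NOT estimates reported by the study).
FIXED = {"reruns_failed_commands": 1.2, "uses_cache": -0.8, "fast_finish": -0.5}
PROJECT_U = {"project_a": 0.6, "project_b": -0.6}

if __name__ == "__main__":
    build = {"reruns_failed_commands": 1.0, "uses_cache": 0.0, "fast_finish": 0.0}
    for proj in ("project_a", "project_b", "unseen_project"):
        p = prob_long_build(build, FIXED, -1.0, PROJECT_U, proj)
        print(f"{proj}: P(long) = {p:.3f}")
```

The same build receives a different probability in each project: a positive project intercept raises the odds of a long build for every build of that project, independently of the fixed-effect predictors, which is how a mixed-effects model accounts for differences across projects instead of pooling them blindly.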

Additional information
Communicated by: Denys Poshyvanyk
About this article
Cite this article
Ghaleb, T.A., da Costa, D.A. & Zou, Y. An empirical study of the long duration of continuous integration builds. Empir Software Eng 24, 2102–2139 (2019). https://doi.org/10.1007/s10664-019-09695-9