DOI: 10.1145/3510003.3510211
Research Article

Lessons from eight years of operational data from a continuous integration service: an exploratory case study of CircleCI

Published: 05 July 2022

ABSTRACT

Continuous Integration (CI) is a popular practice that enables the rapid pace of modern software development. Cloud-based CI services have made CI ubiquitous by relieving software teams of the hassle of maintaining a CI infrastructure. To improve these CI services, prior research has focused on analyzing historical CI data to help service consumers. However, finding areas of improvement for CI service providers could also improve the experience for service consumers. To search for these opportunities, we conduct an empirical study of 22.2 million builds spanning 7,795 open-source projects that used CircleCI from 2012 to 2020.

First, we quantitatively analyze the builds (i.e., invocations of the CI service) with passing or failing outcomes. We observe that the heavy and typical service consumer groups spend significantly different proportions of time on seven of the nine build actions (e.g., dependency retrieval). On the other hand, the compilation and testing actions consistently consume a large proportion of build time across consumer groups (median 33%). Second, we study builds that terminate prior to generating a pass or fail signal. Through a systematic manual analysis, we find that availability issues, configuration errors, user cancellation, and exceeding time limits are key reasons that lead to premature build termination.
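
The abstract does not spell out how the difference between the heavy and typical consumer groups was tested. A common non-parametric way to compare such per-build time proportions is a Mann-Whitney U test paired with Cliff's delta as an effect size; the sketch below is illustrative only, using made-up samples of the proportion of build time one action (e.g., dependency retrieval) consumes in each group, not the paper's data or its exact statistical procedure.

```python
# Illustrative only: compares the share of build time one action consumes
# for two hypothetical consumer groups. The values below are invented;
# the paper's actual grouping criteria and data are not reproduced here.
from scipy.stats import mannwhitneyu

def cliffs_delta(xs, ys):
    """Cliff's delta effect size: P(x > y) - P(x < y) over all pairs."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

# Hypothetical per-build proportions of time spent on dependency retrieval.
heavy_consumers   = [0.05, 0.08, 0.04, 0.07, 0.06, 0.09, 0.05]
typical_consumers = [0.18, 0.22, 0.15, 0.25, 0.20, 0.17, 0.21]

stat, p_value = mannwhitneyu(heavy_consumers, typical_consumers,
                             alternative="two-sided")
delta = cliffs_delta(heavy_consumers, typical_consumers)

print(f"Mann-Whitney U = {stat:.1f}, p = {p_value:.4f}, Cliff's delta = {delta:.2f}")
```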

Our observations suggest that (1) heavy service consumers would benefit most from build acceleration approaches that tackle long build durations (e.g., skipping build steps) or high throughput rates (e.g., optimizing CI service job queues), (2) efficiency in CI pipelines can be improved for most CI consumers by focusing on the compilation and testing stages, and (3) avoiding misconfigurations and tackling service availability issues present the largest opportunities for improving the robustness of CI services.
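
To make the premature-termination categories concrete, the sketch below buckets hypothetical build records into the four reasons reported above using simple keyword heuristics. The record format, field names, and keyword patterns are assumptions for illustration; the study itself classified such builds through systematic manual analysis, not an automated heuristic like this one.

```python
# Illustrative only: tallies premature-termination reasons for hypothetical
# build records. Field names ("status", "log_tail") and keyword patterns are
# assumptions; the paper classified these builds manually.
from collections import Counter

def classify_termination(record):
    log = record.get("log_tail", "").lower()
    if record.get("status") == "canceled":
        return "user cancellation"
    if "time limit exceeded" in log or "timed out" in log:
        return "exceeding time limits"
    if "invalid config" in log or "unknown key" in log:
        return "configuration error"
    if "connection refused" in log or "service unavailable" in log:
        return "availability issue"
    return "other"

builds = [  # hypothetical records
    {"status": "canceled", "log_tail": ""},
    {"status": "errored", "log_tail": "ERROR: invalid config: unknown key 'jbos'"},
    {"status": "errored", "log_tail": "curl: (7) connection refused by registry"},
    {"status": "errored", "log_tail": "build timed out after 60 minutes"},
]

print(Counter(classify_termination(b) for b in builds))
```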


Published in: ICSE '22: Proceedings of the 44th International Conference on Software Engineering, May 2022, 2508 pages
ISBN: 9781450392211
DOI: 10.1145/3510003
Copyright © 2022 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

