DOI: 10.1145/3510003.3510211
Research Article

Lessons from eight years of operational data from a continuous integration service: an exploratory case study of CircleCI

Published: 05 July 2022

ABSTRACT

Continuous Integration (CI) is a popular practice that enables the rapid pace of modern software development. Cloud-based CI services have made CI ubiquitous by relieving software teams of the hassle of maintaining a CI infrastructure. To improve these CI services, prior research has focused on analyzing historical CI data to help service consumers. However, finding areas of improvement for CI service providers could also improve the experience for service consumers. To search for these opportunities, we conduct an empirical study of 22.2 million builds spanning 7,795 open-source projects that used CircleCI from 2012 to 2020.

First, we quantitatively analyze the builds (i.e., invocations of the CI service) with passing or failing outcomes. We observe that the heavy and typical service consumer groups spend significantly different proportions of time on seven of the nine build actions (e.g., dependency retrieval). On the other hand, the compilation and testing actions consistently consume a large proportion of build time across consumer groups (median 33%). Second, we study builds that terminate prior to generating a pass or fail signal. Through a systematic manual analysis, we find that availability issues, configuration errors, user cancellation, and exceeding time limits are key reasons that lead to premature build termination.
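
The abstract does not spell out how the difference between the heavy and typical consumer groups was tested. A common non-parametric way to compare such per-build time proportions is a Mann-Whitney U test paired with Cliff's delta as an effect size; the sketch below is illustrative only, using made-up samples of the proportion of build time one action (e.g., dependency retrieval) consumes in each group, not the paper's data or its exact statistical procedure.

```python
# Illustrative only: compares the share of build time one action consumes
# for two hypothetical consumer groups. The values below are invented;
# the paper's actual grouping criteria and data are not reproduced here.
from scipy.stats import mannwhitneyu

def cliffs_delta(xs, ys):
    """Cliff's delta effect size: P(x > y) - P(x < y) over all pairs."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

# Hypothetical per-build proportions of time spent on dependency retrieval.
heavy_consumers   = [0.05, 0.08, 0.04, 0.07, 0.06, 0.09, 0.05]
typical_consumers = [0.18, 0.22, 0.15, 0.25, 0.20, 0.17, 0.21]

stat, p_value = mannwhitneyu(heavy_consumers, typical_consumers,
                             alternative="two-sided")
delta = cliffs_delta(heavy_consumers, typical_consumers)

print(f"Mann-Whitney U = {stat:.1f}, p = {p_value:.4f}, Cliff's delta = {delta:.2f}")
```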

Our observations suggest that (1) heavy service consumers would benefit most from build acceleration approaches that tackle long build durations (e.g., skipping build steps) or high throughput rates (e.g., optimizing CI service job queues), (2) efficiency in CI pipelines can be improved for most CI consumers by focusing on the compilation and testing stages, and (3) avoiding misconfigurations and tackling service availability issues present the largest opportunities for improving the robustness of CI services.
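
To make the premature-termination categories concrete, the sketch below buckets hypothetical build records into the four reasons reported above using simple keyword heuristics. The record format, field names, and keyword patterns are assumptions for illustration; the study itself classified such builds through systematic manual analysis, not an automated heuristic like this one.

```python
# Illustrative only: tallies premature-termination reasons for hypothetical
# build records. Field names ("status", "log_tail") and keyword patterns are
# assumptions; the paper classified these builds manually.
from collections import Counter

def classify_termination(record):
    log = record.get("log_tail", "").lower()
    if record.get("status") == "canceled":
        return "user cancellation"
    if "time limit exceeded" in log or "timed out" in log:
        return "exceeding time limits"
    if "invalid config" in log or "unknown key" in log:
        return "configuration error"
    if "connection refused" in log or "service unavailable" in log:
        return "availability issue"
    return "other"

builds = [  # hypothetical records
    {"status": "canceled", "log_tail": ""},
    {"status": "errored", "log_tail": "ERROR: invalid config: unknown key 'jbos'"},
    {"status": "errored", "log_tail": "curl: (7) connection refused by registry"},
    {"status": "errored", "log_tail": "build timed out after 60 minutes"},
]

print(Counter(classify_termination(b) for b in builds))
```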


Published in: ICSE '22: Proceedings of the 44th International Conference on Software Engineering, May 2022, 2508 pages
ISBN: 9781450392211
DOI: 10.1145/3510003
Copyright © 2022 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

