A Large-Scale Empirical Study of the Relationship between Build Technology and Build Maintenance

McIntosh, Shane; Nagappan, Meiyappan; Adams, Bram; Mockus, Audris; Hassan, Ahmed E.

doi:10.1007/s10664-014-9324-x

Published: 01 August 2014

Volume 20, pages 1587–1633 (2015)

Empirical Software Engineering

Shane McIntosh¹,
Meiyappan Nagappan¹,
Bram Adams²,
Audris Mockus³,⁴ &
Ahmed E. Hassan¹

Abstract

Build systems specify how source code is translated into deliverables. They require continual maintenance as the system they build evolves. This build maintenance can become so burdensome that projects switch build technologies, potentially having to rewrite thousands of lines of build code. We aim to understand the prevalence of different build technologies and the relationship between build technology and build maintenance by analyzing version histories in a corpus of 177,039 repositories spread across four software forges, three software ecosystems, and four large-scale projects. We study low-level, abstraction-based, and framework-driven build technologies, as well as tools that automatically manage external dependencies. We find that modern, framework-driven build technologies need to be maintained more often and these build changes are more tightly coupled with the source code than low-level or abstraction-based ones. However, build technology migrations tend to coincide with a shift of build maintenance work to a build-focused team, deferring the cost of build maintenance to them.

References

Adams B, De Schutter K, Tromp H, De Meuter W (2007) Design recovery and maintenance of build systems. In: Proceedings of the 23rd int’l conference on software maintenance (ICSM), pp 114–123
Adams B, De Schutter K, Tromp H, De Meuter W (2008) The evolution of the Linux Build System. Electronic Communications of the EASST 8
Al-Kofahi JM, Nguyen HV, Nguyen AT, Nguyen TT, Nguyen TN (2012) Detecting semantic changes in Makefile Build Code. In: Proceedings of the 28th int’l conference on software maintenance (ICSM), pp 150–159
Bauer DF (1972) Constructing confidence sets using rank statistics. J Am Stat Assoc 67(339):687–690
Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009a) Fair and balanced? Bias in bug-fix datasets. In: Proceedings of the 7th joint meeting of the European software engineering conference and the symposium on the foundations of software engineering (ESEC/FSE), pp 121–130
Bird C, Rigby PC, Barr ET, Hamilton DJ, German DM, Devanbu P (2009b) The promises and perils of mining git. In: Proceedings of the 6th working conference on mining software repositories (MSR)
Dietrich C, Tartler R, Schröder-Preikschat W, Lohmann D (2012) A robust approach for variability extraction from the Linux Build System. In: Proceedings of the 16th int’l software product line conference (SPLC), pp 21–30
Ebersole S (2007) Maven migration. http://lists.jboss.org/pipermail/hibernate-dev/2007-May/002075.html, last viewed: 18 Mar 2010
Feldman S (1979) Make—a program for maintaining computer programs. Softw Pract Exp 9(4):255–265
Gall H, Hajek K, Jazayeri M (1998) Detection of logical coupling based on product release history. In: Proceedings of the 14th int’l conference on software maintenance (ICSM), pp 190–198
Graves TL, Karr AF, Marron JS, Siy H (2000) Predicting fault incidence using software change history. Trans Softw Eng (TSE) 26(7):653–661
Grimmer L (2010) Building MySQL server with CMake on Linux/Unix. http://www.lenzg.net/archives/291-Building-MySQL-Server-with-CMake-on-LinuxUnix.html, Last viewed: 20 Aug 2010
Herraiz I, Robles G, Gonzalez-Barahona J, Capiluppi A, Ramil J (2006) Comparison between SLOCs and number of files as size metrics for software evolution analysis. In: Proceedings of the 10th European conference on software maintenance and reengineering (CSMR), pp 213–221
Hochstein L, Jiao Y (2011) The cost of the build tax in scientific software. In: Proceedings of the 5th international symposium on empirical software engineering and measurement (ESEM), pp 384–387
Humble J, Farley D (2010) Continuous delivery: reliable software releases through build, test, and deployment automation. Addison-Wesley, Reading
Kampstra P (2008) Beanplot: a boxplot alternative for visual comparison of distributions. J Stat Softw, Code Snippets 28(1):1–9. http://www.jstatsoft.org/v28/c01/
Lawrence R (2004) The space efficiency of XML. Inf Softw Technol (IST) 46(11):753–759
Linden Labs (2010) CMake. http://wiki.secondlife.com/wiki/CMake, Last viewed: 20 Aug 2010
McIntosh S, Adams B, Nguyen THD, Kamei Y, Hassan AE (2011) An empirical study of build maintenance effort. In: Proceedings of the 33rd int’l conference on software engineering (ICSE), pp 141–150
McIntosh S, Adams B, Hassan AE (2012) The evolution of Java build systems. Empir Softw Eng 17(4–5):578–608
Miller P (1998) Recursive make considered harmful. In: Australian Unix User Group Newsletter, vol 19, pp 14–25
Miller RG (1981) Simultaneous statistical inference. Springer, Berlin
Mockus A (2007) Software support tools and experimental work. In: Proc of the int’l conference on empirical software engineering issues: critical assessment and future directions, pp 91–99
Mockus A (2009) Amassing and indexing a large sample of version control systems: towards the census of public source code history. In: Proceedings of the 6th working conference on mining software repositories (MSR), pp 11–20
Nadi S, Holt R (2011) Make it or break it: mining anomalies in Linux Kbuild. In: Proceedings of the 18th working conference on reverse engineering (WCRE), pp 315–324
Nadi S, Holt R (2012) Mining Kbuild to detect variability anomalies in Linux. In: Proceedings of the 16th European conference on software maintenance and reengineering (CSMR), pp 107–116
Neitsch A, Wong K, Godfrey MW (2012) Build system issues in multilanguage software. In: Proceedings of the 28th int’l conference on software maintenance (ICSM), pp 140–149
Neundorf A (2010) Why the KDE project switched to CMake—and how (continued). http://lwn.net/Articles/188693/, last viewed: 06 Mar 2010
Neville-Neil GV (2009) Kode vicious: system changes and side effects. Commun ACM 52(4):25–26
Nguyen THD, Adams B, Hassan AE (2010) A case study of bias in bug-fix datasets. In: Proceedings of the 17th working conference on reverse engineering (WCRE), pp 259–268
Savage B (2010) Build systems: relevancy of automated builds in a web world. http://www.brandonsavage.net/build-systems-relevancy-of-automated-builds-in-a-web-world/
Smith P (2011) Software build systems: principles and experience, 1st edn. Addison-Wesley, Reading
Suvorov R, Nagappan M, Hassan AE, Zou Y, Adams B (2012) An empirical study of build system migrations in practice: case studies on KDE and the Linux Kernel. In: Proceedings of the 28th int’l conference on software maintenance (ICSM), pp 160–169
Tamrawi A, Nguyen HA, Nguyen HV, Nguyen T (2012) Build code analysis with symbolic evaluation. In: Proceedings of the 34th int’l conference on software engineering (ICSE), pp 650–660
Tu Q, Godfrey M (2002) The build-time software architecture view. In: Proceedings of int’l conference on software maintenance (ICSM), pp 398–407
Zadok E (2002) Overhauling Amd for the ’00s: a case study of GNU Autotools. In: Proceedings of the FREENIX track on the USENIX technical conference. USENIX Association, pp 287–297

Author information

Authors and Affiliations

Software Analysis and Intelligence Lab (SAIL), Queen’s University, Kingston, Canada
Shane McIntosh, Meiyappan Nagappan & Ahmed E. Hassan
Lab on Maintenance, Construction, and Intelligence of Software (MCIS), Polytechnique Montréal, Montréal, Canada
Bram Adams
Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
Audris Mockus
Avaya Labs Research, 233 Mt Airy Rd, Basking Ridge, NJ, USA
Audris Mockus

Corresponding author

Correspondence to Shane McIntosh.

Additional information

Communicated by: Maurizio Morisio

Appendices

A Build Technology Examples

In this appendix, we briefly describe how each of the studied technologies can be used to specify a simple build system.

A.1 Low-Level

Figure 17 provides working examples of the five studied low-level build technologies.

Make

One of the earliest build technologies on record is Feldman’s make tool (Feldman 1979), which automatically synchronizes program sources with deliverables. Make specifications outline target-dependency-recipe tuples. Targets specify files created by a recipe, i.e., a shell script that is executed when the target either: (1) does not exist, or (2) is older than one or more of its dependencies, i.e., a list of other files and targets.
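
The rebuild rule described above can be illustrated with a minimal Python sketch. It is not taken from the paper or from any make implementation; the function name needs_rebuild is hypothetical, and the sketch assumes that file modification times are the only staleness signal and that missing dependencies are simply ignored (a real make would try to build them or report an error).

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if a make-style rule would rerun the recipe for target."""
    # Case 1: the target file does not exist yet.
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Case 2: the target is older than at least one of its dependencies.
    # Simplifying assumption: dependencies that do not exist are skipped.
    return any(
        os.path.exists(dep) and os.path.getmtime(dep) > target_mtime
        for dep in dependencies
    )
```

For instance, needs_rebuild("main.o", ["main.c"]) would be true right after main.c is edited and false once main.o has been regenerated.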

The make specification snippet in Fig. 17a describes three target-dependency-recipe tuples. Lines 2, 4, and 7 list targets to the left of the colons and dependency lists to the right. Recipes are specified for the main.o and example targets on lines 5 and 8. Line 1 of Fig. 17a specifies that the all target is phony, representing an abstract phase in the build process rather than a concrete file in the filesystem.

Jam

Jam provides a more procedural-style structure for target-dependency-recipe tuples. Figure 17b shows how rules (the equivalent of make tuples) can be specified (lines 1–4 and 10–13). Dependencies are expressed by invoking the built-in Depends rule on lines 2 and 11. Jam actions (the equivalent of make recipes) for C compilation and object code linking are defined on lines 6–8 and 15–17 respectively.

Ant

Ant borrows the target-dependency-recipe concept from make; however, all Ant targets are abstract. When an Ant target is triggered, a list of specified tasks (the equivalent of make recipes) is invoked. Ant tasks execute Java code rather than shell scripts to synchronize sources with deliverables.

Figure 17c shows an Ant specification that describes two targets, i.e., compile (lines 2–8) and link (lines 10–18). The compile target invokes the javac task (lines 3–7), which executes the javac compiler. The link target invokes the jar task (lines 14–17), which executes the jar command. The dependency between the link and compile targets is expressed on line 12 using the depends target attribute.
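
The way abstract targets and the depends attribute interact can be sketched in Python. This is only an illustration under stated assumptions: it is not Ant's implementation, the names run_target and TARGETS are hypothetical, and each task is modelled as a plain function that returns a label instead of running a compiler.

```python
def run_target(name, targets, done=None, log=None):
    """Run the named target after the targets it depends on, each at most once."""
    done = set() if done is None else done
    log = [] if log is None else log
    if name in done:
        return log
    done.add(name)  # mark first so a repeated dependency is skipped
    for dependency in targets[name]["depends"]:
        run_target(dependency, targets, done, log)
    # Targets are abstract: triggering one just invokes its list of tasks.
    for task in targets[name]["tasks"]:
        log.append(task())
    return log


# Hypothetical targets mirroring the compile/link pair described above.
TARGETS = {
    "compile": {"depends": [], "tasks": [lambda: "javac"]},
    "link": {"depends": ["compile"], "tasks": [lambda: "jar"]},
}
```

Triggering the link target with run_target("link", TARGETS) runs the compile tasks first and then the link tasks, because link declares compile as a dependency.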

SCons

SCons provides several advanced build system features (e.g., implicit dependency tracking for popular programming languages) and allows maintainers to write highly portable build specifications using Python. Line 7 of Fig. 17d shows how a binary example can be assembled from object code. Line 5 shows how object code can be generated using SCons' built-in support for C++ compilation. Environmental settings (e.g., compilers, linkers, and flags) are automatically detected; however, parameters passed to the Environment() function call override the detected settings, as shown on line 1.
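
Implicit dependency tracking can be pictured as scanning source files for the headers they include, so that maintainers do not have to list them by hand. The sketch below is a deliberately simplified illustration, not SCons code: scan_includes and INCLUDE_PATTERN are hypothetical names, and the sketch assumes that only quoted #include directives count as project-local dependencies.

```python
import re

# Matches lines such as:  #include "util.h"
# Angle-bracket includes (system headers) are ignored in this sketch.
INCLUDE_PATTERN = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def scan_includes(source_text):
    """Return the quoted header names included by a C or C++ source text."""
    return INCLUDE_PATTERN.findall(source_text)
```

A build tool could feed the returned header names into its dependency graph so that editing a header triggers recompilation of every source file that includes it.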

Rake

Rake is a modern build tool with advanced support for building Ruby applications. Similar to SCons, Rake specifications are written in a high-level scripting language (i.e., Ruby), to give build maintainers the power to express complex relationships and transformations in a highly portable language. Similar to Ant, Rake tasks (the equivalent of targets in make) are abstract.

The example snippet in Fig. 17e shows how a unit testing task utest can be specified (lines 3–5). Line 4 describes the recipe that is executed when utest is triggered. Line 1 specifies that the default target depends upon the utest target.

A.2 Abstraction-Based

Figure 18 provides working examples of the two studied abstraction-based technologies.

Autotools

GNU Autotools specifications describe external and internal dependencies, configurable compile-time features, and platform requirements. These specifications are parsed to generate make specifications that satisfy the described constraints.

Autotools is actually a large collection of build tools that work together to generate build systems according to specifications. Two of the most commonly used tools are autoconf and automake, for which we provide example specifications in Fig. 18a and b respectively. Lines 1 and 2 of Fig. 18a initialize the autoconf environment, specifying that our project name is example version 1.0 and that automake is also necessary. Line 3 specifies an environment dependency on a C compiler, while lines 4 and 5 request that the configuration step store preprocessor directives in a file named config.h, and store the build system implementation in a file called Makefile. Line 1 of Fig. 18b specifies that a deliverable called example should be constructed during the build process and that it should be deployed in the bin directory. Line 2 states that main.c is a source file that should be compiled and linked into the example binary.
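
The core idea of an abstraction-based technology, i.e., generating low-level make rules from a short declarative description, can be sketched as follows. This is an illustrative assumption rather than how Autotools works internally: generate_makefile is a hypothetical function, the compiler defaults to a generic cc command, and configuration checks, config.h generation, and installation rules are all omitted.

```python
def generate_makefile(program, sources, compiler="cc"):
    """Generate make rules that build program from a list of C source files."""
    objects = [src.rsplit(".", 1)[0] + ".o" for src in sources]
    lines = [".PHONY: all", f"all: {program}", ""]
    # Link rule: the program depends on every object file.
    lines.append(f"{program}: {' '.join(objects)}")
    lines.append(f"\t{compiler} -o {program} {' '.join(objects)}")
    # One compile rule per source file.
    for src, obj in zip(sources, objects):
        lines.append("")
        lines.append(f"{obj}: {src}")
        lines.append(f"\t{compiler} -c {src} -o {obj}")
    return "\n".join(lines) + "\n"
```

Calling generate_makefile("example", ["main.c"]) yields a phony all target, a link rule for example, and a compile rule for main.o, which is the kind of low-level specification that maintainers would otherwise write by hand.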

CMake

Similar to Autotools, CMake abstractions can be used to generate make specifications, but can also generate Microsoft Visual Studio and Apple Xcode project files. Figure 18c specifies that a build system should be generated to produce a binary called example by compiling and linking main.cc (line 4) as a part of a project called Example (line 2). Line 1 denotes that CMake version 2.6 (or later) should be used to parse the specification.

A.3 Framework-Driven

Below, we describe Maven, the studied framework-driven technology.

Maven

Maven assumes that source and test files are placed in default locations and that projects adhere to a typical Java dependency policy, unless otherwise specified. If projects abide by the conventions, Maven can infer build behaviour automatically without any explicit specification. For example, Fig. 19a does not specify a location for source or output files. Convention specifies that source and unit test code appear under src/main/java and src/test/java respectively.
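
The convention-over-configuration behaviour described above can be sketched as defaults that apply unless a project overrides them. The names DEFAULT_LAYOUT and effective_layout are hypothetical and this is not Maven's API; the sketch only assumes the two default directories named in the text.

```python
# Conventional locations named in the text; used when nothing is specified.
DEFAULT_LAYOUT = {
    "source_dir": "src/main/java",
    "test_dir": "src/test/java",
}


def effective_layout(overrides=None):
    """Return the layout a build would use: conventions plus explicit overrides."""
    layout = dict(DEFAULT_LAYOUT)  # copy so the defaults are never mutated
    layout.update(overrides or {})
    return layout
```

A project that follows the conventions calls effective_layout() with no arguments, while a project with a non-standard source tree passes only the settings that differ, e.g., effective_layout({"source_dir": "src"}).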

Lines 10–18 of Fig. 19a show how the Maven convention can be overridden through configuration. The Java compiler is instructed to operate in Java 1.5 source mode (line 15), and generate bytecode that is compatible with the Java 1.7 runtime environment (line 16).

A.4 Dependency Management

Figure 19 provides working examples of dependency management in Maven (Fig. 19a) and the two studied dependency management technologies (Fig. 19b and c).

Maven

In addition to providing a framework-driven build environment, Maven doubles as a dependency management technology. Lines 22–26 of Fig. 19a provide an example dependency declaration on the JUnit tool, version 3.8.1.

Ivy

Ivy provides dependency management features that are most notably leveraged by Ant. Figure 19b shows an Ivy specification for the same JUnit dependency as depicted in Fig. 19a.

Bundler

Bundler provides packaging and dependency management for Ruby applications. Line 1 of Fig. 19c specifies that Bundler should download gems, i.e., Ruby packages, from the given host. Lines 2 and 3 specify dependencies on Rake version 10.0.3 (at least) and rspec version 2.13.0 (exact).
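
The difference between an "at least" and an "exact" version requirement can be sketched in Python. This is an illustration only, not Bundler's resolver: parse_version and satisfies are hypothetical helpers, versions are assumed to be purely numeric and dot-separated, and versions with different numbers of components (e.g., 10.0 and 10.0.0) are not normalized.

```python
def parse_version(text):
    """Turn a dotted version string such as '10.0.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, operator, required):
    """Check an installed version against an 'at least' or 'exact' requirement."""
    have, want = parse_version(installed), parse_version(required)
    if operator == ">=":  # at least the required version
        return have >= want
    if operator == "=":  # exactly the required version
        return have == want
    raise ValueError(f"unsupported operator: {operator}")
```

Under these assumptions, Rake 10.1.0 would satisfy the "at least 10.0.3" requirement, whereas rspec 2.13.1 would not satisfy the exact requirement on 2.13.0.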

B Additional Build Maintenance Figures

We perform longitudinal analyses of the Tukey HSD ranks for each metric in the forges to complement our median-based analyses in Section 6. Figures 20 and 21 show only the first twelve months of history and the top three ranks to improve the readability of the figures. Unfiltered figures are available online (footnote 15).

About this article

Cite this article

McIntosh, S., Nagappan, M., Adams, B. et al. A Large-Scale Empirical Study of the Relationship between Build Technology and Build Maintenance. Empir Software Eng 20, 1587–1633 (2015). https://doi.org/10.1007/s10664-014-9324-x

Issue Date: December 2015
DOI: https://doi.org/10.1007/s10664-014-9324-x
