A Large-Scale Empirical Study of the Relationship between Build Technology and Build Maintenance

Empirical Software Engineering

Abstract

Build systems specify how source code is translated into deliverables. They require continual maintenance as the system they build evolves. This build maintenance can become so burdensome that projects switch build technologies, potentially having to rewrite thousands of lines of build code. We aim to understand the prevalence of different build technologies and the relationship between build technology and build maintenance by analyzing version histories in a corpus of 177,039 repositories spread across four software forges, three software ecosystems, and four large-scale projects. We study low-level, abstraction-based, and framework-driven build technologies, as well as tools that automatically manage external dependencies. We find that modern, framework-driven build technologies need to be maintained more often and these build changes are more tightly coupled with the source code than low-level or abstraction-based ones. However, build technology migrations tend to coincide with a shift of build maintenance work to a build-focused team, deferring the cost of build maintenance to them.

Notes

  1. http://en.wikipedia.org/wiki/List_of_build_automation_software

  2. http://www.cmake.org/

  3. http://maven.apache.org/

  4. http://ant.apache.org/ivy/

  5. http://lists.kde.org/?l=kde-core-devel&m=95953244511288&w=4

  6. http://argouml.tigris.org/ds/viewMessage.do?dsForumId=450&dsMessageId=2618367

  7. https://github.com/github/linguist/

  8. http://sailhome.cs.queensu.ca/replication/shane/EMSE2013/

  9. Threshold values of 5% and 15% yielded similar results.

  10. https://wiki.openoffice.org/wiki/Build_System_Analysis

  11. https://github.com/github/linguist/

  12. http://sailhome.cs.queensu.ca/replication/shane/EMSE2013/

  13. http://en.wikipedia.org/wiki/List_of_build_automation_software

  14. http://travis-ci.org/

  15. http://sailhome.cs.queensu.ca/replication/shane/EMSE2013/

References

  • Adams B, De Schutter K, Tromp H, De Meuter W (2007) Design recovery and maintenance of build systems. In: Proceedings of the 23rd int’l conference on software maintenance (ICSM), pp 114–123

  • Adams B, De Schutter K, Tromp H, De Meuter W (2008) The evolution of the Linux Build System. Electronic Communications of the ECEASST 8

  • Al-Kofahi JM, Nguyen HV, Nguyen AT, Nguyen TT, Nguyen TN (2012) Detecting semantic changes in Makefile Build Code. In: Proceedings of the 28th int’l conference on software maintenance (ICSM), pp 150–159

  • Bauer DF (1972) Constructing confidence sets using rank statistics. J Am Stat Assoc 67(339):687–690

  • Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009a) Fair and balanced? Bias in bug-fix datasets. In: Proceedings of the 7th joint meeting of the European software engineering conference and the symposium on the foundations of software engineering (ESEC/FSE), pp 121–130

  • Bird C, Rigby PC, Barr ET, Hamilton DJ, German DM, Devanbu P (2009b) The promises and perils of mining git. In: Proceedings of the 6th working conference on mining software repositories (MSR)

  • Dietrich C, Tartler R, Schröder-Preikschat W, Lohmann D (2012) A robust approach for variability extraction from the Linux Build System. In: Proceedings of the 16th int’l software product line conference (SPLC), pp 21–30

  • Ebersole S (2007) Maven migration. http://lists.jboss.org/pipermail/hibernate-dev/2007-May/002075.html, last viewed: 18 Mar 2010

  • Feldman S (1979) Make—a program for maintaining computer programs. Softw - Pract Exp 9 (4): 255–265

  • Gall H, Hajek K, Jazayeri M (1998) Detection of logical coupling based on product release history. In: Proceedings of the 14th int’l conference on software maintenance (ICSM), pp 190–198

  • Graves TL, Karr AF, Marron JS, Siy H (2000) Predicting fault incidence using software change history. Trans Softw Eng (TSE) 26(7):653–661

  • Grimmer L (2010) Building MySQL server with CMake on Linux/Unix. http://www.lenzg.net/archives/291-Building-MySQL-Server-with-CMake-on-LinuxUnix.html, Last viewed: 20 Aug 2010

  • Herraiz I, Robles G, Gonzalez-Barahona J, Capiluppi A, Ramil J (2006) Comparison between SLOCs and number of files as size metrics for software evolution analysis. In: Proceedings of the 10th European conference on software maintenance and reengineering (CSMR), pp 213–221

  • Hochstein L, Jiao Y (2011) The cost of the build tax in scientific software. In: Proceedings of the 5th international symposium on empirical software engineering and measurement (ESEM), pp 384–387

  • Humble J, Farley D (2010) Continuous delivery: reliable software releases through build, test, and deployment automation. Addison-Wesley, Reading

  • Kampstra P (2008) Beanplot: a boxplot alternative for visual comparison of distributions. J Stat Softw, Code Snippets 28(1):1–9. http://www.jstatsoft.org/v28/c01/

  • Lawrence R (2004) The space efficiency of XML. Information and software technology (IST) 46(11):753–759

  • Linden Labs (2010) CMake. http://wiki.secondlife.com/wiki/CMake, Last viewed: 20 Aug 2010

  • McIntosh S, Adams B, Nguyen THD, Kamei Y, Hassan AE (2011) An empirical study of build maintenance effort. In: Proceedings of the 33rd int’l conference on software engineering (ICSE), pp 141–150

  • McIntosh S, Adams B, Hassan AE (2012) The evolution of Java build systems. Empir Softw Eng 17(4–5):578–608

  • Miller P (1998) Recursive make considered harmful. In: Australian Unix User Group Newsletter, vol 19, pp 14–25

  • Miller RG (1981) Simultaneous statistical inference. Springer, Berlin

  • Mockus A (2007) Software support tools and experimental work. In: Proc of the int’l conference on empirical software engineering issues: critical assessment and future directions, pp 91–99

  • Mockus A (2009) Amassing and indexing a large sample of version control systems: towards the census of public source code history. In: Proceedings of the 6th working conference on mining software repositories (MSR), pp 11–20

  • Nadi S, Holt R (2011) Make it or break it: mining anomalies in Linux Kbuild. In: Proceedings of the 18th working conference on reverse engineering (WCRE), pp 315–324

  • Nadi S, Holt R (2012) Mining Kbuild to detect variability anomalies in Linux. In: Proceedings of the 16th European conference on software maintenance and reengineering (CSMR), pp 107–116

  • Neitsch A, Wong K, Godfrey MW (2012) Build system issues in multilanguage software. In: Proceedings of the 28th int’l conference on software maintenance, pp 140–149

  • Neundorf A (2010) Why the KDE project switched to CMake—and how (continued). http://lwn.net/Articles/188693/, last viewed: 06 Mar 2010

  • Neville-Neil GV (2009) Kode vicious: system changes and side effects. Commun ACM 52(4):25–26

  • Nguyen THD, Adams B, Hassan AE (2010) A case study of bias in bug-fix datasets. In: Proceedings of the 17th working conference on reverse engineering (WCRE), pp 259–268

  • Savage B (2010) Build systems: relevancy of automated builds in a web world. http://www.brandonsavage.net/build-systems-relevancy-of-automated-builds-in-a-web-world/

  • Smith P (2011) Software build systems: principles and experience, 1st edn. Addison-Wesley, Reading

  • Suvorov R, Nagappan M, Hassan AE, Zou Y, Adams B (2012) An empirical study of build system migrations in practice: case studies on KDE and the Linux Kernel. In: Proceedings of the 28th int’l conference on software maintenance (ICSM), pp 160–169

  • Tamrawi A, Nguyen HA, Nguyen HV, Nguyen T (2012) Build code analysis with symbolic evaluation. In: Proceedings of the 34th int’l conference on software engineering (ICSE), pp 650–660

  • Tu Q, Godfrey M (2002) The build-time software architecture view. In: Proceedings of int’l conference on software maintenance (ICSM), pp 398–407

  • Zadok E (2002) Overhauling Amd for the ’00s: a case study of GNU Autotools. In: Proceedings of the FREENIX track on the USENIX technical conference. USENIX Association, pp 287–297

Author information

Corresponding author

Correspondence to Shane McIntosh.

Additional information

Communicated by: Maurizio Morisio

Appendices

A Build Technology Examples

In this appendix, we briefly describe how each of the studied technologies can be used to specify a simple build system.

A.1 Low-Level

Figure 17 provides working examples of the five studied low-level build technologies.

Fig. 17 Example low-level technology specifications

Make

One of the earliest build technologies on record is Feldman’s make tool (Feldman 1979), which automatically synchronizes program sources with deliverables. Make specifications outline target-dependency-recipe tuples. Targets specify files created by a recipe, i.e., a shell script that is executed when the target either: (1) does not exist, or (2) is older than one or more of its dependencies, i.e., a list of other files and targets.

The make specification snippet in Fig. 17a describes three target-dependency-recipe tuples. Lines 2, 4, and 7 list targets to the left of the colons and dependency lists to the right. Recipes are specified for the main.o and example targets on lines 5 and 8. Line 1 of Fig. 17a specifies that the all target is phony, representing an abstract phase in the build process rather than a concrete file in the filesystem.
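The rebuild criterion described above (the target is missing, or it is older than at least one dependency) can be sketched in a few lines of Python. This is only an illustration of the rule, not of how make is implemented; the function name `needs_rebuild` is hypothetical, and phony targets and dependencies that are themselves out-of-date targets are deliberately ignored.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if the recipe for `target` should run (simplified make rule)."""
    # Case 1: the target file does not exist yet.
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Case 2: the target is older than one or more of its dependencies.
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

For instance, `needs_rebuild("main.o", ["main.c"])` would report that the compile recipe must run whenever `main.c` has been modified more recently than `main.o`.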

Jam

Jam provides a more procedural-style structure for target-dependency-recipe tuples. Figure 17b shows how rules (the equivalent of make tuples) can be specified (lines 1–4 and 10–13). Dependencies are expressed by invoking the built-in Depends rule on lines 2 and 11. Jam actions (the equivalent of make recipes) for C compilation and object code linking are defined on lines 6–8 and 15–17 respectively.

Ant

Ant borrows the target-dependency-recipe concept from make; however, all Ant targets are abstract. When an Ant target is triggered, a list of specified tasks (the equivalent of make recipes) is invoked. Ant tasks execute Java code rather than shell scripts to synchronize sources with deliverables.

Figure 17c shows an Ant specification that describes two targets, i.e., compile (lines 2–8) and link (lines 10–18). The compile target invokes the javac task (lines 3–7), which executes the javac compiler. The link target invokes the jar task (lines 14–17), which executes the jar command. The dependency between the link and compile targets is expressed on line 12 using the depends target attribute.
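To illustrate how abstract targets and the depends attribute interact, the following Python sketch computes the order in which tasks would run when a target is triggered. It is a simplified model under stated assumptions: the dictionary layout and the function `execution_order` are hypothetical, tasks are represented only by their names, and cycle detection is omitted.

```python
def execution_order(goal, targets):
    """Return task names in the order they would run when `goal` is triggered."""
    order, visited = [], set()

    def visit(name):
        if name in visited:  # each target is triggered at most once
            return
        visited.add(name)
        for dep in targets[name]["depends"]:
            visit(dep)  # prerequisites run before the target's own tasks
        order.extend(targets[name]["tasks"])

    visit(goal)
    return order


# Hypothetical model of the compile and link targets described above.
TARGETS = {
    "compile": {"depends": [], "tasks": ["javac"]},
    "link": {"depends": ["compile"], "tasks": ["jar"]},
}

print(execution_order("link", TARGETS))  # ['javac', 'jar']
```

Because targets are abstract, nothing in this model checks file timestamps; whether work can be skipped is left to the individual tasks.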

SCons

SCons provides several advanced build system features (e.g., implicit dependency tracking for popular programming languages) and allows maintainers to write highly portable build specifications using Python. Line 7 of Fig. 17d shows how a binary example can be assembled from object code. Line 5 shows how object code can be generated using SCons' built-in support for C++ compilation. Environmental settings (e.g., compilers, linkers, and flags) are automatically detected; however, parameters passed to the Environment() function call will override the detected settings, as shown on line 1.

Rake

Rake is a modern build tool with advanced support for building Ruby applications. Similar to SCons, Rake specifications are written in a high-level scripting language (i.e., Ruby) to give build maintainers the power to express complex relationships and transformations in a highly portable language. Similar to Ant, Rake tasks (the equivalent of targets in make) are abstract.

The example snippet in Fig. 17e shows how a unit testing task utest can be specified (lines 3–5). Line 4 describes the recipe that is executed when utest is triggered. Line 1 specifies that the default target depends upon the utest target.

A.2 Abstraction-Based

Figure 18 provides working examples of the two studied abstraction-based technologies.

Fig. 18 Example abstraction-based technology specifications

Autotools

GNU Autotools specifications describe external and internal dependencies, configurable compile-time features, and platform requirements. These specifications are parsed to generate make specifications that satisfy the described constraints.

Autotools is actually a large collection of build tools that work together to generate build systems according to specifications. Two of the most commonly used tools are autoconf and automake, for which we provide example specifications in Fig. 18a and b respectively. Lines 1 and 2 of Fig. 18a initialize the autoconf environment, specifying that our project name is example version 1.0 and that automake is also necessary. Line 3 specifies an environment dependency on a C compiler, while lines 4 and 5 request that the configuration step store preprocessor directives in a file named config.h, and store the build system implementation in a file called Makefile. Line 1 of Fig. 18b specifies that a deliverable called example should be constructed during the build process and that it should be deployed in the bin directory. Line 2 states that main.c is a source file that should be compiled and linked into the example binary.
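The core idea of an abstraction-based technology, i.e., generating a low-level make specification from a short declarative description, can be sketched as follows. This toy generator is an assumption-laden illustration and not how automake works: the function `generate_makefile` and its parameters are hypothetical, and it performs none of the platform checks or configuration steps that Autotools provides.

```python
def generate_makefile(program, sources, cc="cc"):
    """Render a minimal make specification from a declarative description."""
    objects = [src.rsplit(".", 1)[0] + ".o" for src in sources]
    lines = [".PHONY: all", f"all: {program}", ""]
    # Link rule: the program depends on every object file.
    lines += [
        f"{program}: {' '.join(objects)}",
        f"\t{cc} -o {program} {' '.join(objects)}",
        "",
    ]
    # One compile rule per source file.
    for src, obj in zip(sources, objects):
        lines += [f"{obj}: {src}", f"\t{cc} -c {src} -o {obj}", ""]
    return "\n".join(lines)


# Declarative input comparable to naming a deliverable and its source file.
print(generate_makefile("example", ["main.c"]))
```

The maintainer edits only the two-argument description; the target-dependency-recipe tuples are derived from it rather than written by hand.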

CMake

Similar to Autotools, CMake abstractions can be used to generate make specifications, but can also generate Microsoft Visual Studio and Apple Xcode project files. Figure 18c specifies that a build system should be generated to produce a binary called example by compiling and linking main.cc (line 4) as a part of a project called Example (line 2). Line 1 denotes that CMake version 2.6 (or later) should be used to parse the specification.

A.3 Framework-Driven

Below we describe the studied Maven framework-driven technology.

Maven

Maven assumes that source and test files are placed in default locations and that projects adhere to a typical Java dependency policy, unless otherwise specified. If projects abide by the conventions, Maven can infer build behaviour automatically without any explicit specification. For example, Fig. 19a does not specify a location for source or output files. Convention specifies that source and unit test code appear under src/main/java and src/test/java respectively.

Fig. 19 Example framework-driven and dependency management technology specifications

Lines 10–18 of Fig. 19a show how the Maven convention can be overridden through configuration. The Java compiler is instructed to operate in Java 1.5 source mode (line 15), and generate bytecode that is compatible with the Java 1.7 runtime environment (line 16).
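The convention-over-configuration behaviour described above can be modelled as a set of defaults that explicit configuration overrides. The sketch below is a minimal illustration, not Maven's implementation: the key names and the function `effective_settings` are hypothetical, and only the two default directories and the two compiler levels come from the text.

```python
# Conventional locations named in the text; the key names are illustrative.
CONVENTIONS = {
    "source_dir": "src/main/java",
    "test_dir": "src/test/java",
}


def effective_settings(overrides=None):
    """Start from the conventions and apply any explicit configuration."""
    settings = dict(CONVENTIONS)  # copy so the defaults are never mutated
    settings.update(overrides or {})
    return settings


# A project that follows the layout conventions but configures the compiler.
settings = effective_settings({"compiler_source": "1.5", "compiler_target": "1.7"})
print(settings["source_dir"])       # src/main/java
print(settings["compiler_target"])  # 1.7
```

A project that specifies nothing still obtains a complete configuration, which is why a conforming specification can remain short.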

A.4 Dependency Management

Figure 19 provides working examples of dependency management in Maven (Fig. 19a) and the two studied dependency management technologies (Fig. 19b and c).

Maven

In addition to providing a framework-driven build environment, Maven doubles as a dependency management technology. Lines 22–26 of Fig. 19a provide an example dependency declaration on the JUnit tool, version 3.8.1.

Fig. 20 Monthly build commit proportion, sizes, and churn volume in the studied forges

Fig. 21 Monthly source-build coupling and build author ratios for the studied forges

Ivy

Ivy provides dependency management features that are most notably leveraged by Ant. Figure 19b shows an Ivy specification for the same JUnit dependency as depicted in Fig. 19a.

Bundler

Bundler provides packaging and dependency management for Ruby applications. Line 1 of Fig. 19c specifies that Bundler should download gems, i.e., Ruby packages, from the given host. Lines 2 and 3 specify dependencies on Rake (version 10.0.3 or later) and rspec (exactly version 2.13.0).
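The difference between an "at least" and an "exact" version constraint can be made concrete with a small Python sketch. It is an illustration only, not Bundler's resolver: the functions `parse_version` and `satisfies` are hypothetical, the operator strings are this sketch's own convention, and versions are assumed to be purely numeric with the same number of dot-separated components.

```python
def parse_version(text):
    """Turn '10.0.3' into (10, 0, 3) so components compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, operator, required):
    """Check an installed version against an 'at least' or 'exact' constraint."""
    have, want = parse_version(installed), parse_version(required)
    if operator == ">=":  # at least the required version
        return have >= want
    if operator == "=":   # exactly the required version
        return have == want
    raise ValueError(f"unsupported operator: {operator}")


print(satisfies("10.0.10", ">=", "10.0.3"))  # True
print(satisfies("2.14.0", "=", "2.13.0"))    # False
```

Comparing integer tuples rather than strings matters here: as text, "10.0.10" would sort before "10.0.3".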

B Additional Build Maintenance Figures

We perform longitudinal analyses of the Tukey HSD ranks for each metric in the forges to complement our median-based analyses in Section 6. Figures 20 and 21 show only the first twelve months of history and the top three ranks to improve the readability of the figures. Unfiltered figures are available online (see Note 15).

About this article

Cite this article

McIntosh, S., Nagappan, M., Adams, B. et al. A Large-Scale Empirical Study of the Relationship between Build Technology and Build Maintenance. Empir Software Eng 20, 1587–1633 (2015). https://doi.org/10.1007/s10664-014-9324-x
