Identifying and understanding header file hotspots in C/C++ build processes

Automated Software Engineering

Abstract

Software developers rely on a fast build system to incrementally compile their source code changes and produce modified deliverables for testing and deployment. Header files, which tend to trigger slow rebuild processes, are most problematic if they also change frequently during the development process, and hence, need to be rebuilt often. In this paper, we propose an approach that analyzes the build dependency graph (i.e., the data structure used to determine the minimal list of commands that must be executed when a source code file is modified), and the change history of a software system to pinpoint header file hotspots—header files that change frequently and trigger long rebuild processes. Through a case study on the GLib, PostgreSQL, Qt, and Ruby systems, we show that our approach identifies header file hotspots that, if improved, will provide greater improvement to the total future build cost of a system than just focusing on the files that trigger the slowest rebuild processes, change the most frequently, or are used the most throughout the codebase. Furthermore, regression models built using architectural and code properties of source files can explain 32–57 % of these hotspots, identifying subsystems that are particularly hotspot-prone and would benefit the most from architectural refinement.
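The core idea of combining the build dependency graph with change history can be illustrated with a minimal sketch. It is not the authors' implementation: the graph, per-target costs, change counts, both thresholds, and the function names `rebuild_cost` and `find_hotspots` are hypothetical, and ranking by the product of cost and change count is an assumption made only to order the output.

```python
from collections import deque


def rebuild_cost(graph, cost, changed):
    """Sum the cost of every target transitively triggered by `changed`.

    graph: maps a file to the targets that directly depend on it.
    cost:  maps a target to the (hypothetical) seconds its command takes.
    """
    seen = {changed}
    queue = deque([changed])
    total = 0.0
    while queue:
        node = queue.popleft()
        for target in graph.get(node, ()):
            if target not in seen:  # count each triggered target once
                seen.add(target)
                total += cost.get(target, 0.0)
                queue.append(target)
    return total


def find_hotspots(graph, cost, change_counts, cost_threshold, change_threshold):
    """Return headers that both change often and trigger costly rebuilds."""
    hotspots = []
    for header, changes in change_counts.items():
        c = rebuild_cost(graph, cost, header)
        if c >= cost_threshold and changes >= change_threshold:
            hotspots.append((header, c, changes))
    # Assumed ranking: cost multiplied by change count, largest first.
    return sorted(hotspots, key=lambda h: h[1] * h[2], reverse=True)


# Toy data (all values invented for illustration).
graph = {
    "core.h": ["a.o", "b.o", "c.o"],
    "util.h": ["a.o"],
    "rare.h": ["a.o", "b.o", "c.o"],
    "a.o": ["app"],
    "b.o": ["app"],
    "c.o": ["libcore.so"],
}
cost = {"a.o": 2.0, "b.o": 3.0, "c.o": 4.0, "app": 1.0, "libcore.so": 1.5}
change_counts = {"core.h": 40, "util.h": 55, "rare.h": 2}

hotspots = find_hotspots(graph, cost, change_counts,
                         cost_threshold=5.0, change_threshold=10)
print(hotspots)
```

In this toy data, `util.h` changes most often but triggers a cheap rebuild, and `rare.h` triggers an expensive rebuild but rarely changes, so only `core.h` satisfies both conditions. This mirrors the abstract's point that neither rebuild cost nor change frequency alone identifies the files most worth improving.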


Notes

  1. http://tbpl.mozilla.org/.

  2. http://mathiasdm.com/2014/01/24/a-c-questionnaire-on-build-speed-the-results-are-in/.

  3. https://bugs.webkit.org/show_bug.cgi?id=32921.

  4. https://bugs.webkit.org/show_bug.cgi?id=33556.

  5. https://developer.gnome.org/glib/.

  6. http://www.postgresql.org/.

  7. http://qt.digia.com/.

  8. https://www.ruby-lang.org/.

  9. The simulation was repeated for 20% and 50% improvements, which yielded similar results.

  10. http://www.scitools.com/index.php.

  11. http://www.dwheeler.com/sloccount/.



Corresponding author

Correspondence to Shane McIntosh.


About this article


Cite this article

McIntosh, S., Adams, B., Nagappan, M. et al. Identifying and understanding header file hotspots in C/C++ build processes. Autom Softw Eng 23, 619–647 (2016). https://doi.org/10.1007/s10515-015-0183-5


  • DOI: https://doi.org/10.1007/s10515-015-0183-5
