How changes affect software entropy: an empirical study

Empirical Software Engineering

Abstract

Context: Software systems change continuously for various reasons, such as adding new features, fixing bugs, or refactoring. Changes may either increase the complexity and disorganization of the source code or help to reduce it.

Aim: This paper empirically investigates the relationship between source code complexity and disorganization, measured using source code change entropy, and four factors: the presence of refactoring activities, the number of developers working on a source code file, the participation of classes in design patterns, and the different kinds of changes occurring in the system, classified by topics extracted from commit notes.

Method: We carried out an exploratory study over an interval of the lifetime of four open source systems, namely ArgoUML, Eclipse-JDT, Mozilla, and Samba, analyzing the relationship between source code change entropy and the four factors: refactoring activities, number of contributors to a file, participation of classes in design patterns, and change topics.

Results: The study shows that (i) change entropy decreases after refactoring, (ii) files changed by a higher number of developers tend to exhibit a higher change entropy than others, (iii) classes participating in certain design patterns exhibit a higher change entropy than others, and (iv) changes related to different topics exhibit different change entropy; for example, bug fixes exhibit a limited change entropy, while changes introducing new features exhibit a high change entropy.

Conclusions: The results indicate that the nature of changes (in particular, changes related to refactoring), the software design, and the number of active developers are factors related to change entropy. Our findings contribute to understanding the software aging phenomenon and are a preliminary step toward identifying better ways to counteract it.
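The abstract does not spell out how change entropy is computed. As a minimal sketch, assuming the Shannon-entropy formulation commonly used for code changes (Shannon 1948; Hassan 2009), the entropy of a period can be computed from the share of changes that each file receives. The function name `change_entropy`, its `normalize` option, and the file names in the usage lines are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def change_entropy(changed_files, normalize=False):
    """Shannon entropy (in bits) of how changes are spread over files.

    changed_files: iterable with one entry per recorded change, naming
    the file that was touched (hypothetical input format).
    normalize: if True, divide by log2 of the number of distinct files
    so the result lies in [0, 1] (an assumed, optional convention).
    """
    counts = Counter(changed_files)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p is the fraction of all changes in the period that hit one file.
    entropy = sum(
        (c / total) * math.log2(total / c) for c in counts.values()
    )
    if normalize and len(counts) > 1:
        entropy /= math.log2(len(counts))
    return entropy


# Changes scattered evenly over four files: maximal disorder for four files.
scattered = change_entropy(["A.java", "B.java", "C.java", "D.java"])
# Changes concentrated in a single file: no uncertainty about where changes land.
focused = change_entropy(["A.java"] * 4)
print(scattered, focused)
```

Under this reading, a period whose changes are spread evenly across many files has high entropy, while a period whose changes concentrate on a few files has low entropy.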

Notes

  1. http://argouml.tigris.org

  2. http://www.eclipse.org

  3. http://www.mozilla.org

  4. http://www.samba.org

  5. http://java.uom.gr/~nikos/pattern-detection.html

  6. https://javacc.dev.java.net

  7. http://cran.r-project.org/web/packages/topicmodels

  8. http://www.r-project.org

  9. http://www.rcost.unisannio.it/mdipenta/emse-icpc/entropy-rawdata.tgz

  10. In the following for Eclipse-JDT we refer to classes rather than to files, knowing that in the discussed cases there is a correspondence between a class and a file.

References

  • Aversano L, Canfora G, Cerulo L, Del Grosso C, Di Penta M (2007) An empirical study on the evolution of design patterns. In: ESEC-FSE ’07: proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM Press, New York, pp 385–394

  • Aversano L, Cerulo L, Di Penta M (2009) The relationship between design patterns defects and crosscutting concern scattering degree: an empirical study. IET Softw 3(5):395–409

  • Bianchi A, Caivano D, Lanubile F, Visaggio G (2001) Evaluating software degradation through entropy. In: METRICS ’01: Proceedings of the 7th international symposium on software metrics. IEEE Computer Society, Washington, DC, p 210

  • Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  • Canfora G, Cerulo L, Di Penta M, Pacilio F (2010) An exploratory study of factors influencing change entropy. In: The 18th IEEE international conference on program comprehension, ICPC 2010, Braga, Minho, Portugal, 30 June–2 July 2010. IEEE Computer Society, Washington, DC, pp 134–143

  • Capiluppi A, Fernández-Ramil J, Higman J, Sharp HC, Smith N (2007) An empirical study of the evolution of an agile-developed software system. In: 29th international conference on software engineering (ICSE 2007), Minneapolis, MN, USA, 20–26 May 2007. IEEE Computer Society, Washington, DC, pp 511–518

  • Chapin N (1995) An entropy metric for software maintainability. In: Proceedings of the 28th Hawaii international conference on system sciences, pp 522–523

  • Chikofsky EJ, Cross JH II (1990) Reverse engineering and design recovery: a taxonomy. IEEE Softw 7(1):13–17

  • Di Penta M, Germán DM (2009) Who are source code contributors and how do they change? In: 16th working conference on reverse engineering, WCRE 2009, 13–16 October 2009, Lille, France. IEEE Computer Society, Washington, DC, pp 11–20

  • Di Penta M, Germán DM, Guéhéneuc Y-G, Antoniol G (2010) An exploratory study of the evolution of software licensing. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, ICSE 2010, Cape Town, South Africa, 1–8 May 2010. ACM, New York, pp 145–154

  • Eick SG, Graves TL, Karr AF, Marron JS, Mockus A (2001) Does code decay? Assessing the evidence from change management data. IEEE Trans Softw Eng 27(1):1–12

  • Fowler M, Beck K, Brant J, Opdyke W, Roberts D (1999) Refactoring: improving the design of existing code. Addison-Wesley, Reading

  • Gall H, Jazayeri M, Krajewski J (2003) CVS release history data for detecting logical couplings. In: IWPSE ’03: Proceedings of the 6th international workshop on principles of software evolution. IEEE Computer Society, Washington, DC, pp 13–23

  • Gamma E, Helm R, Johnson R, Vlissides J (1995) Design patterns: elements of reusable object oriented software. Addison-Wesley, Reading

  • Grissom RJ, Kim JJ (2005) Effect sizes for research: a broad practical approach, 2nd edn. Lawrence Erlbaum Associates, Hillsdale

  • Harrison W (1992) An entropy-based measure of software complexity. IEEE Trans Softw Eng 18(11):1025–1029

  • Hassan AE (2009) Predicting faults using the complexity of code changes. In: 31st international conference on software engineering, ICSE 2009, 16–24 May 2009, Vancouver, Canada, pp 78–88

  • Hassan AE, Holt RC (2003) The chaos of software development. In: IWPSE ’03: Proceedings of the 6th international workshop on principles of software evolution. IEEE Computer Society, Washington, DC, p 84

  • Holm S (1979) A simple sequentially rejective Bonferroni test procedure. Scand J Statist 6:65–70

  • Kuhn A, Ducasse S, Gírba T (2007) Semantic clustering: identifying topics in source code. Inf Softw Technol 49:230–243

  • Lehman MM (1980) Programs, life cycles, and laws of software evolution. Proc IEEE 68(9):1060–1076

  • Lehman MM, Belady LA (1985) Software evolution—processes of software change. Academic, London

  • Linstead E, Baldi P (2009) Mining the coherence of gnome bug reports with statistical topic models. In: Proceedings of the 2009 6th IEEE international working conference on mining software repositories, MSR ’09. IEEE Computer Society, Washington, DC, pp 99–102

  • Nagappan N, Ball T (2007) Using software dependencies and churn metrics to predict field failures: an empirical case study. In: Proceedings of the first international symposium on empirical software engineering and measurement, ESEM 2007, 20–21 September 2007, Madrid, Spain. IEEE Computer Society, Washington, DC, pp 364–373

  • Parnas DL (1994) Software aging. In: Proceedings of the international conference on software engineering, pp 279–287

  • Ratzinger J, Sigmund T, Gall H (2008) On the relation of refactorings and software defect prediction. In: Proceedings of the 2008 international working conference on mining software repositories, MSR 2008, Leipzig, Germany, 10–11 May 2008. ACM, New York, pp 35–38

  • Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423, 625–656

  • Sheskin DJ (2007) Handbook of parametric and nonparametric statistical procedures, 4th edn. Chapman & Hall, London

  • Thomas SW, Adams B, Hassan AE, Blostein D (2010) Validating the use of topic models for software evolution. In: IEEE international workshop on source code analysis and manipulation. IEEE Computer Society, Los Alamitos, pp 55–64

  • Tsantalis N, Chatzigeorgiou A, Stephanides G, Halkidis ST (2006) Design pattern detection using similarity scoring. IEEE Trans Softw Eng 32(11):896–909

  • van Rijsbergen CJ, Robertson SE, Porter MF (1980) New models in probabilistic information retrieval. In: British Library research and development report, no. 5587. British Library, London

  • Zimmermann T, Weisgerber P, Diehl S, Zeller A (2004) Mining version histories to guide software changes. In: ICSE ’04: Proceedings of the 26th international conference on software engineering. IEEE Computer Society, Washington, DC, pp 563–572

Author information

Corresponding author

Correspondence to Massimiliano Di Penta.

Additional information

This paper is an extension of the paper “An Exploratory Study of Factors Influencing Change Entropy” (Canfora et al. 2010).

Appendix: Detailed Analyses

Table 8 ArgoUML: comparison of change entropy among different design patterns
Table 9 Eclipse-JDT: comparison of change entropy among different design patterns
Table 10 ArgoUML: Cliff’s delta resulting from the comparison of change entropy for different topics
Table 11 Eclipse-JDT: Cliff’s delta resulting from the comparison of change entropy for different topics
Table 12 Mozilla: Cliff’s delta resulting from the comparison of change entropy for different topics
Table 13 Samba: Cliff’s delta resulting from the comparison of change entropy for different topics
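Tables 10 to 13 report Cliff's delta, a nonparametric effect size for comparing two groups of observations. As a minimal sketch of the standard definition (the fraction of pairs in which a value from the first group exceeds one from the second, minus the fraction in which it is smaller), it can be computed as follows. The function name `cliffs_delta` and the sample values are illustrative assumptions and are not data from the study.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta between two samples, in the range [-1, 1].

    +1 means every value in xs exceeds every value in ys,
    -1 means the opposite, and 0 means neither group dominates.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    # Compare every pair (x, y) drawn from the two samples.
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical change-entropy values for changes belonging to two topics.
topic_a = [0.9, 0.8, 0.7]
topic_b = [0.2, 0.3]
print(cliffs_delta(topic_a, topic_b))
```

This pairwise form costs O(mn) comparisons, which is fine for a sketch; a rank-based computation would be preferable for large samples.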

About this article

Cite this article

Canfora, G., Cerulo, L., Cimitile, M. et al. How changes affect software entropy: an empirical study. Empir Software Eng 19, 1–38 (2014). https://doi.org/10.1007/s10664-012-9214-z

  • DOI: https://doi.org/10.1007/s10664-012-9214-z
