On the impact of software evolution on software clustering

Empirical Software Engineering

Abstract

The evolution of a software project is a rich data source for analyzing and improving the software development process. Recently, several research groups have tried to cluster source code artifacts based on information about how the code of a software system evolves. The results of these evolutionary approaches seem promising, but a direct comparison to traditional software clustering approaches based on structural code dependencies is still missing. To fill this gap, we conducted several clustering experiments with an established software clustering tool comparing and combining the evolutionary and the structural approach. These experiments show that the evolutionary approach could produce meaningful clustering results. While the traditional approach provides better results because of a more reliable data density of the structural data, the combination of both approaches is able to improve the overall clustering quality. A review of related studies shows that this approach of combining dependency information is also successful in other software engineering applications.

Notes

  1. http://depfind.sourceforge.net/

  2. http://www.codeanalyzer.teel.ws/

  3. http://www.jhotdraw.org/

  4. The Wilcoxon test is used instead of the Friedman test when comparing only two variables rather than three or more.

Author information

Corresponding author

Correspondence to Fabian Beck.

Additional information

Editors: Giuliano Antoniol and Martin Pinzger

About this article

Cite this article

Beck, F., Diehl, S. On the impact of software evolution on software clustering. Empir Software Eng 18, 970–1004 (2013). https://doi.org/10.1007/s10664-012-9225-9
