Case study on which relations to use for clustering-based software architecture recovery

Empirical Software Engineering

Abstract

Clustering-based software architecture recovery has received significant attention in the software engineering community over the years. Its key idea is to compile and cluster a system-wide graph whose nodes are source code entities and whose edges are source code relations. However, related research has mostly focused on investigating different clustering methods and techniques, and consequently little work addresses the question of which minimal set of relations can be easily extracted from a system’s source code and yet accurately used to extract its architecture. In this paper, we report results from an architecture recovery case study that considers all possible combinations that can be generated from thirteen commonly used source code relations. For different systems, we have examined how similar the architecture extracted with each relation combination is to the corresponding architecture obtained by applying all thirteen relations, which we consider the ground truth architecture. For this purpose, we have also examined whether the use of all thirteen relations is indeed adequate to yield a ground truth architecture, by applying the extraction process to five large software systems whose ground truth architecture has been independently established. The overall results of our study indicate that there is a small set of relations for procedural systems, and another similar set for object-oriented systems, that can be easily extracted from the source code and yet used to yield an architecture that is close to the ground truth architecture.
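To make the scale of the study concrete, the following minimal Python sketch enumerates relation combinations and builds the entity graph for one of them. It is an illustration under stated assumptions, not the authors' tooling: the relation names (`rel1` to `rel13`), the helper functions, and the toy facts are hypothetical, the facts are assumed to be RSF-like triples of the form relation, source, target, and the empty combination is assumed to be excluded.

```python
from itertools import combinations

# Hypothetical placeholder names: the abstract does not list the thirteen relations.
RELATIONS = [f"rel{i}" for i in range(1, 14)]


def relation_subsets(relations):
    """Yield every non-empty combination of the given relations."""
    for size in range(1, len(relations) + 1):
        yield from combinations(relations, size)


def build_graph(triples, selected):
    """Build an adjacency map (entity -> set of targets) from the
    triples whose relation belongs to the selected combination."""
    selected = set(selected)
    graph = {}
    for relation, source, target in triples:
        if relation in selected:
            graph.setdefault(source, set()).add(target)
            graph.setdefault(target, set())  # keep targets as nodes too
    return graph


# Toy facts (assumed RSF-like triples: relation, source entity, target entity).
TRIPLES = [
    ("rel1", "a.c", "b.c"),
    ("rel2", "b.c", "c.c"),
    ("rel1", "c.c", "a.c"),
]

subsets = list(relation_subsets(RELATIONS))
print(len(subsets))  # 2**13 - 1 = 8191 non-empty combinations

graph = build_graph(TRIPLES, ["rel1"])
print(graph)
```

Each such graph would then be passed to a clustering algorithm, and the resulting decomposition compared against the one obtained with all thirteen relations. Those two steps are omitted here because the abstract does not specify the algorithm or the similarity measure.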

Notes

  1. A list of software architecture definitions can be found on http://www.sei.cmu.edu/architecture/start/glossary/classicdefs.cfm

Acknowledgments

We would like to thank Stergios Ientsek for his contribution related to enhancements of the extraction tools and the production of RSF files, and the anonymous reviewers for their constructive comments.

Author information

Corresponding author

Correspondence to Kostas Kontogiannis.

Additional information

Communicated by: Paolo Tonella

Cite this article

Stavropoulou, I., Grigoriou, M. & Kontogiannis, K. Case study on which relations to use for clustering-based software architecture recovery. Empir Software Eng 22, 1717–1762 (2017). https://doi.org/10.1007/s10664-016-9459-z

  • DOI: https://doi.org/10.1007/s10664-016-9459-z
