Abstract
Clustering-based software architecture recovery has received significant attention in the software engineering community over the years. Its key concept is the compilation and clustering of a system-wide graph that has source code entities as nodes and source code relations as edges. However, the related research has mostly focused on investigating different clustering methods and techniques. Consequently, there is limited work on the question of what constitutes a minimal set of relations that can be easily extracted from a system's source code and yet can be used to extract its architecture accurately. In this paper, we report on the results of an architecture recovery case study in which we considered all possible combinations that can be generated from thirteen commonly used source code relations. For different systems, we examined the similarity between the architecture extracted with each relation combination and the corresponding architecture obtained by applying all thirteen relations, which we consider to be the ground truth architecture. To justify this choice, we also examined whether the use of all thirteen relations is indeed adequate to yield a ground truth architecture, by applying the architecture extraction process to five large software systems whose ground truth architecture has been independently established. The overall results of our study indicate that there is a small set of relations for procedural systems, and another similar set for object-oriented systems, that can be easily extracted from the source code and yet yields an architecture that is close to the ground truth architecture.
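To give a sense of the size of the search space described above, the following is a minimal Python sketch of enumerating relation combinations and restricting a system graph to one of them. It is an illustration under stated assumptions, not the tooling used in the study: the relation names `R1` to `R13`, the toy edge list, and the function names are hypothetical, and the sketch assumes that the empty combination is excluded.

```python
from itertools import combinations

# Hypothetical placeholder names: the abstract does not list the thirteen relations.
RELATIONS = [f"R{i}" for i in range(1, 14)]


def relation_combinations(relations):
    """Yield every non-empty subset of the given relations as a tuple."""
    for size in range(1, len(relations) + 1):
        yield from combinations(relations, size)


def filter_edges(edges, selected):
    """Keep only the edges whose relation type belongs to the selected combination."""
    chosen = set(selected)
    return [(src, rel, dst) for (src, rel, dst) in edges if rel in chosen]


# Toy system graph (assumed for illustration): (source entity, relation, target entity).
EDGES = [
    ("a.c", "R1", "b.c"),
    ("a.c", "R2", "c.c"),
    ("b.c", "R3", "c.c"),
]

ALL_COMBINATIONS = list(relation_combinations(RELATIONS))
print(len(ALL_COMBINATIONS))               # 2**13 - 1 non-empty combinations
print(filter_edges(EDGES, ("R1", "R3")))   # graph restricted to two relations
```

Under these assumptions, each combination would yield one filtered graph to cluster, and the resulting clustering would then be compared with the one obtained from all thirteen relations.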
Notes
A list of software architecture definitions can be found on http://www.sei.cmu.edu/architecture/start/glossary/classicdefs.cfm
Acknowledgments
We would like to thank Stergios Ientsek for his contribution related to enhancements of the extraction tools and the production of RSF files, and the anonymous reviewers for their constructive comments.
Additional information
Communicated by: Paolo Tonella
About this article
Cite this article
Stavropoulou, I., Grigoriou, M. & Kontogiannis, K. Case study on which relations to use for clustering-based software architecture recovery. Empir Software Eng 22, 1717–1762 (2017). https://doi.org/10.1007/s10664-016-9459-z
DOI: https://doi.org/10.1007/s10664-016-9459-z