Automated clustering to support the reflexion method
Introduction
Software architecture is described by many views. The view most often addressed in research is the module view [1]. The module view describes the modules of a system, their layering and composition into subsystems, and the provided and required interfaces of these elements. The module view is required for many purposes such as allocating work packages to teams, global change impact analysis, and evaluating the maintainability of the system.
Far too often, the module view that was initially designed does not reflect the real implementation because the source code was changed without updating the documented module view. Murphy and colleagues [2] developed the reflexion model technique to reconstruct the mapping from the specified or hypothesized decomposition to the concrete module view. The basic idea of the reflexion model is to create a hypothesized view from existing documentation or interviews with architects. Source entities (global variables, routines, types, classes, interfaces, packages, files, subdirectories, etc.) are extracted from a system along with their respective dependencies, which together form the concrete module view. These elements are mapped to the hypothesized view. A tool then computes resemblances and differences between the two views. Iteratively, the hypothesized and concrete views and/or the mapping are refined based on the findings.
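The comparison step described above can be illustrated with a minimal Python sketch. It is not the tool used by the authors. It assumes that dependencies are plain (source, target) pairs and that the mapping is a dictionary from source entity to hypothesized module. The labels convergence, divergence, and absence follow the usual reflexion model terminology; the function name and the sample file and module names are hypothetical.

```python
def reflexion(hypothesized_deps, concrete_deps, mapping):
    """Compare module-level dependencies implied by the code with the hypothesized ones."""
    # Lift every concrete dependency to the module level through the mapping.
    # Unmapped entities and dependencies inside a single module are ignored.
    lifted = set()
    for src, dst in concrete_deps:
        if src in mapping and dst in mapping:
            m_src, m_dst = mapping[src], mapping[dst]
            if m_src != m_dst:
                lifted.add((m_src, m_dst))

    hypothesized = set(hypothesized_deps)
    return {
        "convergences": hypothesized & lifted,  # expected and found in the code
        "divergences": lifted - hypothesized,   # found in the code but not expected
        "absences": hypothesized - lifted,      # expected but not found in the code
    }


# Hypothetical example data, for illustration only.
hypothesized_deps = [("ui", "logic"), ("logic", "storage")]
concrete_deps = [("window.c", "rules.c"), ("window.c", "db.c"), ("rules.c", "score.c")]
mapping = {"window.c": "ui", "rules.c": "logic", "score.c": "logic", "db.c": "storage"}

result = reflexion(hypothesized_deps, concrete_deps, mapping)
```

In this toy setting the analyst would see one expected dependency confirmed, one unexpected dependency from `ui` to `storage`, and one expected dependency that the code does not exhibit, and would then refine the views or the mapping accordingly.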
The technique was successfully used in several case studies. The most interesting case study, reported by Murphy and Notkin [3], is the analysis of Microsoft Excel, which consists of about 1.2 MLOC of C code. Koschke and Simon extended the original reflexion model so that hypothesized modules can be hierarchical, and applied it to two different compilers [4].
The most challenging part of the reflexion method is determining the mapping of concrete source entities onto the entities of the hypothesized model. The original reflexion method does not provide any support for this, although naming conventions can sometimes be leveraged. Unfortunately, naming conventions often do not exist or are used inconsistently.
The key point of the reflexion method is to start with an initial hypothesis on the expected module view and then to validate the hypothesis against the implementation. In contrast, software clustering techniques group source entities together – typically based on some notion of coupling and cohesion – to form hypothesized entities. The advantage of clustering techniques is that they can be completely automated. Yet, these techniques are not targeted towards the expectations of the analyst and often fail to find the components a human would find [5].
Contributions. In an earlier publication [6], we proposed combining the reflexion method with a Human-Guided Mapping Generation Method (HuGMe) to accommodate the automatic clustering of the source model with the user’s hypothesized knowledge about the system architecture. This paper extends that work by evaluating, in more depth, variations of automatic clustering techniques for turning the manual mapping activity into a semi-automated one. Two clustering techniques are adjusted to create additional candidate mappings based on a partial mapping and the targeted hypothesized model. We compare and evaluate these variations against an oracle mapping for three case studies and also summarize the results of the case study from the earlier publication [6]. We closely study how the degree of completeness of the existing mapping influences the ability to make reliable suggestions.
The remainder of the paper is organized as follows. Section 2 describes the original reflexion method and other related research on automated software clustering. Section 3 describes how to integrate automated clustering techniques into the reflexion method. The experimental setup to evaluate the support of two clustering techniques is introduced in Section 4. Section 5 uses this experimental setup for three case studies to investigate various factors of influence. Section 6 states known assumptions and limitations of the method, and Section 7 provides our concluding thoughts.
Related research
This section describes related research. We start with a detailed description of the reflexion technique and introduce concepts used in the description of our extension. We then summarize research in the wider area of software clustering.
Integration of reflexion method and clustering techniques
This section describes how we have integrated automated clustering techniques into the traditional reflexion method, which resulted in HuGMe, our new combined approach.
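To make the idea of completing a partial mapping more concrete, the following Python sketch suggests a hypothesized module for an unmapped entity by counting its dependencies to already mapped entities. It is a deliberately simplified illustration and not one of the attraction functions evaluated in this paper: it ignores the hypothesized dependencies between modules, and the function name `suggest_module`, the tie handling, and the sample data are all assumptions.

```python
from collections import Counter


def suggest_module(entity, concrete_deps, partial_mapping):
    """Suggest the module that attracts `entity` most strongly, or None if unclear."""
    attraction = Counter()
    for src, dst in concrete_deps:
        # Consider dependencies in both directions that touch the entity.
        if src == entity:
            neighbour = dst
        elif dst == entity:
            neighbour = src
        else:
            continue
        module = partial_mapping.get(neighbour)
        if module is not None:
            attraction[module] += 1

    if not attraction:
        return None  # no mapped neighbour, so nothing can be suggested

    ranked = attraction.most_common()
    # Leave the decision to the analyst when the two best candidates tie.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical example data, for illustration only.
concrete_deps = [("score.c", "rules.c"), ("score.c", "board.c"), ("window.c", "score.c")]
partial_mapping = {"rules.c": "logic", "board.c": "logic", "window.c": "ui"}

suggestion = suggest_module("score.c", concrete_deps, partial_mapping)
no_suggestion = suggest_module("orphan.c", concrete_deps, partial_mapping)
```

Returning `None` for ties and for entities without mapped neighbours keeps the human in the loop, which matches the semi-automated intent of the approach: the tool proposes candidates, and the analyst accepts or rejects them.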
Evaluation scheme
In this section we provide details on the evaluation scheme used for the following case studies. Before defining the variables used, we first provide an informal description of the basic evaluation approach. To compare the effectiveness of the two attraction functions, we need to determine:
- Is a free concrete entity mapped at all?
- If it is mapped, is the mapping correct?
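The two questions above can be turned into two simple ratios when an oracle mapping is available. The Python sketch below is only an illustration under stated assumptions: the names `coverage` and `precision` are ours and are not necessarily the variables defined in this paper, and the sample entities and modules are hypothetical.

```python
def evaluate(suggested, oracle, free_entities):
    """Return (coverage, precision) of suggested mappings for the free entities."""
    # Question 1: which free entities received a suggestion at all?
    mapped = [e for e in free_entities if suggested.get(e) is not None]
    # Question 2: which of those suggestions agree with the oracle mapping?
    correct = [e for e in mapped if suggested[e] == oracle.get(e)]

    coverage = len(mapped) / len(free_entities) if free_entities else 0.0
    precision = len(correct) / len(mapped) if mapped else 0.0
    return coverage, precision


# Hypothetical example data, for illustration only.
free_entities = ["a.c", "b.c", "c.c", "d.c"]
suggested = {"a.c": "ui", "b.c": "logic", "c.c": "logic"}  # d.c stays unmapped
oracle = {"a.c": "ui", "b.c": "storage", "c.c": "logic", "d.c": "ui"}

coverage, precision = evaluate(suggested, oracle, free_entities)
```

Keeping the two ratios separate matters because a technique can look accurate simply by declining to map most entities, or look complete while making many wrong decisions.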
Case studies
As a follow-on to our previous work [6], we performed three additional case studies of varying size, implementation language, and application domain to evaluate the semi-automated mapping of HuGMe and the underlying clustering alternatives. In this section, we describe the three new case studies and then provide a brief summary of the results from the earlier study of a Java program. We then compare the results and discuss the findings from the four studies.
To be able to replicate our study,
Limitations of the method
Both attraction functions derive the attraction values from source relationships between concrete entities and hypothesized dependencies between hypothesized entities. This approach shares the same drawbacks as other clustering techniques based on source dependencies. Similar to those techniques, our clustering algorithm yields hypothesized entities featuring high cohesion and low coupling. As stated by Andritsos et al. [42], this approach is problematic when the developers of the system did
Conclusions
Our case study demonstrates the supportive aspect of clustering techniques for establishing the reflexion mapping. The clustering technique of the HuGMe method was able to achieve a mapping quality where a very high fraction of the automatic mapping decisions turned out to be correct. Moreover, the existence of conceptual components and dependencies simplifies automated clustering as it has a more focused target (the conceptual components expected by a human analyst) and can leverage existing
Acknowledgements
We thank Chris Callendar for the hypothesized view and associated mappings for the Tetris case study, Ian Bull for technical support and Jody Ryall for editing assistance. We also thank the anonymous reviewers for their helpful comments and suggestions.
References (46)
- et al. Software salvaging and the call dominance tree. Journal of Systems and Software (1995)
- et al. Applied Software Architecture, Object Technology Series (2000)
- G. C. Murphy, D. Notkin, K. Sullivan, Software reflexion models: bridging the gap between source and high-level models,...
- G. C. Murphy, D. Notkin, Reengineering with reflexion models: A case study, IEEE Computer 30 (8) (1997) 29–36,...
- et al. Hierarchical reflexion models
- R. Koschke, Atomic architectural component recovery for program understanding and evolution, Ph.D. thesis, University...
- et al. Equipping the reflexion method with automated clustering
- et al. A case study of applying an eclectic approach to identify objects in code
- et al. Extracting and restructuring the design of large systems. IEEE Software (1990)
- et al. System structure analysis: clustering with data bindings. IEEE TSE (1985)
- Identifying objects in a conventional procedural language: an example of data design recovery
- A new approach to finding objects in programs. Journal Software Maintenance and Evolution
- An object finder for program structure understanding in software maintenance. Journal Software Maintenance and Evolution
- A measure for composite module cohesion
- Applying concept formation methods to object identification in procedural code
- A graph-based object identification process for procedural programs
- Recovering abstract data types and object instances from a conventional procedural language
- Finding components in a hierarchy of modules: a step towards architectural understanding
- Using neural networks to modularize software. Machine Learning
- Using automatic clustering to produce high-level system organizations of source code
- Extracting concepts from file names: a new file clustering criterion
- A metric-based approach to detect abstract data types and state encapsulations. Journal Automated Software Engineering