Efficiently detecting structural design pattern instances based on ordered sequences
Introduction
Design patterns, proposed by the GoF (Gang of Four: Gamma, Helm, Johnson and Vlissides), are meant to solve recurring design problems in object-oriented software systems, so as to improve the reusability, maintainability, comprehensibility, evolvability, and robustness of applications (Gamma et al., 1995). Although design patterns are not universally good or bad, they typically improve certain aspects of software quality if chosen wisely (Zhang, Budgen, 2012; Ampatzoglou, Frantzeskou, Stamelos, 2012). According to the GoF definitions, design patterns can be classified as creational, which address object instantiation issues; structural, which concentrate on object composition and relations among run-time object structures; and behavioural, which focus on the internal dynamics and object interaction in the system (DeLucia et al., 2010b).
Due to short development cycles and ever-changing user requirements, many legacy software systems lack adequate documentation and are therefore very difficult to comprehend, modify and maintain. Fortunately, design patterns implemented in source code reveal much about the high-level abstract design. Detecting design pattern instances in source code therefore helps developers and maintainers understand the original design and implementation (Walter, Alkhaeir, 2016; Scanniello, Gravino, Risi, 2015). Moreover, knowing which design patterns were used, and whether and where conflicting design patterns occur, may be essential information for refactoring (Gaitani, Zafeiris, Diamantidis, Giakoumakis, 2015; Christopoulou, Giakoumakis, Zafeiris, Soukara, 2012; Zafeiris, Poulias, Diamantidis, 2016; Ouni, Kessentini, Ó Cinnéide, Sahraoui, Deb, Inoue, 2017). To detect design pattern instances, a large number of methodologies, approaches and tools have been proposed during the past few years. Many of them transform the source code and the design patterns into directed weighted graphs and then detect isomorphic sub-graphs as design pattern instances. Although it is relatively easy to obtain the structural elements of source code (e.g., classes, attributes, methods, relationships between classes) and transform them into graphs, such graph-based approaches often fail to execute efficiently, because detecting sub-graph isomorphism is an NP-complete problem (Garey and Johnson, 1979). We believe, however, that a real-time response when detecting pattern instances is highly important. Without it, a detection tool has little practical value, especially when it is developed and integrated as a plugin of an IDE (Wang, Tzerpos, 2004; DeLucia, Deufemia, Gravino, Risi, 2010; Alnusair, Zhao, Yan, 2013).
Unfortunately, detecting pattern instances with a quick response while maintaining good precision and recall remains a challenging problem.
To address this problem, we propose a novel approach called DePATOS, which efficiently discovers directed isomorphic sub-graphs in order to detect structural design pattern instances. In software engineering, structural design patterns ease design by identifying a simple way to realize relationships between entities. In our approach, the source code of the software system is first parsed and automatically described in a series of XML files. These files are then transformed into Class Relationship Graphs, in which the vertices represent classes (including interfaces) and the edges represent relationships between classes. Finally, the Class Relationship Graphs are traversed to detect sub-graphs according to the predefined Ordered Sequences of design patterns, and the discovered sub-graphs are regarded as candidate design pattern instances. Because the Ordered Sequence gives a well-designed search order in which the most representative (or the least frequently occurring) roles (or classes) are discovered first, our approach greatly reduces the search space when detecting isomorphic sub-graphs. It can also easily handle pattern variants, i.e., patterns that differ slightly from the standard GoF pattern. The specification of a pattern variant can be derived from the standard one by adding, removing or modifying elements in its XML-formatted definition file, and the detection procedure itself remains unchanged because it does not depend on what the Ordered Sequence actually is.
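The idea of an ordered search over a Class Relationship Graph can be illustrated with a minimal sketch. It is not the DePATOS implementation: the graph encoding, the relationship names (`inherits`, `associates`), the Adapter-like pattern definition and the function `find_instances` are all assumptions made for illustration. The sketch binds roles in a fixed order, starting with the most constrained role, and discards a class as soon as it cannot play the current role.

```python
# Hypothetical Class Relationship Graph: (source, target) -> relationship kinds.
CRG = {
    ("SquarePegAdapter", "RoundPeg"): {"inherits"},
    ("SquarePegAdapter", "SquarePeg"): {"associates"},
    ("RoundHole", "RoundPeg"): {"associates"},
    ("Logger", "Config"): {"associates"},
}

# Hypothetical pattern definition: role-level edges plus a search order that
# binds the most constrained role ("Adapter") first.
PATTERN_EDGES = [
    ("Adapter", "inherits", "Target"),
    ("Adapter", "associates", "Adaptee"),
]
ORDER = ["Adapter", "Target", "Adaptee"]


def find_instances(crg, order, pattern_edges):
    """Return every role -> class binding that satisfies all pattern edges."""
    classes = sorted({c for pair in crg for c in pair})
    # Relationship kinds leaving each class, used to discard classes early.
    out_kinds = {c: set() for c in classes}
    for (src, _dst), kinds in crg.items():
        out_kinds[src] |= kinds
    # Relationship kinds each role must have on its outgoing edges.
    required = {role: set() for role in order}
    for src, kind, _dst in pattern_edges:
        required[src].add(kind)

    def consistent(binding):
        # Check only the pattern edges whose two roles are already bound.
        for src, kind, dst in pattern_edges:
            if src in binding and dst in binding:
                if kind not in crg.get((binding[src], binding[dst]), set()):
                    return False
        return True

    results = []

    def extend(binding, depth):
        if depth == len(order):
            results.append(dict(binding))
            return
        role = order[depth]
        for cls in classes:
            # Skip classes already used or lacking the required outgoing kinds.
            if cls in binding.values() or not required[role] <= out_kinds[cls]:
                continue
            binding[role] = cls
            if consistent(binding):
                extend(binding, depth + 1)
            del binding[role]

    extend({}, 0)
    return results


instances = find_instances(CRG, ORDER, PATTERN_EDGES)
print(instances)
```

In this toy graph only `SquarePegAdapter` has both an `inherits` and an `associates` edge, so the first role has a single candidate and the remaining roles are resolved by following its edges rather than by trying every combination of classes.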
The main contributions of our work are threefold. 1) We propose an efficient graph-based approach for the detection of design pattern instances, in which an optimized search order is determined so that the most representative classes are discovered first while a large number of irrelevant classes are filtered out. With a significantly reduced search space, performance is greatly improved, especially on large-scale software. 2) Our approach achieves 100% recall on the analyzed benchmark while keeping a high precision. In other words, no instance is missed as long as the Ordered Sequence for the design pattern is given, which is verified in both theory and practice. 3) We elaborate our approach in great detail, providing the whole process and all the information needed to detect instances of all GoF structural design patterns and some frequently occurring variants, so that readers are able to follow it exactly and easily. Moreover, we conduct extensive experiments on both small-scale and large-scale software and present all the corresponding results. To the best of our knowledge, most current works in the field of design pattern detection do not do this.
The rest of the paper is structured as follows. Section 2 discusses the state of the art in design pattern detection, and Section 3 introduces the definitions used in the proposed approach. Section 4 presents the process of discovering design pattern instances, which covers four phases: modelling source code, defining design patterns, obtaining the design patterns’ Ordered Sequences, and detecting candidate design pattern instances. Section 5 presents the experiments on six open-source software systems in detail. Following a discussion in Section 6 of the threats that could affect the validity of our study, the last section concludes the paper and outlines future work. Interested readers may refer to the appendices: Appendix I presents the structural profile of 30 open-source projects, and Appendix II presents the formal definitions, the Class Relationship Graphs and the Ordered Sequences of all GoF structural design patterns and some frequently occurring variants employed in our approach.
Section snippets
Related work
In recent years, design patterns have attracted much attention among software engineers and researchers, as they capture the experience of domain experts at a higher level of abstraction. A great deal of effort has been put into the field of design patterns, in which pattern detection is one of the most active topics (Mayvan et al., 2017).
Preliminaries
In this section, we briefly introduce the concepts used in our proposed approach.
The process of detecting pattern instances
For DePATOS, the process of detecting design pattern instances comprises the following four main phases:
- (1) Modelling system source code
The source code of the software system is scanned and then transformed into Class Relationship Graphs (CRGs), in which the vertices represent classes (including interfaces) and the edges represent relationships between classes.
- (2) Constructing repository of design pattern definitions
All standard GoF structural design patterns and their common variants are
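As a rough illustration of phase (1), the following sketch turns an XML description of classes into a graph of vertices and typed edges. The XML schema (`classes`, `class`, `relation`, and the `name`, `kind`, `type` and `target` attributes) and the function `build_crg` are hypothetical; the actual file format used by DePATOS is not shown in this excerpt.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML description of three parsed classes and their relationships.
XML_DOC = """
<classes>
  <class name="Shape" kind="interface"/>
  <class name="Circle" kind="class">
    <relation type="implements" target="Shape"/>
  </class>
  <class name="Canvas" kind="class">
    <relation type="aggregates" target="Shape"/>
    <relation type="associates" target="Circle"/>
  </class>
</classes>
"""


def build_crg(xml_text):
    """Build (vertices, edges) from the assumed XML schema.

    vertices maps a class name to its kind ("class" or "interface").
    edges maps a (source, target) pair to the set of relationship types.
    """
    root = ET.fromstring(xml_text.strip())
    vertices = {cls.get("name"): cls.get("kind") for cls in root.findall("class")}
    edges = {}
    for cls in root.findall("class"):
        src = cls.get("name")
        for rel in cls.findall("relation"):
            dst = rel.get("target")
            # Ignore relationships to classes outside the analysed system.
            if dst not in vertices:
                continue
            edges.setdefault((src, dst), set()).add(rel.get("type"))
    return vertices, edges


vertices, edges = build_crg(XML_DOC)
for (src, dst), kinds in sorted(edges.items()):
    print(src, "->", dst, sorted(kinds))
```

Storing a set of relationship types per directed edge lets two classes be connected by more than one kind of relationship without duplicating edges; this is a design choice of the sketch, not a claim about the paper's data structure.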
Experiments
To evaluate DePATOS, we performed a series of experiments on six open-source systems. This section reports and discusses the tested systems, the testing baseline, the testing results and the running performance in detail.
Threats to validity
This section discusses the main threats to validity. For our approach, three aspects need to be considered: construct validity, internal validity and external validity.
Construct validity is the appropriateness of inferences made on the basis of observations. In the context of our study, this is mainly in relation to how the precision and recall are measured. In order to calculate the recall, we determine a Gold Standard for each design pattern and each analysed
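For reference, precision and recall against a gold standard are conventionally computed as in the sketch below. The instance encoding (a frozenset of role/class pairs), the sample data and the function `precision_recall` are assumptions for illustration and are not taken from the paper's evaluation.

```python
def precision_recall(detected, gold):
    """Compute precision and recall of detected instances against a gold standard.

    Both arguments are collections of hashable instance descriptions.
    By convention in this sketch, an empty denominator yields 0.0.
    """
    detected, gold = set(detected), set(gold)
    true_positives = len(detected & gold)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical instances: each is a frozenset of (role, class) pairs.
gold = {
    frozenset({("Adapter", "A1"), ("Target", "T1"), ("Adaptee", "E1")}),
    frozenset({("Adapter", "A2"), ("Target", "T2"), ("Adaptee", "E2")}),
}
# The detector finds both gold instances plus one false positive.
detected = gold | {
    frozenset({("Adapter", "A3"), ("Target", "T3"), ("Adaptee", "E3")}),
}

precision, recall = precision_recall(detected, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example the detector reports three instances, two of which are in the gold standard, so recall is 1.0 while precision is 2/3. Both measures depend directly on how the gold standard is determined, which is why that determination is a construct validity concern.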
Conclusion and future work
Design patterns in the object-oriented programming domain represent important architectural information that can support a rapid understanding of software design. Detecting instances of design patterns in source code facilitates the understanding and maintenance of legacy systems. Unfortunately, detecting pattern instances with a quick response while maintaining good precision and recall remains challenging.
In this paper, we have presented an efficient approach called DePATOS for
Acknowledgments
The work is supported by the Natural Science Foundation of China (No. 61100043, No. 61702144), the Zhejiang Provincial Natural Science Foundation (No. LY12F02003), and the Key Project of Science and Technology of Zhejiang Province (No. 2017C01010).
References (54)
- et al. A methodology to assess the impact of design patterns on software quality. Inf. Softw. Technol. (2012)
- et al. Source code and design conformance, design pattern detection from source code by classification approach. Appl. Soft Comput. (2015)
- et al. Automated refactoring to the strategy design pattern. Inf. Softw. Technol. (2012)
- et al. Automated refactoring to the null object design pattern. Inf. Softw. Technol. (2015)
- et al. Documenting design-pattern instances: a family of experiments on source-code comprehensibility. ACM Trans. Softw. Eng. Methodol. (2015)
- et al. DPVK - an eclipse plug-in to detect design patterns in eiffel systems. Electron. Notes Theoret. Comput. Sci. (2004)
- Declarative reasoning about the structure of object-oriented systems. Technol Object-Oriented Lang. (1998)
- et al. From sub-patterns to patterns: an approach to the detection of structural design pattern instances by subgraph mining and merging. IEEE 37th Annual Computer Software and Applications Conference, COMPSAC (2013)
- et al. What do we know about the effectiveness of software design patterns. IEEE Trans. Softw. Eng. (2012)
- et al. Rule-based detection of design patterns in program code. Int. J. Softw. Tools Technol. Transf. (2013)
- Building and mining a repository of design pattern instances: Practical and research benefits. Entertainment Comput.
- A model-driven graph-matching approach for design pattern detection. 20th Working Conference on Reverse Engineering
- Design pattern detection using a DSL-driven graph matching approach. J. Softw.
- Search strategies for subgraph isomorphism algorithms. J. Multivariate Anal.
- A (sub) graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell.
- Towards automating dynamic analysis for behavioral design pattern detection. IEEE Int. Conf. Softw. Maintenance Evol.
- A two phase approach to design pattern recovery. European Conference on Software Maintenance and Reengineering
- An eclipse plug-in for the detection of design pattern instances through static and dynamic analysis. IEEE Int. Conf. Softw. Maintenance
- Improving behavioral design pattern detection through model checking. 26th IEEE International Conference on Software Maintenance
- Detecting the behavior of design patterns through model checking and dynamic analysis. ACM Trans. Softw. Eng. Methodol.
- DP-miner: Design pattern discovery using matrix. ECBS’07
- A matrix-based approach to recovering design patterns. IEEE Trans. Syst. Man Cybern. Part A
- Design pattern mining enhanced by machine learning. 21st IEEE International Conference on Software Maintenance (ICSM)
- Understanding the relevance of micro-structures for design patterns detection. J. Syst. Softw.
- Design Patterns: Elements of Reusable Object Oriented Software
- Computers and intractability: A guide to the theory of NP-completeness
- Graphgrep: A fast and universal method for querying graphs. Int. Conf. Pattern Recognit.
Dongjin Yu is currently a professor at Hangzhou Dianzi University, China. His research efforts include program comprehension, software architecture, business process management, business intelligence and big data. He is especially interested in the novel approaches to constructing enterprise information systems effectively and efficiently by emerging advanced information technologies. He is the director of Computer Software Institute of Hangzhou Dianzi University. He is a member of ACM and IEEE, and a senior member of China Computer Federation (CCF). He is also a member of Technical Committee of Software Engineering CCF (TCSE CCF) and a member of Technical Committee of Service Computing CCF (TCSC CCF).
Ping Zhang is currently a postgraduate at Hangzhou Dianzi University in China. Her research effort mainly focuses on graph-based design pattern mining and detection of code clones.
Jiazha Yang is currently a postgraduate at Hangzhou Dianzi University in China. His current research interests mainly include program comprehension and design pattern mining.
Zhenli Chen received his master's and bachelor's degrees in computer science from Hangzhou Dianzi University, China. He has participated in several government-funded projects related to software engineering. His current research interests mainly include program comprehension and information retrieval.
Chengfei Liu received the BS, MS and Ph.D degrees in Computer Science from Nanjing University, China in 1983, 1985 and 1988, respectively. Currently he is a Professor in the Department of Computer Science and Software Engineering, Swinburne University of Technology, Australia. His current research interests include keyword search on structured data, graph data management, social networks, query processing and refinement for advanced database applications, and data-centric workflows. He is a member of IEEE, and a member of ACM.
Jie Chen is an Assistant Professor in the College of Computer Science at Hangzhou Dianzi University, China. She received her BS degree in Software Engineering from Xiamen University (XMU) in 2009, together with a second BS degree in Economics from the Department of Economics at the same university. She received her Ph.D degree from the Lab of Internet Software Technologies, Institute of Software, Chinese Academy of Sciences (ISCAS) in 2016. She was a visiting scholar in the Department of Computer Science, University of Massachusetts Amherst from September 2012 to September 2013. Her research interests are in software process simulation, resource scheduling and code analysis.