Design pattern recovery through visual language parsing and source code analysis

https://doi.org/10.1016/j.jss.2009.02.012

Abstract

In this paper we propose an approach for recovering structural design patterns from object-oriented source code. The recovery process is organized in two phases. In the first phase, design pattern instances are identified at a coarse-grained level by considering the design structure only and exploiting a parsing technique used for visual language recognition. In the second phase, the identified candidate patterns are validated by a fine-grained source code analysis. The recognition process is supported by a tool, the Design Pattern Recovery Environment (DPRE), which allowed us to assess the retrieval effectiveness of the proposed approach on six public-domain programs and libraries.

Introduction

A design pattern can be seen as a set of classes, related through inheritance, aggregation, and delegation, which represents a partial solution to a common non-trivial design problem (Gamma et al., 1995). Design patterns are widely used to separate an interface from its different possible implementations, to wrap legacy systems, to encapsulate command requests, to use different platforms, and so on (Gamma et al., 1995). They are a useful technique in forward engineering since they make it possible to reuse successful practices, to improve communication between designers, and to share knowledge between software engineers. However, patterns can also be used for reverse engineering OO software systems in order to capture relevant information on the design and code, and to improve program understanding (Antoniol et al., 2001, Brown, 1996, Niere et al., 2002, Shull et al., 1996, Tsantalis et al., 2006a). As a matter of fact, the use of patterns during the design phases affects the corresponding code, and the extraction of design pattern information from design and code can help in comprehending the solution adopted for a system. This information can be used to highlight desirable properties of the design model, which can be reused whenever a similar problem is encountered. Indeed, as also highlighted in Antoniol et al. (2001), a software system that has been designed using documented and well-known design patterns can exhibit good properties such as modularity, separation of concerns, and ease of extension. Moreover, the information on the recovered design patterns can improve the system documentation and guide the restructuring of the system. The recovery of design pattern instances from design documents and the corresponding source code can also be crucial for identifying traceability links between different software artifacts, which makes the code easier to maintain and modify.
In particular, this information can be profitably exploited to highlight the rationale of implemented solutions in order to support and simplify the conceptual modeling of the system to be restructured (Antoniol et al., 2001).

According to Gamma et al. (1995), design patterns are classified as structural, which concentrate on object composition and the relations among objects in run-time object structures; creational, which address object instantiation issues; and behavioral, which focus on the internal dynamics and object interaction in the system. In this paper we present an approach to recover structural design patterns from OO source code, based on visual language grammars and parsing techniques. A preliminary analysis extracts the structural information needed to recover design patterns. In particular, the class diagram information, such as the names and types of classes, methods, and fields, and the inheritance and association relationships, is stored in a suitable data structure that speeds up the recovery process. The recovery process combines a diagram-level analysis, performed by a parser for visual languages, with a source-code-level analysis, and is organized in two phases. In the first phase, design pattern instances are identified based on the design structure only, using a recovery technique based on visual language parsing (Costagliola et al., 2005). The design pattern recovery problem is reduced to the problem of recognizing subsentences in a class diagram, where each subsentence corresponds to a design pattern specified by a grammar. In the second phase, the identified candidate patterns are validated by a source code analysis, which eliminates false positives and consequently increases the precision (Salton and McGill, 1983) of the recovery approach. To validate the proposed design pattern recovery approach, we have developed a tool, named Design Pattern Recovery Environment (DPRE), which supports the whole recovery process.
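The two-phase organization described above can be illustrated with a minimal Python sketch. It is not the grammar-based visual language parser used by the approach: a brute-force structural match stands in for the parsing phase. The names `ClassInfo`, `find_adapter_candidates`, and `validate_delegation`, the simplified Adapter-like shape (a class that inherits from a target and holds an association to another class), and the toy input are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ClassInfo:
    """Structural facts about one class (hypothetical representation)."""
    name: str
    parents: set = field(default_factory=set)        # inheritance relationships
    associations: set = field(default_factory=set)   # classes referenced by fields
    # method name -> set of "Class.method" calls found in its body
    calls: dict = field(default_factory=dict)


def find_adapter_candidates(diagram):
    """Phase 1 (coarse-grained): match on the design structure only.

    A candidate is (target, adapter, adaptee), where the adapter inherits
    from the target and is associated with a different class, the adaptee.
    """
    candidates = []
    for adapter in diagram.values():
        for target in sorted(adapter.parents):
            for adaptee in sorted(adapter.associations):
                if adaptee != target and adaptee in diagram:
                    candidates.append((target, adapter.name, adaptee))
    return candidates


def validate_delegation(diagram, candidate):
    """Phase 2 (fine-grained): keep a candidate only if some adapter method
    actually calls a method of the adaptee."""
    _, adapter, adaptee = candidate
    prefix = adaptee + "."
    return any(
        call.startswith(prefix)
        for called in diagram[adapter].calls.values()
        for call in called
    )


def recover(diagram):
    """Run both phases and return the validated candidates."""
    candidates = find_adapter_candidates(diagram)
    return [c for c in candidates if validate_delegation(diagram, c)]


# Toy input (made up): TextShape extends Shape and references both
# TextView (to which it delegates) and Logger (which it never calls).
diagram = {
    "Shape": ClassInfo("Shape"),
    "TextView": ClassInfo("TextView"),
    "Logger": ClassInfo("Logger"),
    "TextShape": ClassInfo(
        "TextShape",
        parents={"Shape"},
        associations={"TextView", "Logger"},
        calls={"boundingBox": {"TextView.getExtent"}},
    ),
}
```

In this toy input the structural phase returns two candidates, and the source-level check discards the one involving `Logger` because no method of `TextShape` delegates to it. This mirrors how the second phase is meant to remove false positives left by a purely structural match.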

In this paper, we extend the work presented in Costagliola et al. (2005, 2006) and De Lucia et al. (2007) by:

  • presenting a recovery technique supporting design pattern definitions that include multi-level inheritance relationships;

  • providing a detailed description of the proposed approach, including the visual parsing phase and the source code analysis phase;

  • presenting a classification and an analysis of the design pattern recovery approaches proposed in the literature;

  • evaluating the approach and tool on six public-domain software systems and libraries of different size, ranging from 8 to 560 KLOC;

  • providing a detailed comparison with related approaches that used the same software systems for the evaluation.
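As an illustration of the first contribution, the following Python sketch shows one way a structural matcher could accept inheritance that spans several levels rather than a single direct link. It is only a sketch under stated assumptions: the `parents` mapping, the function names, and the class names are hypothetical, and the paper's own technique handles this within its grammar-based parser rather than by an explicit traversal.

```python
def ancestors(parents, cls):
    """Return every direct or indirect superclass of `cls`.

    `parents` maps a class name to the set of its direct superclasses
    (a hypothetical representation of the inheritance relationships).
    """
    seen = set()
    stack = list(parents.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:       # also guards against malformed cycles
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def inherits_from(parents, sub, sup):
    """True if `sub` reaches `sup` through one or more inheritance levels."""
    return sup in ancestors(parents, sub)


# Made-up hierarchy: ConcreteAdapter -> AbstractAdapter -> Target
parents = {
    "ConcreteAdapter": {"AbstractAdapter"},
    "AbstractAdapter": {"Target"},
}
```

With this helper, a pattern role that requires "inherits from Target" is satisfied by `ConcreteAdapter` even though the relationship passes through an intermediate class.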

The paper is organized as follows. In Section 2, we describe related work on design pattern recovery. Section 3 presents the proposed design pattern recovery process, while Section 4 describes the DPRE tool supporting it. The results of the case studies are reported and discussed in Section 5. Conclusions and future work are given in Section 6.

Related work

In this section we present a discussion of related work by considering the employed pattern identification strategy, the representation used for coding design patterns, the kind of support they provide for recognition (i.e., manual, semi-automatic or automatic pattern recovery), the type of design patterns they are able to recover, the software analyzed to assess the effectiveness of the proposed pattern recovery strategies, and the precision values obtained. This information is summarized in

The design pattern recovery process

A design pattern is composed of a small number of classes that, through delegation and inheritance, provide a robust and modifiable solution (Gamma et al., 1995). Design patterns are classified as structural, which concentrate on object composition and the relations among objects in run-time object structures; creational, which address object instantiation issues; and behavioral, which focus on the internal dynamics and object interaction in the system.

The design pattern recovery process we propose

DPRE: a tool for structural DP recovery

In the following we present the DPRE tool, which supports the proposed design pattern recovery process. The tool is implemented in Java, and the latest version, 1.4a, can be downloaded from http://www.sesa.dmi.unisa.it/dpr.

Fig. 4.1 shows a screenshot of DPRE during the design pattern recovery process for JHotDraw 5.1, one of the case studies presented in the following section. DPRE allows users to select (by clicking on the Browse button) the directory containing the source code and to accomplish the

Case studies

In order to assess the effectiveness of the proposed design pattern recovery approach, we have carried out a set of case studies by considering public software and libraries. In the following, we present the obtained results and discuss them also considering the results obtained by other approaches on the same software systems.
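The introduction measures effectiveness through precision (Salton and McGill, 1983), that is, the fraction of recovered pattern instances that are actual instances. A minimal Python sketch of that computation follows. The function name and the numbers in the usage note are illustrative assumptions, not results from the case studies.

```python
def precision(true_positives, false_positives):
    """Fraction of recovered instances that are correct.

    Returns None when nothing was recovered, since the ratio is
    undefined in that case.
    """
    retrieved = true_positives + false_positives
    if retrieved == 0:
        return None
    return true_positives / retrieved
```

For example, if a tool reported 10 candidate instances and manual inspection confirmed 8 of them, `precision(8, 2)` would give 0.8. Removing false positives in the validation phase raises this value without changing the number of true positives.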

Conclusions and future work

Software system maintenance requires a deep comprehension of the existing system in order to modify it and adapt it to new or changing requirements. Design patterns represent useful architectural information that can support a rapid understanding of software design and source code. In reverse engineering of OO software systems they make it possible to capture relevant information that helps in comprehending the adopted solution (Antoniol et al., 2001, Brown, 1996, Niere et al., 2002, Shull et al.,

References (49)

  • Antoniol, G., et al., 2001. Object-oriented design pattern recovery. Journal of Systems and Software.
  • Costagliola, G., et al., 2002. A classification framework to support the design of visual languages. Journal of Visual Languages and Computing.
  • Antoniol, G., et al., 2001. Inference of object-oriented design patterns. Journal of Software Maintenance and Evolution: Research and Practice.
  • Apache Ant,...
  • Aversano, L., Canfora, G., Cerulo, L., Del Grosso, C., Di Penta, M., 2007. An empirical study on the evolution of design...
  • Balanyi, Z., Ferenc, R., Mining design patterns from C++ source code. In: Proceedings of International Conference on...
  • Baxter, I.D., Yahin, A., Moura, L., Sant’Anna, M., Bier, L., 1998. Clone detection using abstract syntax trees. In:...
  • Beyer, D., Lewerentz, C., 2003. CrocoPat: efficient pattern analysis in object-oriented programs. In: Proceedings of...
  • Beyer, D., et al., 2005. Efficient relational calculation for software analysis. Transactions on Software Engineering.
  • Brown, K., 1996. Design Reverse-Engineering and Automated Design Pattern Detection in Smalltalk, Master Thesis, North...
  • Celenta, S., De Lucia, A., Deufemia, V., Gravino, C., Risi, M., 2006. Analyzing software evolution through design...
  • Costagliola, G., De Lucia, A., Deufemia, V., Gravino, C., Risi, M., 2005. Design pattern recovery by visual language...
  • Costagliola, G., De Lucia, A., Deufemia, V., Gravino, C., Risi, M., 2006. Case studies of visual language based design...
  • Costagliola, G., et al., 1997. A parsing methodology for the implementation of visual systems. IEEE Transactions on Software Engineering.
  • Costagliola, G., et al., 2004. A framework for modeling and implementing visual notations with applications to software engineering. ACM Transactions on Software Engineering and Methodology.
  • Crahen, E., Alphonce, C., Ventura, P., 2002. QuickUML: a beginner’s UML tool. In: Proceedings of ACM SIGPLAN Conference...
  • De Lucia, A., Deufemia, V., Gravino, C., Risi, M., 2007. A two phase approach to design pattern recovery. In:...
  • De Lucia, A., Deufemia, V., Gravino, C., Risi, M., 2009. Behavioral pattern identification through visual language...
  • Dong, J., Lad, D.S., Zhao, Y., 2007. DP-miner: design pattern discovery using matrix. In: Proceedings of IEEE...
  • Dong, J., Zhao, Y., 2007. Experiments on design pattern discovery. In: Proceedings of International Workshop on...
  • DPD4RE: First International Workshop on Design Patterns Detection for Reverse Engineering, Benevento, Italy, October...
  • Eclipse JDT,...
  • Ferenc, R., Beszedes, A., Tarkiainen, M., Gyimothy, T., 2002. Columbus – reverse engineering tool and schema for C++....
  • Fülöp, L.J., Ferenc, R., Beszedes, A., Lelle, J., 2005. Design Pattern Mining Enhanced by Machine Learning. In:...

    Andrea De Lucia received the Laurea degree in Computer Science from the University of Salerno, Italy, in 1991, the MSc degree in Computer Science from the University of Durham, U.K., in 1996, and the PhD in Electronic Engineering and Computer Science from the University of Naples ‘Federico II’, Italy, in 1996. He is a full professor of Software Engineering and the Director of the International Summer School on Software Engineering at the Department of Mathematics and Informatics of the University of Salerno, Italy. Previously he was at the Research Centre on Software Technology (RCOST) of the University of Sannio, Italy. Prof. De Lucia is actively consulting in industry and has been involved in several research and technology transfer projects conducted in cooperation with industrial partners. His research interests include software maintenance, program comprehension, reverse engineering, reengineering, migration, global software engineering, software configuration management, workflow management, document management, empirical software engineering, visual languages, web engineering, and e-learning. He has published more than 100 papers on these topics in international journals, books, and conference proceedings. He has also edited books and special issues of international journals and serves on the editorial and reviewer boards of international journals and on the organizing and program committees of several international conferences in the field of software engineering. Prof. De Lucia is a member of the IEEE, the IEEE Computer Society, and the executive committee of the IEEE Technical Council on Software Engineering.

    Vincenzo Deufemia graduated in Computer Science (cum laude) in 1999. He received the PhD degree in Computer Science from the University of Salerno, Italy, in 2003. Since 2006 he has been an assistant professor in the Department of Mathematics and Informatics at the University of Salerno. His main research focuses on the recovery of design patterns from source code, sketch understanding, visual languages, parsing technologies, and data warehousing. On these topics, he has published several peer-reviewed articles in international journals, books, and conference proceedings. He has served as a program committee member for several international conferences.

    Carmine Gravino received the Laurea degree in Computer Science (cum laude) in 1999, and his PhD in Computer Science from the University of Salerno, Italy, in 2003. Since March 2006 he has been an assistant professor in the Department of Mathematics and Informatics at the University of Salerno. His research interests include software metrics to estimate web application development effort, software-development environments, and design pattern recovery from object-oriented code.

    Michele Risi received the Laurea degree in computer science in 2001 and the PhD degree in computer science from the University of Salerno, Italy, in 2005. He is currently a research fellow in the Department of Mathematics and Informatics at the University of Salerno. His research interests include grammar formalisms and parsing techniques for visual languages, sketch understanding, design pattern recovery from object-oriented code, reverse engineering of web applications and human-computer interaction in 3D visualization.
