A case study of TTCN-3 test scripts clone analysis in an industrial telecommunication setting

https://doi.org/10.1016/j.infsof.2017.01.008

Abstract

Context: This paper presents a novel experiment focused on detecting and analyzing clones in test suites written in TTCN-3, a standard telecommunication test script language, for different industrial projects.

Objective: This paper investigates frequencies, types, and similarity distributions of TTCN-3 clones in test scripts from three industrial projects in telecommunication. We also compare the distribution of clones in TTCN-3 test scripts with the distribution of clones in C/C++ and Java projects from the telecommunication domain. We then perform a statistical analysis to validate the significance of differences between these distributions.

Method: Similarity is computed using CLAN, which compares metrics syntactically derived from script fragments. The metrics are computed from the Abstract Syntax Trees produced by Titan, a TTCN-3 parser developed by Ericsson as an Eclipse plugin. Finally, similar script pairs are classified into clone types using the Longest Common Subsequence algorithm applied to token types and token images.

Results: This paper presents figures and diagrams reporting TTCN-3 clone frequencies, types, and similarity distributions. We show that the differences between the distribution of clones in test scripts and the distribution of clones in applications are statistically significant. We also present and discuss some lessons that can be learned about the transferability of technology from this study.

Conclusion: About 24% of fragments in the test suites are cloned, which is a very high proportion compared to what is generally found in source code. The proportions of Type-1 and Type-2 clones are remarkably higher in TTCN-3 than in source code, and the difference is statistically significant. Type-1 and Type-2 clones represent 82.9% and 15.3% of clone fragments, respectively, for a total of 98.2%. Within the projects this study investigated, this represents more, and easier, potential refactoring opportunities for test scripts than for code.

Introduction

Clone analysis involves finding similar code fragments in source code as well as interpreting and using the results to tackle design, testing, and other software engineering issues [1], [2], [3].

There are four main types of clones defined in the literature [4] as follows:

  • Type-1: identical code fragments, except for changes in whitespace, layout, and comments (changes affecting layout, with no impact on lexical or syntactic information). They are often referred to as “identical” clones.

  • Type-2: syntactically identical fragments except for variations in names of identifiers and types, and in values of literals. They may also exhibit the same kinds of whitespace, layout, and comment changes as Type-1 clones.

    Some Type-2 clones are “parametric” clones [4], because they can be normalized by replacing identifiers with a single parametric identifier (instances of the “consistent renaming” pattern [5]), replacing constants with a single parameter, or replacing types with a “generic” type or a “template”. Such clones can easily be transformed using parametric changes, templates, or transformations. Type-2 clones that are not parametric may represent cases of an “inconsistent renaming” pattern.

  • Type-3: fragments with modifications such as changed, added, or removed statements, possibly combined with the characteristics of Type-1 and Type-2 clones (changes affecting syntactic information). In the literature, they are referred to as “similar” (only) clones, “gapped” clones, or “near-miss” clones.

  • Type-4: code fragments performing the same computation but implemented by different syntactic variants. In the literature, they are referred to as “semantic” clones.

Type-1, Type-2, and Type-3 clones are based on textual similarity, while Type-4 clones are based on semantic similarity. A finer classification of clone differences has been proposed in the context of object-oriented refactoring [1], and a set of mutation-based clone differences has been investigated in the context of clone-detection tool evaluation [5].
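
To make the first three categories concrete, the following sketch shows how a pair of similar fragments could be assigned a type by comparing token sequences, in the spirit of the LCS-based classification on token types and token images mentioned in the abstract. It is only an illustration written in Java: the Token record, the CloneTypeClassifier class, and the decision rules are hypothetical and do not reproduce the tooling used in this study.

// Illustrative sketch (Java): classify a pair of similar fragments by clone type
// using longest-common-subsequence comparison of their token sequences.
// The Token record and the decision rules are hypothetical.
import java.util.List;

final class CloneTypeClassifier {

    // A lexical token: its grammatical type (e.g., IDENTIFIER, LITERAL) and its text.
    record Token(String type, String image) {}

    enum CloneType { TYPE_1, TYPE_2, TYPE_3 }

    static CloneType classify(List<Token> a, List<Token> b) {
        // Type-1: identical token sequences (whitespace, layout, and comments are
        // already discarded by tokenization).
        if (a.size() == b.size() && lcs(a, b, false) == a.size()) {
            return CloneType.TYPE_1;
        }
        // Type-2: same sequence of token types, with differences confined to the
        // images of identifiers, type names, and literals.
        if (a.size() == b.size() && lcs(a, b, true) == a.size()) {
            return CloneType.TYPE_2;
        }
        // Anything else among similar fragments is treated as a Type-3 (near-miss) clone.
        return CloneType.TYPE_3;
    }

    // Classic O(n*m) longest-common-subsequence length, on token types only or on full tokens.
    static int lcs(List<Token> a, List<Token> b, boolean typesOnly) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                boolean eq = typesOnly
                        ? a.get(i - 1).type().equals(b.get(j - 1).type())
                        : a.get(i - 1).equals(b.get(j - 1));
                dp[i][j] = eq ? dp[i - 1][j - 1] + 1 : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.size()][b.size()];
    }
}

Note that the Type-2 test in this sketch only checks that the two fragments have identical token-type sequences; a stricter classifier would additionally restrict the differing tokens to identifiers, type names, and literals, as the definition above requires.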

There is an extensive body of published experiments and techniques for analyzing clones in source code. Clone detection in tests has been investigated using the Clone Miner and Clone Analyzer tools [6] on tests written in Java for Android [7].

In this paper, we analyze clones in proprietary industrial test scripts written in Testing and Test Control Notation Version 3 (TTCN-3) language [8]. To the best of our knowledge, this is the first study of TTCN-3 clones in test scripts.

TTCN-3 differs from application programming languages in many aspects. It is a language designed to write tests for telecommunication applications. It is standardized and maintained by the European Telecommunication Standards Institute (ETSI) and is also adopted by the International Telecommunication Union (ITU-T). As stated in its introduction [9]:

TTCN-3 provides all the constructs and features necessary for black box testing. It embodies: a rich typing system and powerful matching mechanisms, support for both message-based and procedure-based communication, timer handling, dynamic test configuration including concurrent test behaviour, the concept of verdicts and verdict resolution, and much more.

TTCN-3 also supports the standard imperative constructs found in C++ and Java, such as assignments, conditional statements (if-else), loops (for, while, do), break, continue, and functions [8], [9].

The TTCN-3 language is also promoted by the 3rd Generation Partnership Project (3GPP). As part of ISO 9646, its use is widespread throughout the telecommunication industry. Thus, not only is it natural for a company like Ericsson to use TTCN-3, it is also mandatory.

As a proponent of TTCN-3, Ericsson has a small team dedicated to developing tools to support users inside the company. That team has created a complete programming environment as an Eclipse plugin called Titan. The plugin includes an API to access a TTCN-3 parser and some of its components, notably the Abstract Syntax Tree (AST).

A practical problem was to enable our clone detection technology, CLAN (CLone ANalyzer) [10], to extract data from Titan in order to perform clone detection on TTCN-3. Practical issues are discussed in Section 6.
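
For illustration, the following sketch shows the general idea of metric-based clone detection as used by CLAN: each fragment is summarized by a small vector of counts derived from its AST, and fragments whose vectors are close are reported as candidate clone pairs. The metric set, threshold, and class names below are hypothetical; they are not the actual CLAN metrics nor the Titan API.

// Hypothetical sketch (Java) of metric-vector similarity in the spirit of CLAN.
// The metrics and the threshold are illustrative only.
final class MetricSimilaritySketch {

    // Per-fragment metric vector, with counts assumed to be extracted from the AST.
    record MetricVector(int statements, int calls, int loops, int conditionals,
                        int distinctIdentifiers) {

        int[] asArray() {
            return new int[] { statements, calls, loops, conditionals, distinctIdentifiers };
        }
    }

    // Manhattan distance between two metric vectors.
    static int distance(MetricVector a, MetricVector b) {
        int[] x = a.asArray();
        int[] y = b.asArray();
        int d = 0;
        for (int i = 0; i < x.length; i++) {
            d += Math.abs(x[i] - y[i]);
        }
        return d;
    }

    // Fragments whose vectors differ by at most this much are clone candidates;
    // 0 requires an exact metric match, larger values admit near-miss candidates.
    static final int THRESHOLD = 0;

    static boolean candidateClonePair(MetricVector a, MetricVector b) {
        return distance(a, b) <= THRESHOLD;
    }
}

Candidate pairs produced this way would still need to be compared at the token level, as in the classification sketch above, to confirm and type the clones.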

Some Ericsson developers suspected that cloning is one of the main idioms for test script reuse in the testing process of some systems. Management believes that the practice of cloning in test environments increases the cost of maintenance and can also lead to inconsistencies between tests and evolving software versions.

In the long run, Ericsson hopes to reduce the effort in maintenance, design, and comprehension of their TTCN-3 test scripts, in order to improve the quality of the maintenance process of test suites.

As Ericsson had previous experience with clone detection technologies [11] and believed that a reduction in cloning could benefit some of the aspects they seek to improve, they elected to perform clone analysis on TTCN-3.

Before exploring solutions, a rigorous verification based on quantitative figures was required in order to measure the extent to which clones exist in TTCN-3.

In this paper, we present an industrial case study of the clones in test scripts written in the TTCN-3 language and compare them to clones in C/C++ and Java applications.

This study has the following goals:

  • To verify the existence of clones in TTCN-3 scripts;

  • To quantify to what extent clones exist in TTCN-3 scripts;

  • To categorize the clones according to their type;

  • To compute the distribution of clone types in TTCN-3 scripts;

  • To compare the distribution of the clones in TTCN-3 scripts with that of clones in C/C++ and Java systems.

To achieve these goals, we report distributions and statistics of the clones in 500 kLOC (Lines of Code) of TTCN-3 scripts. Moreover, we complement our analysis by a statistical comparison of the distribution of the clones in TTCN-3 with that of the clones previously identified in Ericsson’s C/C++ and Java code, as reported in [11]. This previous work gave a detailed description of the distribution of the clones in C/C++ and Java code in the same industrial setting. The numbers used for comparison are drawn directly from that published work.
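
As a rough illustration of such a comparison, the sketch below applies a chi-square test to two vectors of clone-type counts using Apache Commons Math. The counts are made up and the class name DistributionComparisonSketch is hypothetical; the statistical test actually used in the study may differ. The sketch only shows how the significance of a difference between two clone-type distributions can be checked programmatically.

// Illustrative sketch (Java, Apache Commons Math): test whether two clone-type
// distributions differ significantly. All counts below are invented.
import org.apache.commons.math3.stat.inference.ChiSquareTest;

public class DistributionComparisonSketch {
    public static void main(String[] args) {
        // Hypothetical clone-fragment counts per type: { Type-1, Type-2, Type-3 }
        long[] testScriptClones  = { 8290, 1530, 180 };
        long[] applicationClones = { 4100, 3200, 2700 };

        ChiSquareTest chi = new ChiSquareTest();
        double pValue = chi.chiSquareTestDataSetsComparison(testScriptClones, applicationClones);

        // Reject the null hypothesis of identical distributions at the 5% level.
        System.out.printf("p-value = %.4g, significant = %b%n", pValue, pValue < 0.05);
    }
}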

The paper is organized as follows: Section 2 describes the Ericsson testing environment and process; Section 3 presents experiments and results; Section 4 discusses the results, lessons learned, and threats to validity; Section 5 presents related research; Section 6 describes practical issues and the adaptation of our clone detection technology to TTCN-3; Section 7 discusses further research objectives, while Section 8 concludes the paper.


Industrial testing context

We analyzed scripts from a legacy testing environment, which was only very slightly automated. Although other more sophisticated testing environments are used within Ericsson, scripts in this testing environment are written and modified by testers by hand. In a sense, the analyzed test scripts are legacy test code maintained by hand by developers.

In this context, a test suite is a subset of tests. Each test targets some aspect of one specific feature in a product. A test may be composed of

Hardware, software environment, and dataset

The system architecture for the experiment is based on a virtual machine running Ubuntu Server 11.10, with 1 CPU, 8 GB of RAM, and 200 GB of hard disk space, and relies on Apache/MySQL/PHP, Java 7, and CLAN.

Three sets of test scripts for different projects have been analyzed. Table 1 shows the size of each set. They are all of moderate size. Processing systems of such a size is achieved quickly, since the CLAN clustering algorithm execution is very fast compared to other tools [10]. The exact

Discussion and lessons learned

This experiment differs from previous work because it analyzes the TTCN-3 language, which is specific to test scripts. Observed distributions of TTCN-3 clones are similar to those observed for test clones written in Java [7].

Furthermore, this paper compares the similarity distribution of test clones to that of application clones. The distributions are significantly different, in contrast with a previous study that investigated and compared application clones in scripting languages and in imperative

Related work

Clones are extensively investigated in the literature with hundreds of papers dedicated to their detection and application as reported in surveys of the field [4], [27]. In the following, we relate our work to previously published papers and explain the differences and similarities between this current study and the previous work. With respect to the currently published work analyzing test clones [7], we are the first to investigate industrial test clones written in TTCN-3 and compare these to

Practical considerations for a successful integration in industry

Technology transfer for large-scale deployment has to conform to hardware and other practical restrictions that often exist in industrial environments. A technology that is not flexible enough to cope with them will simply never be adopted.

For large-scale deployment, the available standard hardware configuration is very often the only option, unless an appropriate large-scale investment and organizational effort is envisaged.

Although a discussion about practical restrictions is not a key

Further research

As mentioned before, clones belonging to a test that was not selected for the current test suite do not need to be updated immediately. Therefore, it would be interesting to study the “late propagation” phenomenon [98] in test suites. It would also be interesting to track test clones, clone genealogies, and clone modifications during test suite evolution, in comparison with application clone evolution.

Further research is required to better understand the trade-off between the cost and effort

Conclusion

We reported the results of an experiment on clone detection within industrial test suites in TTCN-3. The figures shown in Section 3 present different characteristics of the clone population in TTCN-3. We found that around 24% of fragments in the test suites are cloned, which is a very high proportion of clones compared to what is generally found in source code.

The distribution of clone types shows a high percentage of Type-1 clones (82.9%), a smaller number of Type-2 clones (15.3%), and few Type-3

Acknowledgments

This research was funded by Ericsson. We wish to thank Renaud Lepage, Mario Bonja, and Fanny Lalonde Lévesque for their contributions to this paper and the related discussions. We also wish to thank Kristof Szabados for his technical assistance and insights into the Titan environment. We also wish to thank the reviewers for their helpful comments on this paper.

References (100)

  • 2014....
  • 2014....
  • S. Bellon et al.

    Comparison and evaluation of clone detection tools

    IEEE Trans. Softw. Eng. - IEEE Comput. Soc. Press

    (2007)
  • E. Merlo et al.

    Large scale multi-language clone analysis in a telecommunication industrial setting

    Software Clones (IWSC), 2013 7th International Workshop on

    (2013)
  • K. Kontogiannis et al.

    Pattern matching for clone and concept detection

    J. Autom. Softw. Eng.

    (1996)
  • T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to algorithms, MIT Press,second...
  • S. Cha

    Comprehensive survey on distance/similarity measures between probability density functions

    Int. J. Math. Models Methods Appl. Sci.

    (2007)
  • P. Jaccard

    Nouvelles recherches sur la distribution florale

    Bulletin de la Société Vaudoise des Sciences Naturelles

    (1908)
  • C. Roy et al.

    An empirical study of function clones in open source software

    Reverse Engineering, 2008. WCRE ’08. 15th Working Conference on

    (2008)
  • 2016....
  • M. Hackerott et al.

    An hypothesis test technique for determining a difference in sampled parts defective utilizing fisher’s exact test ic production

    IEEE Trans. Semicond. Manuf.

    (1990)
  • C.K. Roy et al.

    Are scripting languages really different?

    Proceedings of the 4th International Workshop on Software Clones

    (2010)
  • N. Tsantalis et al.

    Assessing the refactorability of software clones

    IEEE Trans. Softw. Eng.

    (2015)
  • T. Mende et al.

    Supporting the grow-and-prune model in software product lines evolution using clone detection

    12th European Conference on Software Maintenance and Reengineering, CSMR 2008, April 1-4, 2008, Athens, Greece

    (2008)
  • R. Koschke et al.

    Extending the reflexion method for consolidating software variants into product lines

    Softw. Qual. J.

    (2009)
  • S. Easterbrook, J. Singer, M.-A. Storey, D. Damian, Guide to Advanced Empirical Software Engineering, Springer London,...
  • D.E. Perry et al.

    Empirical studies of software engineering: a roadmap

    Proceedings of the conference on The future of Software engineering

    (2000)
  • D.I. Sjøberg et al.

    A survey of controlled experiments in software engineering

    Softw. Eng., IEEE Trans.

    (2005)
  • Y. Dang et al.

    Code clone detection experience at microsoft

    Proceedings of the 5th International Workshop on Software Clones

    (2011)
  • R. Koschke

    Large-scale inter-system clone detection using suffix trees and hashing

    J. Softw.

    (2013)
  • I. Keivanloo

    Leveraging clone detection for internet-scale source code search

    Program Comprehension (ICPC), 2012 IEEE 20th International Conference on

    (2012)
  • Y. Dang et al.

    Xiao: tuning code clones at hands of engineers in practice

    Proceedings of the 28th Annual Computer Security Applications Conference

    (2012)
  • Y. Yamanaka et al.

    Industrial application of clone change management system

    Software Clones (IWSC), 2012 6th International Workshop on

    (2012)
  • E. Tuzun et al.

    A case study on applying clone technology to an industrial application framework

    Software Clones (IWSC), 2012 6th International Workshop on

    (2012)
  • R. Koschke

    Large-scale inter-system clone detection using suffix trees

    CSMR

    (2012)
  • J.R. Cordy et al.

    DebCheck: Efficient checking for open source code clones in software systems

    ICPC

    (2011)
  • W. Wang et al.

    Investigating intentional clone refactoring

    ECEASST

    (2014)
  • W. Wang et al.

    Recommending clones for refactoring using design, context, and history

    30th IEEE International Conference on Software Maintenance and Evolution, Victoria, BC, Canada, September 29 - October 3, 2014

    (2014)
  • G. Krishnan et al.

    Unification and refactoring of clones

    Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE), 2014 Software Evolution Week - IEEE Conference on

    (2014)
  • T. Kamiya

    Agec: An execution-semantic clone detection tool

    Proc. ICPC

    (2013)
  • Y. Yuan et al.

    Boreas: an accurate and scalable token-based approach to code clone detection

    Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering

    (2012)
  • T. Lavoie et al.

    An accurate estimation of the levenshtein distance using metric trees and manhattan distance

    IWSC

    (2012)
  • H. Sajnani et al.

    A parallel and efficient approach to large scale clone detection

    IWSC

    (2013)
  • H. Murakami et al.

    Gapped code clone detection with lightweight source code analysis

    Proc. ICPC

    (2013)
  • T. Kamiya et al.

    CCFinder: a multilinguistic token-based code clone detection system for large scale source code

    IEEE Trans. Softw. Eng.

    (2002)
  • J.R. Cordy et al.

    The NiCad Clone Detector

    Proceedings of the 2011 IEEE 19th International Conference on Program Comprehension

    (2011)
  • M.S. Uddin et al.

    Simcad: an extensible and faster clone detection tool for large scale software systems

    ICPC

    (2013)
  • B. Hummel et al.

    Index-based code clone detection: incremental, distributed, scalable

    Software Maintenance (ICSM), 2010 IEEE International Conference on

    (2010)
  • L. Jiang et al.

    Deckard: scalable and accurate tree-based detection of code clones

    Software Engineering, 2007. ICSE 2007. 29th International Conference on

    (2007)
  • R. Koschke et al.

    Clone detection using abstract syntax suffix trees

    Working Conference on Reverse Engineering

    (2006)