DOI: 10.1145/3425269.3425283

LCCSS: A Similarity Metric for Identifying Similar Test Code

Published: 30 October 2020

ABSTRACT

Test code maintainability is a common concern in software testing. To achieve good maintainability, test methods should be clearly structured, well named, and small, and, above all, test code duplication should be avoided. Several strategies exist to remove test code duplication, such as implicit setup and delegated setup. Before these strategies can be applied, however, the duplicated code must first be identified, which can be a time-consuming task. To address this problem, we automate the identification of duplicate test code through the application of code similarity metrics. We propose a novel similarity metric, called Longest Common Contiguous Start Sub-Sequence (LCCSS), that measures the similarity between pairs of tests; the most similar pairs are reported as strong candidates for refactoring through the implicit setup strategy. We also develop a framework, called Róża, that can use different similarity metrics to identify test code duplication. In an experiment, both LCCSS and Simian, a clone detection tool, identified pairs of tests to be refactored through the implicit setup strategy with maximum precision at all eleven standard recall levels. Unlike Simian, however, LCCSS does not need to be calibrated for each project.
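
The abstract does not spell out how LCCSS is computed, but the name suggests the length of the contiguous run of statements that two tests share starting from their very first statement, i.e., their common statement prefix. The sketch below is a minimal illustration under that assumption only; the function name, the statement-list representation, and the toy tests are hypothetical and are not taken from the paper.

```python
# Hedged sketch of an LCCSS-style comparison (illustrative, not the paper's
# implementation). Assumes each test method is represented as an ordered list
# of normalized statement strings.

def lccss(test_a, test_b):
    """Length of the longest common contiguous sub-sequence that starts at
    the first statement of both tests (their shared statement prefix)."""
    length = 0
    for stmt_a, stmt_b in zip(test_a, test_b):
        if stmt_a != stmt_b:
            break
        length += 1
    return length

# Toy example: two tests that share the same three setup statements.
test_1 = [
    'Account account = new Account();',
    'account.deposit(100);',
    'account.setOwner("alice");',
    'assertEquals(100, account.balance());',
]
test_2 = [
    'Account account = new Account();',
    'account.deposit(100);',
    'account.setOwner("alice");',
    'assertTrue(account.isActive());',
]

print(lccss(test_1, test_2))  # -> 3
```

Under this reading, a pair with a high LCCSS value shares a long setup prefix, which is precisely the code the implicit setup strategy would extract into a shared fixture method such as JUnit's setUp.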


Published in

SBCARS '20: Proceedings of the 14th Brazilian Symposium on Software Components, Architectures, and Reuse
October 2020, 172 pages
ISBN: 9781450387545
DOI: 10.1145/3425269

Copyright © 2020 ACM. Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States



Qualifiers

research-article, refereed limited

Acceptance Rates

Overall acceptance rate: 23 of 79 submissions, 29%
