An extended assessment of type-3 clones as detected by state-of-the-art tools

Tiarks, Rebecca; Koschke, Rainer; Falke, Raimar

doi:10.1007/s11219-010-9115-6

Published: 16 November 2010

Software Quality Journal, Volume 19, pages 295–331 (2011)

Abstract

Code reuse through copying and pasting leads to so-called software clones. These clones can be roughly categorized into identical fragments (type-1 clones), fragments with parameter substitution (type-2 clones), and similar fragments that differ through modified, deleted, or added statements (type-3 clones). Although there has been extensive research on detecting clones, detection of type-3 clones is still an open research issue due to the inherent vagueness in their definition. In this paper, we analyze type-3 clones detected by state-of-the-art tools and investigate type-3 clones in terms of their syntactic differences. Then, we derive their underlying semantic abstractions from their syntactic differences. Finally, we investigate whether there are code characteristics that indicate that a tool-suggested clone candidate is a real type-3 clone from a human’s perspective. Our findings can help developers of clone detectors and clone refactoring tools to improve their tools.
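
The three clone categories named above can be made concrete with a small sketch. It is not the method of the paper or of any tool it evaluates; it is a toy token-based comparison of Python fragments using only the standard library. The fragment contents, the names `token_stream`, `similarity`, and `classify`, and the threshold `0.7` are all assumptions chosen for illustration. The arbitrary threshold in particular reflects the vagueness of the type-3 definition: a different cutoff yields a different set of candidates.

```python
import difflib
import io
import keyword
import tokenize

# Token kinds that carry only layout or comments (a simplification for this sketch).
_LAYOUT = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
           tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def token_stream(source, abstract_parameters=False):
    """Tokenize a Python fragment, dropping layout and comments.

    With abstract_parameters=True, every identifier becomes "ID" and every
    number becomes "LIT", which mimics parameter substitution (type-2).
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _LAYOUT:
            continue
        is_identifier = tok.type == tokenize.NAME and not keyword.iskeyword(tok.string)
        if abstract_parameters and is_identifier:
            tokens.append("ID")
        elif abstract_parameters and tok.type == tokenize.NUMBER:
            tokens.append("LIT")
        else:
            tokens.append(tok.string)
    return tokens


def similarity(a, b):
    """Similarity in [0, 1] of the two abstracted token sequences."""
    matcher = difflib.SequenceMatcher(None, token_stream(a, True), token_stream(b, True))
    return matcher.ratio()


def classify(a, b, threshold=0.7):
    """Label a pair of fragments; the threshold is a hypothetical cutoff."""
    if token_stream(a) == token_stream(b):
        return "type-1"            # identical up to layout and comments
    if token_stream(a, True) == token_stream(b, True):
        return "type-2"            # identical up to identifiers and literals
    if similarity(a, b) >= threshold:
        return "type-3 candidate"  # similar, with added or changed statements
    return "no clone"


ORIGINAL = """\
def total(prices):
    result = 0
    for p in prices:
        result += p
    return result
"""

TYPE1 = """\
def total(prices):
    result = 0   # running sum
    for p in prices:
        result += p
    return result
"""

TYPE2 = """\
def summe(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""

TYPE3 = """\
def total_positive(values):
    acc = 0
    for v in values:
        if v > 0:
            acc += v
    return acc
"""

if __name__ == "__main__":
    for name, fragment in [("TYPE1", TYPE1), ("TYPE2", TYPE2), ("TYPE3", TYPE3)]:
        print(name, classify(ORIGINAL, fragment), round(similarity(ORIGINAL, fragment), 3))
```

A purely syntactic score like this can only propose candidates. Whether a candidate is a real type-3 clone from a human's perspective is the question the paper investigates.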

Notes

The semantic information is not needed here.

Acknowledgments

We want to thank Pierre Frenzel for sharing his validated clones with us. Furthermore, we want to thank our industrial partner for giving us the opportunity to analyze industrial code of a software product line and the anonymous reviewers for their valuable comments.

Author information

Authors and Affiliations

University of Bremen, 28359, Bremen, Germany
Rebecca Tiarks, Rainer Koschke & Raimar Falke

Corresponding author

Correspondence to Rainer Koschke.

About this article

Cite this article

Tiarks, R., Koschke, R. & Falke, R. An extended assessment of type-3 clones as detected by state-of-the-art tools. Software Qual J 19, 295–331 (2011). https://doi.org/10.1007/s11219-010-9115-6

Issue Date: June 2011
