An extended assessment of type-3 clones as detected by state-of-the-art tools

Software Quality Journal

Abstract

Code reuse through copying and pasting leads to so-called software clones. These clones can be roughly categorized into identical fragments (type-1 clones), fragments with parameter substitution (type-2 clones), and similar fragments that differ through modified, deleted, or added statements (type-3 clones). Although there has been extensive research on detecting clones, detection of type-3 clones is still an open research issue due to the inherent vagueness in their definition. In this paper, we analyze type-3 clones detected by state-of-the-art tools and investigate type-3 clones in terms of their syntactic differences. Then, we derive their underlying semantic abstractions from their syntactic differences. Finally, we investigate whether there are code characteristics that indicate that a tool-suggested clone candidate is a real type-3 clone from a human’s perspective. Our findings can help developers of clone detectors and clone refactoring tools to improve their tools.
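
The three clone categories named above can be illustrated with a small sketch. It is not the method of any tool assessed in the paper: it assumes Python fragments (so that the standard `tokenize` module can be used), ignores whether a type-2 renaming is consistent, and uses a hypothetical similarity threshold of 0.7 on normalized tokens to decide what still counts as a type-3 candidate. The function names `tokens`, `normalize`, and `classify` are illustrative assumptions.

```python
import difflib
import io
import keyword
import tokenize

# Token kinds that carry only layout or comments; type-1 clones may differ in these.
_LAYOUT = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
           tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def tokens(code):
    """Return (type, text) pairs of a fragment, without layout and comments."""
    stream = tokenize.generate_tokens(io.StringIO(code).readline)
    return [(tok.type, tok.string) for tok in stream if tok.type not in _LAYOUT]


def normalize(code):
    """Replace identifiers by ID and literals by LIT (parameter abstraction)."""
    out = []
    for kind, text in tokens(code):
        if kind == tokenize.NAME and not keyword.iskeyword(text):
            out.append("ID")
        elif kind in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(text)
    return out


def classify(a, b, threshold=0.7):
    """Classify a fragment pair; the threshold is an assumed, tunable value."""
    if tokens(a) == tokens(b):
        return "type-1"   # identical up to whitespace and comments
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return "type-2"   # same token structure, names or literals substituted
    if difflib.SequenceMatcher(None, na, nb).ratio() >= threshold:
        return "type-3"   # similar, with added, deleted, or modified statements
    return "no clone"


original = "total = 0\nfor x in items:\n    total += x\n"
renamed = "acc = 1\nfor v in values:\n    acc += v\n"
guarded = "acc = 1\nfor v in values:\n    if v > 0:\n        acc += v\n"

print(classify(original, renamed))   # type-2
print(classify(original, guarded))   # type-3
```

The fixed threshold is exactly where the vagueness of the type-3 definition shows up: moving it changes which candidates are reported, which is why a human judgment of tool-suggested candidates is still needed.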

Notes

  1. The semantic information is not needed here.

Acknowledgments

We want to thank Pierre Frenzel for sharing his validated clones with us. Furthermore, we want to thank our industrial partner for giving us the opportunity to analyze industrial code of a software product line and the anonymous reviewers for their valuable comments.

Author information

Corresponding author

Correspondence to Rainer Koschke.

About this article

Cite this article

Tiarks, R., Koschke, R. & Falke, R. An extended assessment of type-3 clones as detected by state-of-the-art tools. Software Qual J 19, 295–331 (2011). https://doi.org/10.1007/s11219-010-9115-6

  • DOI: https://doi.org/10.1007/s11219-010-9115-6
