An investigation of the fault-proneness of clone evolutionary patterns

Software Quality Journal

Abstract

Two identical or similar code fragments form a clone pair. Previous studies have identified cloning as a risky practice, so a developer needs to be aware of any clone pairs in order to properly propagate changes between clones. A clone pair may experience many changes during the creation and maintenance of a software system. A change can either maintain or remove the similarity between the clones in a clone pair. If a change maintains the similarity, the clone pair is left in a consistent state. If a change makes the clones no longer similar, the clone pair is left in an inconsistent state. The set of states and changes experienced by a clone pair over time forms an evolution history known as a clone genealogy. In this paper, we examine clone genealogies to identify fault-prone “patterns” of states and changes, and we explore the use of clone genealogy information in fault prediction. We conduct a quasi-experiment with four long-lived software systems (i.e., Apache Ant, ArgoUML, jEdit, Maven) and identify clones using the NiCad and iClones clone detection tools. Overall, we find that the size of the clone can impact the fault-proneness of a clone pair. However, the time interval between changes to a clone pair has no clear impact on the fault-proneness of the clone pair. We also find that adding clone genealogy information can increase the explanatory power of fault prediction models.
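To make the vocabulary of the abstract concrete, the following minimal Python sketch models a clone genealogy as an ordered list of states and derives a simple "pattern" string from it. It is an illustration only, not the representation used in the paper: the names `State`, `Snapshot`, `state_pattern`, and `count_state_switches`, the one-letter encoding, and the integer revision numbers are all assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class State(Enum):
    # Hypothetical one-letter encoding, chosen only for this sketch.
    CONSISTENT = "C"     # the change kept the two clones similar
    INCONSISTENT = "I"   # the change removed the similarity


@dataclass
class Snapshot:
    revision: int   # hypothetical revision number of the change
    state: State    # state of the clone pair after that change


def state_pattern(genealogy: List[Snapshot]) -> str:
    """Return the states of a clone pair in revision order, one letter per change."""
    ordered = sorted(genealogy, key=lambda snap: snap.revision)
    return "".join(snap.state.value for snap in ordered)


def count_state_switches(genealogy: List[Snapshot]) -> int:
    """Count how often the pair moves between the consistent and inconsistent states."""
    letters = state_pattern(genealogy)
    return sum(1 for prev, cur in zip(letters, letters[1:]) if prev != cur)


# Example genealogy (made-up data): the pair diverges at revision 12
# and becomes consistent again at revision 15.
example = [
    Snapshot(revision=15, state=State.CONSISTENT),
    Snapshot(revision=10, state=State.CONSISTENT),
    Snapshot(revision=12, state=State.INCONSISTENT),
]

print(state_pattern(example))
print(count_state_switches(example))
```

Encoding a genealogy as a string is just one convenient choice; it makes it easy to group clone pairs that share the same sequence of states before comparing how fault-prone each group is.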

Notes

  1. https://github.com/swatlab/clone_genealogies

Acknowledgements

The authors would like to thank the anonymous reviewers for their detailed feedback and useful suggestions that greatly contributed to improving this paper. This work has been partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Corresponding author

Correspondence to Foutse Khomh.

About this article

Cite this article

Barbour, L., An, L., Khomh, F. et al. An investigation of the fault-proneness of clone evolutionary patterns. Software Qual J 26, 1187–1222 (2018). https://doi.org/10.1007/s11219-017-9375-5

  • DOI: https://doi.org/10.1007/s11219-017-9375-5
