An investigation of the fault-proneness of clone evolutionary patterns

Barbour, Liliane; An, Le; Khomh, Foutse; Zou, Ying; Wang, Shaohua

doi:10.1007/s11219-017-9375-5


Published: 13 June 2017

Volume 26, pages 1187–1222 (2018)

Software Quality Journal

Liliane Barbour¹, Le An², Foutse Khomh² (ORCID: orcid.org/0000-0002-5704-4173), Ying Zou¹ & Shaohua Wang³


Abstract

Two identical or similar code fragments form a clone pair. Previous studies have identified cloning as a risky practice; a developer therefore needs to be aware of clone pairs in order to propagate changes correctly between clones. A clone pair may undergo many changes during the creation and maintenance of a software system, and each change either maintains or removes the similarity between the two clones. If a change maintains the similarity, the clone pair is left in a consistent state; if it makes the clones no longer similar, the pair is left in an inconsistent state. The states and changes that a clone pair experiences over time form an evolution history known as a clone genealogy. In this paper, we examine clone genealogies to identify fault-prone “patterns” of states and changes, and we explore the use of clone genealogy information in fault prediction. We conduct a quasi-experiment with four long-lived software systems (Apache Ant, ArgoUML, JEdit, and Maven) and identify clones using the NiCad and iClones clone detection tools. Overall, we find that the size of the clone can affect the fault-proneness of a clone pair, whereas the time interval between changes to a clone pair has no clear impact on its fault-proneness. We also discover that adding clone genealogy information can increase the explanatory power of fault prediction models.
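To make the state model above concrete, the following Python sketch encodes a clone genealogy as a sequence of consistent (C) and inconsistent (I) states. It is an illustration under stated assumptions, not the authors' tooling: the `Change` record, its `similar_after` flag, the rule that a pair starts out consistent, and all function names are hypothetical. The `has_late_propagation` check flags the pattern known in the clone literature as late propagation, where a pair becomes inconsistent and later returns to a consistent state. The transition counts show one possible (again assumed) way of turning a genealogy into numeric features for a fault prediction model.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class State(Enum):
    CONSISTENT = "C"
    INCONSISTENT = "I"


@dataclass(frozen=True)
class Change:
    """One change to a clone pair (hypothetical record layout)."""
    revision: str
    similar_after: bool  # do the two fragments still form a clone pair afterwards?


def genealogy_states(changes: List[Change]) -> List[State]:
    """State sequence of a clone pair, assuming it is CONSISTENT when first detected."""
    states = [State.CONSISTENT]
    for change in changes:
        states.append(State.CONSISTENT if change.similar_after else State.INCONSISTENT)
    return states


def pattern_string(states: List[State]) -> str:
    """Collapse runs of identical states, e.g. C C I I C -> 'CIC'."""
    collapsed: List[str] = []
    for state in states:
        if not collapsed or collapsed[-1] != state.value:
            collapsed.append(state.value)
    return "".join(collapsed)


def has_late_propagation(states: List[State]) -> bool:
    """True if the pair becomes inconsistent and later returns to a consistent state."""
    return "IC" in pattern_string(states)


def transition_counts(states: List[State]) -> Counter:
    """Count (before, after) state transitions; one hypothetical feature encoding."""
    pairs: List[Tuple[State, State]] = list(zip(states, states[1:]))
    return Counter(pairs)


if __name__ == "__main__":
    history = [
        Change("r1", True),   # consistent change
        Change("r2", False),  # clones diverge
        Change("r3", False),  # still inconsistent
        Change("r4", True),   # change propagated, clones similar again
    ]
    states = genealogy_states(history)
    print(pattern_string(states))        # CIC
    print(has_late_propagation(states))  # True
```

In this example, a pair whose four changes leave it similar, dissimilar, dissimilar, then similar again yields the states C C I I C, the collapsed pattern `CIC`, and a positive late-propagation check. Whether a change keeps the clones similar could, in practice, be decided by running a clone detector such as NiCad or iClones on each revision.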


Notes

https://github.com/swatlab/clone_genealogies


Acknowledgements

The authors would like to thank the anonymous reviewers for their detailed feedback and useful suggestions that greatly contributed to improving this paper. This work has been partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Queen’s University, ON, Canada
Liliane Barbour & Ying Zou
SWAT, École Polytechnique de Montréal, QC, Canada
Le An & Foutse Khomh
School of Computing, Queen’s University, ON, Canada
Shaohua Wang


Corresponding author

Correspondence to Foutse Khomh.


About this article

Cite this article

Barbour, L., An, L., Khomh, F. et al. An investigation of the fault-proneness of clone evolutionary patterns. Software Qual J 26, 1187–1222 (2018). https://doi.org/10.1007/s11219-017-9375-5

Issue Date: December 2018
