Abstract
Two identical or similar code fragments form a clone pair. Previous studies have identified cloning as a risky practice. Therefore, a developer needs to be aware of any clone pairs in order to properly propagate any changes between clones. A clone pair may experience many changes during the creation and maintenance of a software system. A change can either maintain or remove the similarity between clones in a clone pair. If a change maintains the similarity between clones, the clone pair is left in a consistent state. When a change makes the clones no longer similar, the clone pair is left in an inconsistent state. The set of states and changes experienced by clone pairs over time forms an evolution history known as a clone genealogy. In this paper, we examine clone genealogies to identify fault-prone “patterns” of states and changes. We explore the use of clone genealogy information in fault prediction. We conduct a quasi-experiment with four long-lived software systems (i.e., Apache Ant, ArgoUML, JEdit, Maven) and identify clones using the NiCad and iClones clone detection tools. Overall, we find that the size of the clone can impact the fault-proneness of a clone pair. However, there is no clear impact of the time interval between changes to a clone pair on the fault-proneness of the clone pair. We also discover that adding clone genealogy information can increase the explanatory power of fault prediction models.
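To make the notion of a clone genealogy concrete, the following minimal Python sketch models one clone pair as a sequence of consistent and inconsistent states, one per change. It is an illustration under stated assumptions, not the representation used in the study: the names `State`, `CloneGenealogy`, `record_change`, and `pattern`, as well as the one-letter encoding of states, are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class State(Enum):
    """State of a clone pair after a change (hypothetical naming)."""
    CONSISTENT = "C"
    INCONSISTENT = "I"


@dataclass
class CloneGenealogy:
    """Evolution history of a single clone pair.

    Assumption: a newly created clone pair starts in a consistent state.
    """
    states: List[State] = field(default_factory=lambda: [State.CONSISTENT])

    def record_change(self, still_similar: bool) -> None:
        # A change either maintains or removes the similarity between the clones.
        new_state = State.CONSISTENT if still_similar else State.INCONSISTENT
        self.states.append(new_state)

    def pattern(self) -> str:
        # Illustrative encoding: one letter per state, in chronological order.
        return "".join(state.value for state in self.states)


genealogy = CloneGenealogy()
genealogy.record_change(still_similar=False)  # change removes the similarity
genealogy.record_change(still_similar=True)   # later change restores it
print(genealogy.pattern())
```

Encoding each genealogy as a short string is only one possible choice; it makes it easy to group clone pairs that went through the same sequence of states before relating those groups to fault data.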
Acknowledgements
The authors would like to thank the anonymous reviewers for their detailed feedback and useful suggestions that greatly contributed to improving this paper. This work has been partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).
Cite this article
Barbour, L., An, L., Khomh, F. et al. An investigation of the fault-proneness of clone evolutionary patterns. Software Qual J 26, 1187–1222 (2018). https://doi.org/10.1007/s11219-017-9375-5
DOI: https://doi.org/10.1007/s11219-017-9375-5