Research article, DOI: 10.1145/2975961.2975964

Automatic prediction of bug fixing effort measured by code churn size

Published: 3 September 2016

ABSTRACT

During software maintenance, developers often receive many bug reports. Project managers often need to manage limited resources to resolve the many bugs that a project receives. To help project managers perform their job, past studies have proposed techniques that predict the amount of time that passes between a bug report being submitted and it being resolved. However, this time period might not be representative of the actual development effort, as developers might not work on the bug right away or all the time. In the open source development setting, developers are only volunteers and might not devote their full working hours to fix a bug in a particular open source project. In the industrial setting, developers might be asked to perform various tasks aside from fixing a particular bug.

In this work, we estimate bug fixing effort in terms of code churn size, the number of lines of code that are added, deleted, or modified to fix a bug. Lines of code have traditionally been used to estimate effort, but no past study has proposed a technique to automatically predict code churn size. Using code churn size as an estimate of bug fixing effort, we propose a classification-based approach that predicts, given a bug report, whether the bug fixing effort will be high or low. We evaluated our approach on 1,029 bug reports from hadoop-common and struts2. The results are promising: our approach achieves an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.612 when predicting bug fixing effort in terms of lines of code churned, a 22.4% improvement over a baseline.
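
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: churn is counted from a unified diff with a modified line counted as one deletion plus one addition, the high/low split uses the median churn as a threshold, and the function names (churn_size, label_effort, auc, relative_improvement) are hypothetical. The abstract does not state the baseline AUC; the reported 22.4% improvement is numerically consistent with a baseline of 0.5, which is what the last helper checks.

```python
from statistics import median


def churn_size(unified_diff: str) -> int:
    """Count changed lines in a unified diff (added plus deleted).

    Assumption: a modified line shows up as one '-' line and one '+' line,
    so it contributes 2 here. Simplification: any line starting with
    '+++' or '---' is treated as a file header and skipped.
    """
    count = 0
    for line in unified_diff.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not content changes
        if line.startswith(("+", "-")):
            count += 1
    return count


def label_effort(churn_sizes):
    """Label each bug as high effort (True) or low effort (False).

    Assumption: the median churn is the threshold; the abstract does not
    say how the high/low boundary is chosen.
    """
    threshold = median(churn_sizes)
    return [size > threshold for size in churn_sizes]


def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (high, low) pairs in which the high-effort bug gets the larger
    score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one high and one low example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def relative_improvement(value, baseline):
    """Relative gain of value over baseline, as a fraction."""
    return (value - baseline) / baseline


# Toy diff: one modified line and one added line.
EXAMPLE_DIFF = """\
--- a/Foo.java
+++ b/Foo.java
@@ -1,3 +1,4 @@
 class Foo {
-    int x = 1;
+    int x = 2;
+    int y = 3;
 }
"""

if __name__ == "__main__":
    print(churn_size(EXAMPLE_DIFF))
    labels = label_effort([3, 10, 50, 200])
    print(labels)
    # Hypothetical classifier scores for the four bugs above.
    print(auc([0.1, 0.4, 0.35, 0.8], labels))
    print(round(relative_improvement(0.612, 0.5), 3))
```

In practice the scores passed to auc would come from a classifier trained on features of the bug report; the toy scores here only exercise the metric.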


Published in

SoftwareMining 2016: Proceedings of the 5th International Workshop on Software Mining
September 2016, 42 pages
ISBN: 9781450345118
DOI: 10.1145/2975961

Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
