Predicting and Monitoring Bug-Proneness at the Feature Level

  • Conference paper
Dependable Software Engineering. Theories, Tools, and Applications (SETTA 2021)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 13071)

Abstract

Enabling quick feature modification and delivery is important for a project’s success. Early estimates of software features’ bug-proneness help allocate resources effectively to the bug-prone features that require further fixes. Researchers have conducted various studies on bug prediction at different granularity levels, such as the class, package, and method levels. However, little work has built predictive models at the feature level. In this paper, we investigate how to predict bug-prone features and monitor their evolution. More specifically, we first identify a project’s features and the files they involve. Next, we collect a suite of code metrics and select a relevant subset of them as attributes for six machine learning algorithms that predict bug-prone features. Our evaluation shows that, with an appropriate set of code metrics, these machine learning algorithms can build effective bug prediction models at the feature level. Furthermore, we build regression models to monitor the growth trends of bug-prone features, which show how these features accumulate bug-proneness over time.
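The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the file names, metric values, feature-to-file mapping, and the helper names `feature_metrics` and `trend_slope` are all hypothetical. It also assumes that feature-level metrics are obtained by summing file-level metrics, and that a growth trend is approximated by an ordinary least-squares line; the abstract does not state either choice.

```python
# Hypothetical file-level metrics: (lines of code, cyclomatic complexity).
FILE_METRICS = {
    "auth/Login.java": (120, 14),
    "auth/Token.java": (80, 6),
    "cart/Cart.java": (200, 25),
}

# Hypothetical mapping from each feature to the files that implement it.
FEATURE_FILES = {
    "login": ["auth/Login.java", "auth/Token.java"],
    "checkout": ["cart/Cart.java", "auth/Token.java"],
}


def feature_metrics(feature_files, file_metrics):
    """Sum file-level metrics over the files of each feature (an assumption)."""
    totals = {}
    for feature, files in feature_files.items():
        loc = sum(file_metrics[f][0] for f in files)
        complexity = sum(file_metrics[f][1] for f in files)
        totals[feature] = (loc, complexity)
    return totals


def trend_slope(values):
    """Least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


if __name__ == "__main__":
    print(feature_metrics(FEATURE_FILES, FILE_METRICS))
    # Hypothetical cumulative bug-fix counts for one feature per release.
    print(trend_slope([1, 3, 5, 7]))
```

The per-feature tuples would serve as attribute vectors for a classifier, and a positive slope would indicate a feature that keeps accumulating bug fixes across releases.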

Notes

  1. https://www.atlassian.com/software/jira.

  2. https://scitools.com/.

  3. In this paper, a file means a source file which contains one or more classes. A feature often contains multiple files.

  4. https://www.cs.waikato.ac.nz/~ml/weka/.

  5. http://activemq.apache.org/.

  6. http://camel.apache.org/.

  7. http://cassandra.apache.org/.

  8. http://hibernate.org/.

  9. https://hive.apache.org/.

  10. https://wicket.apache.org/.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under grant No. 62002129, the Hubei Provincial Natural Science Foundation of China under grant No. 2020CFB473, and the Fundamental Research Funds for the Central Universities under grant No. CCNU19TD003.

Author information

Correspondence to Ran Mo.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Wei, S., Mo, R., Xiong, P., Zhang, S., Zhao, Y., Li, Z. (2021). Predicting and Monitoring Bug-Proneness at the Feature Level. In: Qin, S., Woodcock, J., Zhang, W. (eds) Dependable Software Engineering. Theories, Tools, and Applications. SETTA 2021. Lecture Notes in Computer Science, vol 13071. Springer, Cham. https://doi.org/10.1007/978-3-030-91265-9_11

  • DOI: https://doi.org/10.1007/978-3-030-91265-9_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91264-2

  • Online ISBN: 978-3-030-91265-9

  • eBook Packages: Computer Science (R0)
