Predicting and Monitoring Bug-Proneness at the Feature Level

Wei, Shaozhi; Mo, Ran; Xiong, Pu; Zhang, Siyuan; Zhao, Yang; Li, Zengyang

doi:10.1007/978-3-030-91265-9_11

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 13071)

Included in the following conference series:

International Symposium on Dependable Software Engineering: Theories, Tools, and Applications

Abstract

Enabling quick feature modification and delivery is important for a project’s success. Obtaining early estimates of software features’ bug-proneness helps allocate resources effectively to the bug-prone features that require further fixes. Researchers have proposed various approaches to bug prediction at different granularity levels, such as the class, package, and method levels. However, little work has built predictive models at the feature level. In this paper, we investigate how to predict bug-prone features and monitor their evolution. More specifically, we first identified a project’s features and their involved files. Next, we collected a suite of code metrics and selected a relevant subset as attributes for six machine learning algorithms to predict bug-prone features. Our evaluation shows that, using these machine learning algorithms with an appropriate set of code metrics, we can build effective bug prediction models at the feature level. Furthermore, we built regression models to monitor the growth trends of bug-prone features, which show how these features accumulate bug-proneness over time.
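Two steps of the workflow summarized in the abstract can be illustrated with a minimal sketch: lifting file-level metrics to the feature level, and fitting a trend line to a feature's bug history. This is not the authors' implementation. The abstract does not state how file metrics are aggregated or which regression form is used, so the sketch assumes simple summation per feature and an ordinary least-squares line over the release index. All function names, feature IDs, file names, and numbers are hypothetical, and the classification step with the six machine learning algorithms is omitted.

```python
from collections import defaultdict


def aggregate_feature_metrics(feature_files, file_metrics):
    """Sum each file-level metric over a feature's files (assumed aggregation)."""
    totals = {}
    for feature, files in feature_files.items():
        agg = defaultdict(float)
        for path in files:
            # Files without recorded metrics contribute nothing.
            for name, value in file_metrics.get(path, {}).items():
                agg[name] += value
        totals[feature] = dict(agg)
    return totals


def linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two releases to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical toy data: two features that share one file.
feature_files = {
    "FEAT-1": ["a.java", "b.java"],
    "FEAT-2": ["b.java", "c.java"],
}
file_metrics = {
    "a.java": {"loc": 120, "complexity": 14},
    "b.java": {"loc": 300, "complexity": 41},
    "c.java": {"loc": 80, "complexity": 6},
}
feature_metrics = aggregate_feature_metrics(feature_files, file_metrics)

# Hypothetical bug-fix counts for one feature across four releases.
bug_fixes_per_release = [1, 3, 5, 7]
slope, intercept = linear_trend(bug_fixes_per_release)
```

Under these assumptions, a positive slope would indicate a feature that accumulates bug fixes from release to release. Other aggregations (mean, maximum) or nonlinear regression forms would fit the same interface.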

Notes

1. https://www.atlassian.com/software/jira
2. https://scitools.com/
3. In this paper, a file means a source file which contains one or more classes. A feature often contains multiple files.
4. https://www.cs.waikato.ac.nz/~ml/weka/
5. http://activemq.apache.org/
6. http://camel.apache.org/
7. http://cassandra.apache.org/
8. http://hibernate.org/
9. https://hive.apache.org/
10. https://wicket.apache.org/

Acknowledgments

This work is supported by the National Natural Science Foundation of China under the grant No. 62002129, the Hubei Provincial Natural Science Foundation of China under the grant No. 2020CFB473, and the Fundamental Research Funds for the Central Universities under the grant No. CCNU19TD003.

Author information

Authors and Affiliations

School of Computer, Central China Normal University, Wuhan, China
Shaozhi Wei, Ran Mo, Pu Xiong, Siyuan Zhang, Yang Zhao & Zengyang Li

Corresponding author

Correspondence to Ran Mo.

Editor information

Editors and Affiliations

Teesside University, Middlesbrough, UK
Shengchao Qin
University of York, York, UK
Jim Woodcock
Institute of Software, Chinese Academy of Sciences, Beijing, China
Wenhui Zhang

About this paper

Cite this paper

Wei, S., Mo, R., Xiong, P., Zhang, S., Zhao, Y., Li, Z. (2021). Predicting and Monitoring Bug-Proneness at the Feature Level. In: Qin, S., Woodcock, J., Zhang, W. (eds) Dependable Software Engineering. Theories, Tools, and Applications. SETTA 2021. Lecture Notes in Computer Science, vol 13071. Springer, Cham. https://doi.org/10.1007/978-3-030-91265-9_11

DOI: https://doi.org/10.1007/978-3-030-91265-9_11
Published: 18 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91264-2
Online ISBN: 978-3-030-91265-9
eBook Packages: Computer Science, Computer Science (R0)
