Automatic approval prediction for software enhancement requests

Nizamani, Zeeshan Ahmed; Liu, Hui; Chen, David Matthew; Niu, Zhendong

doi:10.1007/s10515-017-0229-y

Published: 26 October 2017

Automated Software Engineering, Volume 25, pages 347–381 (2018)

Zeeshan Ahmed Nizamani (ORCID: orcid.org/0000-0002-7819-8526), Hui Liu, David Matthew Chen & Zhendong Niu

Abstract

Software applications often receive a large number of enhancement requests asking developers to implement additional functionality. Developers usually check such requests manually, which is time-consuming and tedious. Consequently, an approach that can automatically predict whether a new enhancement report will be approved benefits both developers and the people who submit requests. With such an approach, developers can rank reports according to their available time and thus limit how many they must evaluate from a large collection of low-quality enhancement requests that are unlikely to be approved. The approach can also help developers respond to useful requests more quickly. To this end, we propose a multinomial naive Bayes based approach to automatically predict whether a new enhancement report is likely to be approved or rejected. We acquire the enhancement reports of open-source software applications from Bugzilla for evaluation. Each report is preprocessed and modeled as a vector. Using these vectors with their corresponding approval status, we train a Bayes based classifier. The trained classifier predicts approval or rejection of new enhancement reports. We apply different machine learning and neural network algorithms, and the multinomial naive Bayes classifier yields the highest accuracy on the given dataset. The proposed approach is evaluated with 40,000 enhancement reports from 35 open-source applications. The results of tenfold cross-validation suggest that the average accuracy is up to 89.25%.
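The pipeline outlined in the abstract (preprocess each report, model it as a word-count vector, train a multinomial naive Bayes classifier, evaluate with k-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer is a simplified stand-in for the preprocessing step, Laplace smoothing is an assumed choice, and the names `tokenize`, `MultinomialNB`, `cross_validate` and the four toy reports are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified stand-in for preprocessing: lowercase, keep alphabetic words.
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNB:
    """Multinomial naive Bayes over word counts, with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.total_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / self.total_docs)
            for t in tokens:
                score += math.log(
                    (self.word_counts[c][t] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def cross_validate(texts, labels, k=10):
    """Mean accuracy over k folds; fold i is tested on items with index % k == i."""
    accuracies = []
    for i in range(k):
        train = [j for j in range(len(texts)) if j % k != i]
        test = [j for j in range(len(texts)) if j % k == i]
        if not train or not test:
            continue  # skip empty folds on very small datasets
        model = MultinomialNB().fit(
            [texts[j] for j in train], [labels[j] for j in train]
        )
        hits = sum(model.predict(texts[j]) == labels[j] for j in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)


# Hypothetical toy reports, for illustration only (not from the paper's dataset).
texts = [
    "add keyboard shortcut for tab switching",
    "support dark theme in settings panel",
    "please make it better somehow",
    "everything is bad fix it",
]
labels = ["approved", "approved", "rejected", "rejected"]

model = MultinomialNB().fit(texts, labels)
print(model.predict("add dark theme shortcut"))
```

On a real dataset, `cross_validate(texts, labels, k=10)` would correspond to the tenfold evaluation mentioned above; a toy set this small says nothing about the reported accuracy.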

Notes

https://twinword-lemmatizer1.p.mashape.com/extract/, verified 03/03/2016.
https://bugzilla.mozilla.org/, verified 26/02/2016.
https://bugzilla.mozilla.org/rest/bug?severity=enhancement, verified 26/02/2016.
https://github.com/shanniz/Bugzilla.
https://bugzilla.mozilla.org/rest/bug/426904/comment, verified 26/02/2016.
https://github.com/zeeshanniz/enhancement.approval.prediction, verified 30/08/2017.
https://wiki.mozilla.org/Bugzilla_Products, verified 30/08/2017.
http://www.openpr.org.cn/index.php/NLP-Toolkit-for-Natural-Language-Processing/43-Naive-Bayes-Classfier/View-details.html, verified 13/05/2016.
http://svmlight.joachims.org, verified 27/05/2016.
https://github.com/yandongliu/learningjs, verified 20/05/2016.
http://deeplearning.net/, verified 10/08/2016.

Acknowledgements

The work is supported by the National Key Research and Development Program of China (2016YFB1000801) and the National Natural Science Foundation of China (61472034, 61772071, 61690205).

Author information

Authors and Affiliations

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
Zeeshan Ahmed Nizamani, Hui Liu, David Matthew Chen & Zhendong Niu

Corresponding author

Correspondence to Hui Liu.

About this article

Cite this article

Nizamani, Z.A., Liu, H., Chen, D.M. et al. Automatic approval prediction for software enhancement requests. Autom Softw Eng 25, 347–381 (2018). https://doi.org/10.1007/s10515-017-0229-y

Received: 17 October 2016
Accepted: 30 September 2017
Published: 26 October 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10515-017-0229-y
