Automatic approval prediction for software enhancement requests

Automated Software Engineering

Abstract

Software applications often receive large numbers of enhancement requests that ask developers to implement additional functionality. Developers usually check such requests manually, which is time-consuming and tedious. An approach that automatically predicts whether a new enhancement report will be approved therefore benefits both developers and the people who submit requests. With such an approach, developers can rank reports according to their available time and thus limit how many they must evaluate from a large collection of low-quality requests that are unlikely to be approved. It also helps developers respond to useful requests more quickly. To this end, we propose a multinomial naive Bayes-based approach that automatically predicts whether a new enhancement report is likely to be approved or rejected. For evaluation, we collect enhancement reports of open-source software applications from Bugzilla. Each report is preprocessed and modeled as a vector. Using these vectors and their corresponding approval statuses, we train a Bayes-based classifier, which then predicts approval or rejection of new enhancement reports. We compare several machine learning and neural network algorithms, and the multinomial naive Bayes classifier yields the highest accuracy on the given dataset. The proposed approach is evaluated on 40,000 enhancement reports from 35 open-source applications. The results of tenfold cross-validation suggest that the average accuracy is up to 89.25%.
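
The pipeline summarized above (preprocess each report, model it as a word-count vector, train a multinomial naive Bayes classifier, evaluate with k-fold cross-validation) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex tokenizer, the add-one (Laplace) smoothing, the interleaved fold split, the class and function names, and the six toy reports are all invented for this example and do not come from the paper's dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a report and split it into alphabetic word tokens (assumed preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNB:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)      # number of reports per class
        self.word_counts = defaultdict(Counter)  # per-class word frequencies
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(word | class) for each class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def cross_validate(documents, labels, k=10):
    """Mean accuracy over k folds; fold i holds out every item whose index % k == i."""
    pairs = list(zip(documents, labels))
    accuracies = []
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        test = [p for i, p in enumerate(pairs) if i % k == fold]
        if not train or not test:
            continue  # skip empty folds on very small datasets
        model = MultinomialNB().fit(*zip(*train))
        correct = sum(model.predict(text) == label for text, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Toy, hand-written reports for illustration only (not real Bugzilla data).
reports = [
    ("add keyboard shortcut to reopen closed tab", "approved"),
    ("add option to export bookmarks as json", "approved"),
    ("support dark theme in settings page", "approved"),
    ("please make browser awesome", "rejected"),
    ("this browser is bad please fix everything", "rejected"),
    ("make everything faster please", "rejected"),
]
texts, statuses = zip(*reports)
classifier = MultinomialNB().fit(texts, statuses)
prediction = classifier.predict("add option to export history")
print(prediction)
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the per-class scores from `log_scores` could also be used to rank incoming reports rather than only label them.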

Notes

  1. https://twinword-lemmatizer1.p.mashape.com/extract/, verified 03/03/2016.

  2. https://bugzilla.mozilla.org/, verified 26/02/2016.

  3. https://bugzilla.mozilla.org/rest/bug?severity=enhancement, verified 26/02/2016.

  4. https://github.com/shanniz/Bugzilla.

  5. https://bugzilla.mozilla.org/rest/bug/426904/comment, verified 26/02/2016.

  6. https://github.com/zeeshanniz/enhancement.approval.prediction, verified 30/08/2017.

  7. https://wiki.mozilla.org/Bugzilla_Products, verified 30/08/2017.

  8. http://www.openpr.org.cn/index.php/NLP-Toolkit-for-Natural-Language-Processing/43-Naive-Bayes-Classfier/View-details.html, verified 13/05/2016.

  9. http://svmlight.joachims.org, verified 27/05/2016.

  10. https://github.com/yandongliu/learningjs, verified 20/05/2016.

  11. http://deeplearning.net/, verified 10/08/2016.

Acknowledgements

The work is supported by the National Key Research and Development Program of China (2016YFB1000801) and the National Natural Science Foundation of China (61472034, 61772071, 61690205).

Author information

Corresponding author

Correspondence to Hui Liu.

About this article

Cite this article

Nizamani, Z.A., Liu, H., Chen, D.M. et al. Automatic approval prediction for software enhancement requests. Autom Softw Eng 25, 347–381 (2018). https://doi.org/10.1007/s10515-017-0229-y

  • DOI: https://doi.org/10.1007/s10515-017-0229-y
