Search-Based Wrapper Feature Selection Methods in Software Defect Prediction: An Empirical Analysis

  • Conference paper
Intelligent Algorithms in Software Engineering (CSOC 2020)

Abstract

High dimensionality is a data quality problem that degrades the performance of prediction models in software defect prediction (SDP). Feature selection (FS) has been used as a viable solution to the high dimensionality problem in SDP. In existing studies, filter-based feature selection (FFS) and wrapper feature selection (WFS) are the two basic types of FS methods, and WFS methods are generally regarded as the better-performing of the two. However, WFS methods are known to have a high computational cost, because the number of executions required for feature subset search, evaluation, and selection is not known in advance. This often leads to overfitting of prediction models, because the search is easily trapped in local maxima. Applying an appropriate search method in the subset evaluation phase of WFS can resolve this trapping. Best First Search (BFS) and Greedy Step-wise Search (GSS) have been used extensively and conventionally as search methods in WFS, with positive impact. However, metaheuristic search methods can be as effective as BFS and GSS. Consequently, this study conducts an empirical comparative analysis of 13 search methods (11 state-of-the-art metaheuristic search methods and 2 conventional search methods) in WFS for SDP. The experimental results showed that the metaheuristic search methods (AS, BS, BAT, CS, ES, FS, FLS, GS, NSGA-II, PSOS, RS) performed better in WFS than the conventional search methods (BFS and GSS), although the average computational time of metaheuristic-based WFS methods is relatively high. We recommend metaheuristic search methods as alternative search methods for WFS in SDP.
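
To make the wrapper idea concrete, the following minimal Python sketch contrasts a greedy stepwise forward search with a simple stochastic bit-flip search over feature subsets. In both, each candidate subset is scored by training and validating a classifier on it. It is illustrative only and is not the study's experimental setup: the synthetic data, the nearest-centroid classifier, the fold scheme, and all function names (make_data, centroid_accuracy, greedy_stepwise, bit_flip_search) are assumptions made for this example, and the bit-flip search is a generic stand-in rather than any of the 11 metaheuristics evaluated in the paper.

```python
import random


def make_data(n=60, seed=0):
    """Synthetic modules (assumed data): features 0-1 carry signal, 2-4 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = defective, 0 = clean (hypothetical labels)
        X.append([
            rng.gauss(2.0 * label, 1.0),
            rng.gauss(-1.5 * label, 1.0),
            rng.gauss(0.0, 1.0),
            rng.gauss(0.0, 1.0),
            rng.gauss(0.0, 1.0),
        ])
        y.append(label)
    return X, y


def centroid_accuracy(X, y, subset, folds=5):
    """Wrapper evaluator: k-fold accuracy of a nearest-centroid classifier."""
    if not subset:
        return 0.0
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        centroids = {}
        for c in (0, 1):
            rows = [X[i] for i in train if y[i] == c]
            centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in subset]
        for i in test:
            point = [X[i][j] for j in subset]
            dist = {c: sum((p - m) ** 2 for p, m in zip(point, centroids[c]))
                    for c in centroids}
            correct += int(min(dist, key=dist.get) == y[i])
    return correct / len(X)


def greedy_stepwise(X, y, evaluate):
    """Forward selection: add the best feature until the score stops improving."""
    n_features = len(X[0])
    selected, best, evals = [], 0.0, 0
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        scored = [(evaluate(X, y, selected + [j]), j) for j in candidates]
        evals += len(candidates)
        score, j = max(scored)
        if score <= best:  # no improvement: stop (possibly at a local maximum)
            break
        selected.append(j)
        best = score
    return sorted(selected), best, evals


def bit_flip_search(X, y, evaluate, iterations=200, seed=1):
    """Generic stochastic search: flip one random bit, keep it if not worse."""
    rng = random.Random(seed)
    n_features = len(X[0])
    mask = [rng.random() < 0.5 for _ in range(n_features)]

    def chosen():
        return [j for j, on in enumerate(mask) if on]

    current, evals = evaluate(X, y, chosen()), 1
    for _ in range(iterations):
        j = rng.randrange(n_features)
        mask[j] = not mask[j]
        score = evaluate(X, y, chosen())
        evals += 1
        if score >= current:
            current = score
        else:
            mask[j] = not mask[j]  # revert a worsening flip
    return chosen(), current, evals


if __name__ == "__main__":
    X, y = make_data()
    print("greedy stepwise:", greedy_stepwise(X, y, centroid_accuracy))
    print("bit-flip search:", bit_flip_search(X, y, centroid_accuracy))
```

Each search returns the selected feature indices, the validation accuracy of that subset, and the number of subset evaluations it spent. In this sketch the evaluation count is a rough proxy for the computational cost that the abstract attributes to wrapper methods.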

Acknowledgement

This research/paper was fully supported by Universiti Teknologi PETRONAS, under the Yayasan Universiti Teknologi PETRONAS (YUTP) Research Grant Scheme (YUTP-FRG/015LC0240).

Author information

Corresponding author

Correspondence to Abdullateef O. Balogun.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Balogun, A.O. et al. (2020). Search-Based Wrapper Feature Selection Methods in Software Defect Prediction: An Empirical Analysis. In: Silhavy, R. (eds) Intelligent Algorithms in Software Engineering. CSOC 2020. Advances in Intelligent Systems and Computing, vol 1224. Springer, Cham. https://doi.org/10.1007/978-3-030-51965-0_43
