Abstract
Software Defect Prediction (SDP) is a technique for predicting software defects during the software development life cycle. Many SDP methods have been proposed, but their performance varies across datasets, which makes them unreliable for unseen software projects. A hybrid technique that combines feature selection with machine learning can be more effective for SDP, because it draws on the strengths of several methods to achieve better prediction accuracy on a given dataset than an individual classifier. The major issues with individual ML-based SDP models are long detection time, vulnerability of the software project, and high dimensionality of the feature parameters. Therefore, this study proposes a hybrid model that uses a feature selection-enabled Extreme Gradient Boost (XGB) classifier to address these challenges. The proposed model was implemented on the cleaned NASA MDP datasets and evaluated with performance metrics including F-score, accuracy, and Matthews correlation coefficient (MCC). Compared with state-of-the-art methods without feature selection, the proposed model performed better on the metrics used and outperformed all the other prediction techniques considered.
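For readers unfamiliar with the three metrics named above, the following is a minimal sketch of how accuracy, F-score, and MCC can be computed from binary predictions. It is not the authors' evaluation code: the function names and the toy labels are illustrative assumptions, with 1 standing for a defective module and 0 for a clean one.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels; 1 = defective, 0 = clean."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all modules that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f_score(tp, tn, fp, fn):
    """F1 score: harmonic mean of precision and recall on the defective class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1]; 0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels, not taken from the NASA MDP datasets.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("F-score:", f_score(*counts))
print("MCC:", mcc(*counts))
```

Unlike accuracy, MCC uses all four confusion-matrix cells, so it is less easily inflated when defective modules are a small minority of the dataset.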
Keywords
A. E. Adeniyi and M. K. Abiodun—Landmark University SDG 4 (Quality Education)
M. K. Abiodun—Landmark University SDG 16 (Peace and Justice, Strong Institution)
A. E. Adeniyi—Landmark University SDG 11 (Sustainable Cities and Communities)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Awotunde, J.B., Misra, S., Adeniyi, A.E., Abiodun, M.K., Kaushik, M., Lawrence, M.O. (2022). A Feature Selection-Based K-NN Model for Fast Software Defect Prediction. In: Gervasi, O., Murgante, B., Misra, S., Rocha, A.M.A.C., Garau, C. (eds) Computational Science and Its Applications – ICCSA 2022 Workshops. ICCSA 2022. Lecture Notes in Computer Science, vol 13380. Springer, Cham. https://doi.org/10.1007/978-3-031-10542-5_4
DOI: https://doi.org/10.1007/978-3-031-10542-5_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10541-8
Online ISBN: 978-3-031-10542-5
eBook Packages: Computer Science, Computer Science (R0)