Abstract
Software is playing a growing role in many safety-critical applications, and the dependability of software systems is a major concern. Predicting faulty software modules before the testing phase is one way to enhance software reliability, and the ability to identify such modules early can lower software testing costs. Machine learning algorithms can be applied to the software fault prediction problem. The main objective of this study is to identify faulty software modules with maximum accuracy, precision, and performance. A hybrid method combining an autoencoder and the K-means algorithm is used in this paper to develop a software fault predictor. The autoencoder serves as a preprocessor that selects the effective attributes of the training dataset and thereby reduces its size. Using the autoencoder together with the K-means clustering method results in lower clustering error and shorter clustering time. Tests conducted on the standard NASA PROMISE datasets demonstrate that, by removing the ineffective attributes from the training dataset, the proposed fault predictor achieves increased accuracy (96%) and precision (93%). The recall of the proposed method is about 87%. A further merit of this study is the reduced time needed to build the software fault predictor.
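The clustering and evaluation stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the autoencoder stage is omitted, so the two-dimensional points stand in for features that an encoder is assumed to have already produced. The toy values, the labels, the initial centroids, and the rule that cluster 1 is treated as the fault-prone cluster are all hypothetical choices made for illustration and are not taken from the PROMISE datasets.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], init: List[Point], iters: int = 20) -> List[int]:
    """Plain Lloyd-style K-means from given initial centroids.

    Returns the cluster index assigned to each point.
    """
    centroids = [list(c) for c in init]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def scores(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Accuracy, precision, and recall, with label 1 meaning 'faulty'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical encoded features for eight modules (assumed encoder output).
encoded = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.30), (0.30, 0.20),
    (0.40, 0.45), (0.80, 0.90), (0.90, 0.70), (0.85, 0.80),
]
# Hypothetical ground truth: 1 = faulty module, 0 = fault-free module.
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Assumption: cluster 1 is seeded in the high-valued region and is
# interpreted as the fault-prone cluster.
predicted = kmeans(encoded, init=[encoded[0], encoded[5]])
accuracy, precision, recall = scores(labels, predicted)
print(predicted, accuracy, precision, recall)
```

In this toy run one faulty module falls into the fault-free cluster, which lowers recall while leaving precision untouched. This shows why the three criteria are reported separately.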








Data Availability
Access.
The data relating to the current study is available via the following link:
https://drive.google.com/drive/folders/1-aX_QueAUV1PhL9rBOAFn0ZzS5RcnNXF?usp=drive_link
Notes
High Priority.
Low Priority.
Medium Priority.
Author information
Contributions
The proposed method was developed and discretized by B. Arasteh and S. Golshani. The designed algorithm was implemented and coded by B. Arasteh and S. Shami. The implemented method code was adapted and benchmarked by B. Arasteh. The data and results analysis were performed by B. Arasteh and S. Golshani. The manuscript of the paper was written by B. Arasteh and F. Kiani.
Ethics declarations
Ethical and Informed Consent for data used
The data used in this research does not belong to any other person or third party and was prepared and generated by the researchers themselves during the research. The data of this research will be accessible to other researchers.
Competing Interests
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript. The authors have no relevant financial or non-financial conflict of interest.
Additional information
Responsible Editor: Y. Malaiya.
About this article
Cite this article
Arasteh, B., Golshan, S., Shami, S. et al. Sahand: A Software Fault-Prediction Method Using Autoencoder Neural Network and K-Means Algorithm. J Electron Test 40, 229–243 (2024). https://doi.org/10.1007/s10836-024-06116-8
DOI: https://doi.org/10.1007/s10836-024-06116-8