Sahand: A Software Fault-Prediction Method Using Autoencoder Neural Network and K-Means Algorithm

Journal of Electronic Testing

Abstract

Software plays a growing role in many safety-critical applications, and the dependability of software systems is a major concern. Predicting the faulty modules of software before the testing phase is one method for enhancing software reliability, and the ability to identify those modules can lower software testing costs. Machine learning algorithms can be used to solve the software fault prediction problem. The main objective of this study is to identify the faulty modules of software with maximum accuracy, precision, and performance. This paper uses a hybrid method that combines an autoencoder with the K-means algorithm to develop a software fault predictor. The autoencoder serves as a preprocessor that selects the effective attributes of the training dataset and consequently reduces its size. Using an autoencoder with the K-means clustering method results in lower clustering error and time. Tests conducted on the standard NASA PROMISE datasets demonstrate that, by removing the inefficient elements from the training dataset, the proposed fault predictor achieves increased accuracy (96%) and precision (93%). The recall of the proposed method is about 87%. A further merit of this study is the reduced time needed to build the software fault predictor.
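The pipeline summarized in the abstract (autoencoder as a preprocessor, K-means on the reduced attributes, then accuracy, precision, and recall) can be illustrated with a minimal sketch. This is not the authors' implementation: the network shape, the training settings, the synthetic data standing in for module metrics, and the majority-vote rule that maps clusters to faulty/non-faulty labels are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np

def train_autoencoder(X, n_hidden, epochs=300, lr=0.05, seed=0):
    """Fit a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    by gradient descent on squared reconstruction error; return the encoder."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # encoded (reduced) attributes
        err = (H @ W2 + b2 - X) / n           # gradient of the loss w.r.t. the reconstruction
        dH = (err @ W2.T) * (1.0 - H ** 2)    # backpropagate through tanh
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return lambda A: np.tanh(A @ W1 + b1)

def kmeans(Z, k=2, iters=100, seed=0):
    """Plain Lloyd's K-means; returns cluster indices and centers."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def majority_vote_labels(clusters, y, k):
    """Assumed mapping rule: a cluster is 'faulty' (1) if most of its members are."""
    mapping = {}
    for j in range(k):
        members = y[clusters == j]
        mapping[j] = int(members.mean() >= 0.5) if len(members) else 0
    return np.array([mapping[c] for c in clusters])

def scores(y_true, y_pred):
    """Accuracy, precision, and recall for the faulty class (label 1)."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Synthetic stand-in for module metrics (NOT the NASA data): 8 attributes, 200 modules.
rng = np.random.default_rng(42)
y = np.array([0] * 150 + [1] * 50)                  # 1 = faulty module
X = rng.normal(0.0, 1.0, (200, 8)) + 3.0 * y[:, None]
X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize each attribute

encode = train_autoencoder(X, n_hidden=3)
Z = encode(X)                                       # 8 attributes reduced to 3
clusters, _ = kmeans(Z, k=2)
y_pred = majority_vote_labels(clusters, y, k=2)
print(scores(y, y_pred))
```

Clustering the 3-dimensional codes instead of the 8 raw attributes is what makes each distance computation in K-means cheaper, which is the kind of time saving the abstract attributes to the preprocessing step. The figures printed here depend on the synthetic data and say nothing about the results reported in the paper.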

Data Availability

Access.

The data relating to the current study is available via the following link:

https://drive.google.com/drive/folders/1-aX_QueAUV1PhL9rBOAFn0ZzS5RcnNXF?usp=drive_link

Notes

  1. High Priority.

  2. Low Priority.

  3. Medium Priority.

Author information

Contributions

The proposed method was developed and discretized by B. Arasteh and S. Golshani. The designed algorithm was implemented and coded by B. Arasteh and S. Shami. The implemented code was adapted and benchmarked by B. Arasteh. The data and results were analyzed by B. Arasteh and S. Golshani. The manuscript was written by B. Arasteh and F. Kiani.

Corresponding author

Correspondence to Bahman Arasteh.

Ethics declarations

Ethical and Informed Consent for data used

The data used in this research does not belong to any other person or third party and was prepared and generated by the researchers themselves during the research. The data of this research will be accessible to other researchers.

Competing Interests

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript. The authors have no relevant financial or non-financial conflict of interest.

Additional information

Responsible Editor: Y. Malaiya.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Arasteh, B., Golshan, S., Shami, S. et al. Sahand: A Software Fault-Prediction Method Using Autoencoder Neural Network and K-Means Algorithm. J Electron Test 40, 229–243 (2024). https://doi.org/10.1007/s10836-024-06116-8

  • DOI: https://doi.org/10.1007/s10836-024-06116-8

Keywords