Sahand: A Software Fault-Prediction Method Using Autoencoder Neural Network and K-Means Algorithm

Arasteh, Bahman; Golshan, Sahar; Shami, Shiva; Kiani, Farzad

doi:10.1007/s10836-024-06116-8

Published: 12 April 2024

Volume 40, pages 229–243, (2024)

Journal of Electronic Testing

Bahman Arasteh¹,² (ORCID: orcid.org/0000-0001-5202-6315),
Sahar Golshan³,
Shiva Shami⁴ &
…
Farzad Kiani⁵

Abstract

Software plays a growing role in many safety-critical applications, and the dependability of software systems is a major concern. Predicting the faulty modules of software before the testing phase is one way to enhance software reliability, and the ability to identify those modules can lower software testing costs. Machine learning algorithms can be used to solve the software fault-prediction problem. The main objective of this study is to identify the faulty modules of software with maximum accuracy, precision, and performance. A hybrid method combining an autoencoder and the K-means algorithm is used to develop a software fault predictor. The autoencoder, acting as a preprocessor, selects the effective attributes of the training dataset and consequently reduces its size. Using an autoencoder with the K-means clustering method results in lower clustering error and time. Tests conducted on the standard NASA PROMISE datasets demonstrate that, by removing the inefficient elements from the training dataset, the proposed fault predictor achieves increased accuracy (96%) and precision (93%). The recall of the proposed method is about 87%. Reducing the time needed to build the software fault predictor is a further merit of this study.
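
The pipeline summarized in the abstract (an autoencoder that compresses the module attributes, followed by K-means on the compressed representation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the network architecture, the training settings, or how clusters are mapped to the faulty and non-faulty classes, so the single tanh hidden layer, the plain gradient-descent training, the majority-vote cluster labelling, the synthetic data, and all function names below are assumptions made only for illustration.

```python
import numpy as np

def train_autoencoder(X, n_hidden, epochs=300, lr=0.05, seed=0):
    """Fit a one-hidden-layer autoencoder (assumed design); return the encoder."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_features))
    b2 = np.zeros(n_features)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # encode
        err = (H @ W2 + b2 - X) / n_samples      # gradient of sum of squared errors / (2N)
        dpre = (err @ W2.T) * (1.0 - H ** 2)     # backpropagate through tanh
        gW2, gb2 = H.T @ err, err.sum(axis=0)
        gW1, gb1 = X.T @ dpre, dpre.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return lambda Z: np.tanh(Z @ W1 + b1)

def kmeans(Z, k=2, iters=50, seed=0):
    """Plain Lloyd's K-means; returns cluster labels and centers."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):              # leave empty clusters unchanged
                centers[j] = Z[labels == j].mean(axis=0)
    return labels, centers

def clusters_to_classes(labels, y, k):
    """Assumed rule: a cluster is 'faulty' (1) if most of its modules are faulty."""
    mapping = np.zeros(k, dtype=int)
    for j in range(k):
        members = y[labels == j]
        mapping[j] = int(members.mean() >= 0.5) if len(members) else 0
    return mapping

def scores(y_true, y_pred):
    """Accuracy, precision, and recall for the faulty class (label 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Synthetic stand-in for module metrics (NOT the NASA PROMISE data):
# 8 attributes, of which only the first 3 differ between the two classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 8))
X[y == 1, :3] += 4.0
X = (X - X.mean(axis=0)) / X.std(axis=0)         # standardize each attribute

encode = train_autoencoder(X, n_hidden=3)        # 8 attributes -> 3 encoded features
Z = encode(X)
labels, _ = kmeans(Z, k=2)
y_pred = clusters_to_classes(labels, y, 2)[labels]
accuracy, precision, recall = scores(y, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

The printed scores describe only this synthetic example and say nothing about the results reported in the paper. Clustering in the reduced space is what would make the K-means step cheaper, since each distance computation runs over 3 encoded features instead of 8 raw attributes.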

Data Availability

Access.

The data relating to the current study is available via the following link:

https://drive.google.com/drive/folders/1-aX_QueAUV1PhL9rBOAFn0ZzS5RcnNXF?usp=drive_link

Notes

High Priority.
Low Priority.
Medium Priority.

Author information

Authors and Affiliations

Department of Software Engineering, Faculty of Engineering and Natural Science, Istinye University, Istanbul, Turkey
Bahman Arasteh
Applied Science Research Center, Applied Science Private University, Amman, Jordan
Bahman Arasteh
Department of Software Engineering, Tabriz Branch, Islamic Azad University, Tabriz, Iran
Sahar Golshan
Department of Software Engineering, Seraj Institute, Tabriz, Azerbaijan Province, Iran
Shiva Shami
Data Science Application and Research Center (VEBIM), Fatih Sultan Mehmet Vakif University, Istanbul, Turkey
Farzad Kiani

Contributions

The proposed method was developed and discretized by B. Arasteh and S. Golshan. The designed algorithm was implemented and coded by B. Arasteh and S. Shami. The implemented code was adapted and benchmarked by B. Arasteh. The data and results were analyzed by B. Arasteh and S. Golshan. The manuscript was written by B. Arasteh and F. Kiani.

Corresponding author

Correspondence to Bahman Arasteh.

Ethics declarations

Ethical and Informed Consent for data used

The data used in this research does not belong to any other person or third party and was prepared and generated by the researchers themselves during the research. The data of this research will be accessible to other researchers.

Competing Interests

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript. The authors have no relevant financial or non-financial conflict of interest.

Additional information

Responsible Editor: Y. Malaiya.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Arasteh, B., Golshan, S., Shami, S. et al. Sahand: A Software Fault-Prediction Method Using Autoencoder Neural Network and K-Means Algorithm. J Electron Test 40, 229–243 (2024). https://doi.org/10.1007/s10836-024-06116-8

Received: 03 October 2023
Accepted: 26 March 2024
Published: 12 April 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s10836-024-06116-8
