Abstract
Developing software applications has become more complex as software is used ever more widely, and under these circumstances producing defect-free software is very challenging. Detecting defects in software modules is therefore necessary so that developers can allocate appropriate resources to a project. Knowing the defects in advance improves software quality at low cost. This article develops a correlation-based neural network model for identifying defects in software projects. A novel correlation-based modified long short-term memory neural network (CM-LSTM) is proposed to predict defects in software modules from modeled data. The target variable is modified on the basis of the positive correlation between the features and the target variable. The prepared data is then fed to the LSTM model to overcome the class imbalance issue in software defect prediction data. The adequacy of the proposed method is tested on the JM1 software defect prediction dataset using various performance parameters. The results show that the proposed correlation-based modified LSTM technique is effective in detecting defects in software projects. The technique applies correlation-based feature selection to long short-term memory neural networks, and it is found to be more efficient than existing approaches such as correlation-based LSTM, K-nearest neighbor, stochastic gradient descent, random forest, Gaussian Naive Bayes, logistic regression, decision trees, linear discriminant analysis, and multi-layer perceptron.
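The abstract does not spell out how the correlation step is computed, so the following is only a minimal sketch of one plausible reading: compute the Pearson correlation between each feature column and the defect label, and keep the features whose correlation is positive. The function names, the threshold parameter, and the toy data are all hypothetical and are not taken from the article or from the JM1 dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (zero variance),
    since the correlation is undefined in that case.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0


def positively_correlated_features(rows, labels, threshold=0.0):
    """Return (selected column indices, per-column correlations).

    Assumption: a feature is kept when its correlation with the
    defect label is strictly greater than `threshold`.
    """
    columns = list(zip(*rows))  # transpose rows into feature columns
    corrs = [pearson(col, labels) for col in columns]
    selected = [i for i, r in enumerate(corrs) if r > threshold]
    return selected, corrs


# Toy data (hypothetical): three metrics per module, label 1 = defective.
rows = [[1, 5, 3], [2, 4, 3], [3, 3, 3], [4, 2, 3]]
labels = [0, 0, 1, 1]
selected, corrs = positively_correlated_features(rows, labels)
```

In this toy example only the first column rises with the label, so it is the only one selected. In a pipeline of the kind the abstract describes, the retained columns would then be passed on to the LSTM classifier.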
Ethics declarations
Conflict of interest
The authors declare that this manuscript has no conflict of interest with any other published source and has not been published previously (partly or in full). No data have been fabricated or manipulated to support our conclusions.
About this article
Cite this article
Pemmada, S.K., Behera, H.S., Nayak, J. et al. Correlation-based modified long short-term memory network approach for software defect prediction. Evolving Systems 13, 869–887 (2022). https://doi.org/10.1007/s12530-022-09423-7
DOI: https://doi.org/10.1007/s12530-022-09423-7