Abstract
System analysts often use software fault prediction models to identify fault-prone modules during the design phase of the software development life cycle. These models predict faulty modules from the software metrics supplied as input. In this study, we consider 20 types of metrics to develop a model using an extreme learning machine associated with various kernel methods. We evaluate the effectiveness of the model using a proposed framework based on the cost and efficiency in the testing phases. The evaluation is carried out through case studies of 30 object-oriented software systems. Experimental results demonstrate that applying a fault prediction model is suitable for projects in which the percentage of faulty classes is below a certain threshold, which depends on the efficiency of fault identification (low: 47.28%; median: 39.24%; high: 25.72%). We also consider nine feature selection techniques to remove irrelevant metrics and to select the best set of source code metrics for fault prediction.
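The suitability criterion reported above can be illustrated with a minimal Python sketch. The three threshold values are taken from the abstract. The function name, its arguments, and the use of a strict "below" comparison are assumptions made for illustration and are not the authors' implementation of the cost-based framework.

```python
# Thresholds (percentage of faulty classes) as reported in the abstract,
# keyed by the efficiency of fault identification in the testing phases.
THRESHOLDS = {"low": 47.28, "median": 39.24, "high": 25.72}


def fault_prediction_suitable(n_faulty, n_classes, efficiency):
    """Hypothetical helper: return True when the project's percentage of
    faulty classes is strictly below the reported threshold for the given
    efficiency level ("low", "median", or "high")."""
    if n_classes <= 0:
        raise ValueError("n_classes must be positive")
    if efficiency not in THRESHOLDS:
        raise ValueError(f"unknown efficiency level: {efficiency!r}")
    faulty_pct = 100.0 * n_faulty / n_classes
    return faulty_pct < THRESHOLDS[efficiency]
```

For example, a project with 30 faulty classes out of 100 would fall below the threshold at low efficiency (30% < 47.28%) but not at high efficiency (30% > 25.72%).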
Acknowledgements
The researchers are grateful to the FIST project of DST, Government of India, for sponsoring the work on web engineering and cloud-based computing. The researchers are also thankful to the Computer Science & Engineering Department, NIT Rourkela, for providing all facilities and guidance.
Cite this article
Kumar, L., Tirkey, A. & Rath, SK. An effective fault prediction model developed using an extreme learning machine with various kernel methods. Frontiers Inf Technol Electronic Eng 19, 864–888 (2018). https://doi.org/10.1631/FITEE.1601501
DOI: https://doi.org/10.1631/FITEE.1601501