Abstract
Software quality is an important factor in the success of software companies. Traditional software quality assurance techniques face serious limitations, especially in terms of time and budget. This has led to increased use of machine learning classification techniques to predict software faults. Software fault prediction can help developers uncover software problems in the early stages of the software life cycle. The most critical challenges are the extent to which these techniques generalize to software of different sizes, the class imbalance problem, and the identification of discriminative software metrics. In this paper, we analyze the performance of nine widely used machine learning classifiers for software fault prediction: Bayes Net, naive Bayes (NB), artificial neural network, support vector machines, K-nearest neighbors, AdaBoost, Bagging, Zero R, and Random Forest. Two standard sampling techniques, SMOTE and Resample with replacement, are used to handle the class imbalance problem. We further use a Fisher linear discriminant analysis (FLDA)-based feature selection approach in combination with SMOTE and Resample to select the most discriminative metrics. The top four classifiers by performance are then used for software fault prediction. The experiments are carried out on 15 publicly available datasets (small, medium, and large) collected from the PROMISE repository. The proposed Resample-FLDA method performs better than existing methods in terms of precision, recall, F-measure, and area under the curve.
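As a rough illustration of the Resample-FLDA idea described above, the following Python sketch balances the classes by drawing rows with replacement and then ranks software metrics by a per-feature Fisher criterion. It is not the authors' implementation: the function names, the 0/1 (clean/faulty) labelling, the equal-count-per-class resampling rule, and the univariate form of the Fisher score are all assumptions made for illustration.

```python
import random
from statistics import mean, pvariance


def resample_balanced(rows, labels, seed=0):
    """Draw rows with replacement so every class gets the same count.

    Assumption: the output keeps roughly the original dataset size,
    split evenly across the classes that are present.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    per_class = len(rows) // len(by_class)
    new_rows, new_labels = [], []
    for label, members in sorted(by_class.items()):
        for _ in range(per_class):
            new_rows.append(rng.choice(members))  # sampling with replacement
            new_labels.append(label)
    return new_rows, new_labels


def fisher_scores(rows, labels):
    """Per-feature Fisher criterion for two classes.

    score_j = (mean_faulty_j - mean_clean_j)^2 / (var_faulty_j + var_clean_j)
    Assumption: label 1 marks a faulty module, label 0 a clean one.
    """
    faulty = [r for r, y in zip(rows, labels) if y == 1]
    clean = [r for r, y in zip(rows, labels) if y == 0]
    scores = []
    for j in range(len(rows[0])):
        f = [r[j] for r in faulty]
        c = [r[j] for r in clean]
        gap = (mean(f) - mean(c)) ** 2
        spread = pvariance(f) + pvariance(c)
        if spread > 0:
            scores.append(gap / spread)
        else:
            # Zero within-class variance: perfectly separated or uninformative.
            scores.append(float("inf") if gap > 0 else 0.0)
    return scores


def top_k_features(rows, labels, k):
    """Return the indices of the k features with the highest Fisher score."""
    scores = fisher_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]
```

In this sketch, `resample_balanced` would be applied to the training data first, and `top_k_features` would then pick the metrics passed on to a classifier. Keeping the ranking univariate keeps the example short; a full FLDA projection would also account for correlations between metrics.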

Acknowledgements
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2016R1D1A1A09919551).
Cite this article
Kalsoom, A., Maqsood, M., Ghazanfar, M.A. et al. A dimensionality reduction-based efficient software fault prediction using Fisher linear discriminant analysis (FLDA). J Supercomput 74, 4568–4602 (2018). https://doi.org/10.1007/s11227-018-2326-5
DOI: https://doi.org/10.1007/s11227-018-2326-5