A dimensionality reduction-based efficient software fault prediction using Fisher linear discriminant analysis (FLDA)

Kalsoom, Anum; Maqsood, Muazzam; Ghazanfar, Mustansar Ali; Aadil, Farhan; Rho, Seungmin

doi:10.1007/s11227-018-2326-5


Published: 20 March 2018

Volume 74, pages 4568–4602 (2018)

The Journal of Supercomputing

Anum Kalsoom¹,
Muazzam Maqsood¹,²,
Mustansar Ali Ghazanfar²,
Farhan Aadil¹,² &
Seungmin Rho³ (ORCID: orcid.org/0000-0003-1936-6785)


Abstract

Software quality is an important factor in the success of software companies. Traditional software quality assurance techniques face serious limitations, especially in terms of time and budget. This has led to increased use of machine learning classification techniques to predict software faults. Software fault prediction can help developers uncover software problems in the early stages of the software life cycle. The most critical challenges are the extent to which these techniques generalize to software of different sizes, the class imbalance problem, and the identification of discriminative software metrics. In this paper, we analyze the performance of nine widely used machine learning classifiers for software fault prediction: Bayes Net, Naive Bayes (NB), artificial neural network, support vector machines, K nearest neighbors, AdaBoost, Bagging, Zero R, and Random Forest. Two standard sampling techniques, SMOTE and Resample (resampling with replacement), are used to handle the class imbalance problem. We further use an FLDA-based feature selection approach in combination with SMOTE and Resample to select the most discriminative metrics. The top four classifiers by performance are then used for software fault prediction. The experiments are carried out on 15 publicly available datasets (small, medium, and large) collected from the PROMISE repository. The proposed Resample-FLDA method performs better than existing methods in terms of precision, recall, F-measure, and area under the curve.
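The abstract does not spell out how resampling and the Fisher criterion are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a two-class problem, a class-balancing bootstrap standing in for the Resample step, and a per-feature Fisher score, (difference of class means)² / (sum of class variances), as the ranking rule. The function names and the toy metric values are hypothetical.

```python
import random
from statistics import mean, pvariance


def balanced_resample(rows, labels, seed=0):
    """Sample each class with replacement up to the majority-class size.

    Assumption: a simple stand-in for a 'Resample' step, not the paper's filter.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    new_rows, new_labels = [], []
    for label, members in by_class.items():
        for _ in range(target):
            new_rows.append(rng.choice(members))  # draw with replacement
            new_labels.append(label)
    return new_rows, new_labels


def fisher_scores(rows, labels):
    """Per-feature two-class Fisher score: (m_a - m_b)^2 / (var_a + var_b)."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    scores = []
    for j in range(len(rows[0])):
        a = [r[j] for r, y in zip(rows, labels) if y == classes[0]]
        b = [r[j] for r, y in zip(rows, labels) if y == classes[1]]
        between = (mean(a) - mean(b)) ** 2
        within = pvariance(a) + pvariance(b)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separated or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(rows, labels, k):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: columns are [lines of code, complexity, unrelated metric].
rows = [
    [10, 1, 5], [12, 2, 3], [11, 1, 4], [13, 2, 6], [12, 1, 5], [11, 2, 3],
    [40, 9, 4], [45, 8, 5],
]
labels = [0, 0, 0, 0, 0, 0, 1, 1]  # 1 = faulty module (minority class)

balanced_rows, balanced_labels = balanced_resample(rows, labels)
print(len(balanced_rows), balanced_labels.count(0), balanced_labels.count(1))
print(top_k_features(rows, labels, 2))  # -> [0, 1]
```

On the toy data the first two metrics separate the classes far better than the third, so they receive the highest scores. In a setup like the one the abstract describes, the selected columns would then be passed to a classifier and evaluated with precision, recall, F-measure, and area under the curve.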



Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2016R1D1A1A09919551).

Author information

Authors and Affiliations

Department of Computer Science, COMSATS Institute of Information and Technology Attock, Attock, Pakistan
Anum Kalsoom, Muazzam Maqsood & Farhan Aadil
Department of Software Engineering, University of Engineering and Technology Taxila, Taxila, Pakistan
Muazzam Maqsood, Mustansar Ali Ghazanfar & Farhan Aadil
Department of Media Software, Sungkyul University, Anyang, South Korea
Seungmin Rho


Corresponding author

Correspondence to Seungmin Rho.


About this article

Cite this article

Kalsoom, A., Maqsood, M., Ghazanfar, M.A. et al. A dimensionality reduction-based efficient software fault prediction using Fisher linear discriminant analysis (FLDA). J Supercomput 74, 4568–4602 (2018). https://doi.org/10.1007/s11227-018-2326-5


Published: 20 March 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11227-018-2326-5
