A decision tree logic based recommendation system to select software fault prediction techniques

Rathore, Santosh S.; Kumar, Sandeep

doi:10.1007/s00607-016-0489-6

Published: 21 March 2016

Computing, Volume 99, pages 255–285 (2017)

Abstract

Identifying a reliable fault prediction technique is a key requirement for building an effective fault prediction model. The performance of fault prediction techniques has been found to depend heavily on the characteristics of the fault dataset. To address this issue, researchers have evaluated and compared a plethora of fault prediction techniques by varying the context in terms of domain information, characteristics of the input data, complexity, etc. However, the lack of an accepted benchmark makes it difficult to select a fault prediction technique for a particular prediction context. In this paper, we present a recommendation system that facilitates the selection of appropriate technique(s) for building a fault prediction model. First, we reviewed the literature to elicit the various characteristics of fault datasets and the appropriateness of machine learning and statistical techniques for the identified characteristics. Subsequently, we formalized our findings and built a recommendation system that helps in the selection of fault prediction techniques. We performed an initial appraisal of the system and found that it provides useful hints for selecting fault prediction techniques.
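
The idea of mapping dataset characteristics to candidate techniques through decision tree logic can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `DatasetProfile` fields, the thresholds, and the techniques at each leaf are hypothetical placeholders and are not the rules formalized in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetProfile:
    """Hypothetical summary of a fault dataset (field names are illustrative)."""
    n_modules: int            # number of software modules in the dataset
    faulty_ratio: float       # fraction of modules labelled as faulty
    has_missing_values: bool  # whether any metric values are missing


def recommend(profile: DatasetProfile) -> List[str]:
    """Walk a small hand-written decision tree and return candidate techniques.

    The branch order, thresholds, and leaf contents are placeholders chosen
    for this sketch; they are NOT taken from the paper.
    """
    if profile.has_missing_values:
        return ["naive Bayes", "decision tree"]
    if profile.faulty_ratio < 0.10:  # assumed cut-off for "imbalanced"
        return ["random forest", "boosting"]
    if profile.n_modules < 100:      # assumed cut-off for "small dataset"
        return ["logistic regression", "naive Bayes"]
    return ["random forest", "support vector machine"]


if __name__ == "__main__":
    example = DatasetProfile(n_modules=500, faulty_ratio=0.05,
                             has_missing_values=False)
    print(recommend(example))
```

Encoding the logic as nested conditions keeps each recommendation traceable to the dataset characteristic that triggered it, which is the main appeal of a decision-tree-style recommender over an opaque scoring model.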

Acknowledgments

The authors would like to thank the editor of the journal and the anonymous reviewers for their valuable comments, guidance, and suggestions, which improved the quality of the paper and led to its current form. We also thank the Ministry of Human Resource Development (MHRD), India, for providing an institute assistantship.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, India
Santosh S. Rathore & Sandeep Kumar

Corresponding author

Correspondence to Sandeep Kumar.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 66 KB)

About this article

Cite this article

Rathore, S.S., Kumar, S. A decision tree logic based recommendation system to select software fault prediction techniques. Computing 99, 255–285 (2017). https://doi.org/10.1007/s00607-016-0489-6

Received: 22 January 2015
Accepted: 24 February 2016
Published: 21 March 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s00607-016-0489-6

Keywords

Mathematics Subject Classification

68N30 Mathematical aspects of software engineering (specification, verification, metrics, requirements, etc.)
