
Three empirical studies on predicting software maintainability using ensemble methods

Soft Computing

Abstract

More accurate prediction of software maintenance effort contributes to better management and control of software maintenance. Several recent studies have investigated the use of computational intelligence models for software maintainability prediction. The performance of these models, however, may vary from one dataset to another. Consequently, ensemble methods have become increasingly popular: they exploit the strengths of their constituent computational intelligence models on a given dataset to achieve prediction accuracy that is higher than, or at least competitive with, that of the individual models. This paper investigates and empirically evaluates different homogeneous and heterogeneous ensemble methods for predicting software maintenance effort and change proneness. Three major empirical studies were designed and conducted, taking into consideration different design factors such as the types of ensemble methods investigated, the types of prediction problems, the datasets used, and other aspects of the experimental setup. Overall, the empirical evidence obtained from the three studies confirms that some ensemble methods achieve prediction accuracy that is higher than, or at least competitive with, that of individual models across datasets, and are therefore more reliable.
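The averaging idea behind a heterogeneous ensemble (members of different model types whose predictions are combined) can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the abstract does not specify the member models, the combination rule, or the accuracy measure, so the three simple members (a mean baseline, single-metric least squares, and 2-nearest-neighbour regression), the unweighted average, the MMRE (mean magnitude of relative error) measure, and the metric/effort data below are all hypothetical stand-ins. A homogeneous ensemble such as bagging would instead train several copies of one learner type on bootstrap resamples of the training data.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, ...]          # one class: a tuple of design metrics
Model = Callable[[Row], float]   # a fitted model maps metrics -> predicted effort


def fit_mean(X: Sequence[Row], y: Sequence[float]) -> Model:
    """Baseline member: always predicts the mean training effort."""
    m = mean(y)
    return lambda x: m


def fit_simple_linear(X: Sequence[Row], y: Sequence[float], feature: int = 0) -> Model:
    """Member: ordinary least squares on a single metric column."""
    xs = [row[feature] for row in X]
    x_bar, y_bar = mean(xs), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in xs)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, y)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x[feature]


def fit_knn(X: Sequence[Row], y: Sequence[float], k: int = 2) -> Model:
    """Member: k-nearest-neighbour regression (squared Euclidean distance)."""
    data = list(zip(X, y))

    def predict(x: Row) -> float:
        nearest = sorted(data, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
        return mean(target for _, target in nearest)

    return predict


def average_ensemble(models: List[Model]) -> Model:
    """Heterogeneous ensemble (assumed rule): unweighted mean of member predictions."""
    return lambda x: mean(m(x) for m in models)


def mmre(model: Model, X: Sequence[Row], y: Sequence[float]) -> float:
    """Mean magnitude of relative error (lower is better); assumes every actual y > 0."""
    return mean(abs(t - model(x)) / t for x, t in zip(X, y))


# Hypothetical data: (WMC, RFC) metrics per class -> maintenance effort (e.g. lines changed).
X_train = [(5, 20), (8, 35), (12, 40), (15, 60), (20, 75), (25, 90)]
y_train = [10, 18, 25, 33, 45, 52]
X_test = [(10, 38), (18, 70)]
y_test = [22, 40]

members = {
    "mean": fit_mean(X_train, y_train),
    "linear(WMC)": fit_simple_linear(X_train, y_train, feature=0),
    "2-NN": fit_knn(X_train, y_train, k=2),
}
ensemble = average_ensemble(list(members.values()))

for name, model in members.items():
    print(f"{name:12s} MMRE = {mmre(model, X_test, y_test):.3f}")
print(f"{'ensemble':12s} MMRE = {mmre(ensemble, X_test, y_test):.3f}")
```

Because absolute error is convex, for positive actual values an unweighted average can never have a higher MMRE than the mean MMRE of its members on the same data, although it can still be worse than the single best member. This is one reason averaging ensembles tend to be at least competitive with individual models.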





Acknowledgments

The authors acknowledge King Fahd University of Petroleum and Minerals (KFUPM) for providing the facilities used in carrying out this research.

Author information


Correspondence to Mahmoud O. Elish.

Additional information

Communicated by I. R. Ruiz.


About this article


Cite this article

Elish, M.O., Aljamaan, H. & Ahmad, I. Three empirical studies on predicting software maintainability using ensemble methods. Soft Comput 19, 2511–2524 (2015). https://doi.org/10.1007/s00500-014-1576-2



  • DOI: https://doi.org/10.1007/s00500-014-1576-2
