
Toward Accurate Software Effort Prediction Using Multiple Classifier Systems


Part of the book series: Studies in Computational Intelligence (SCI, volume 617)

Abstract

Averaging is a standard technique in applied machine learning for combining multiple classifiers to achieve greater accuracy. Such accuracy would be valuable in software effort estimation, which is an important part of software process management. This chapter investigates the use of ensemble multiple-classifier learning for predicting software effort. Multiple-classifier combination is demonstrated and evaluated against individual classifiers on ten industrial datasets in terms of the smoothed error rate. Experimental results show that multiple-classifier combination can improve software effort prediction, with boosting, bagging and feature selection achieving the highest accuracy rates. Good performance is consistently obtained from static parallel systems, while dynamic classifier selection systems exhibit poor accuracy rates. Most of the base classifiers are highly competitive with one another, and the success of each method appears to depend on the underlying characteristics of each of the ten industrial datasets.
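
The bagging result summarised above can be illustrated with a minimal, self-contained sketch. The toy dataset, feature names and effort labels below are hypothetical, and a one-level decision stump stands in for the chapter's base classifiers; this is not the chapter's experimental setup, only the bagging-with-majority-voting idea it evaluates.

```python
import random

# Hypothetical toy data: (team_size, size_kloc) -> label, 1 = "high effort".
DATA = [((2, 10), 0), ((3, 12), 0), ((2, 8), 0), ((4, 15), 0),
        ((9, 60), 1), ((8, 55), 1), ((10, 70), 1), ((7, 50), 1)]

def stump_error(sample, f, t, flip):
    # Misclassification count for the rule: predict 1 iff (x[f] > t) != flip.
    return sum(((x[f] > t) != flip) != (y == 1) for x, y in sample)

def train_stump(sample):
    # Exhaustively pick the feature/threshold/polarity with the fewest errors.
    candidates = [(f, x[f], flip)
                  for f in range(2) for x, _ in sample for flip in (False, True)]
    return min(candidates, key=lambda c: stump_error(sample, *c))

def predict(stump, x):
    f, t, flip = stump
    return int((x[f] > t) != flip)

def bagging(data, n_models=25, seed=0):
    # Train each stump on a bootstrap sample (drawn with replacement).
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data])
            for _ in range(n_models)]

def vote(models, x):
    # Combine the ensemble by simple majority vote.
    return int(sum(predict(m, x) for m in models) * 2 > len(models))

models = bagging(DATA)
print(vote(models, (9, 65)), vote(models, (3, 11)))
```

Because each stump is fit to a different bootstrap sample, the individual stumps make partly uncorrelated errors that tend to cancel in the vote; this variance reduction is the mechanism bagging exploits.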



Acknowledgments

The project was sponsored by the Department of Electrical and Electronic Engineering Science at the University Of Johannesburg, South Africa. The authors would like to thank their colleagues for their valuable comments and suggestions to improve the paper.

Author information

Corresponding author

Correspondence to Bhekisipho Twala.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Twala, B., Verner, J. (2016). Toward Accurate Software Effort Prediction Using Multiple Classifier Systems. In: Pedrycz, W., Succi, G., Sillitti, A. (eds) Computational Intelligence and Quantitative Software Engineering. Studies in Computational Intelligence, vol 617. Springer, Cham. https://doi.org/10.1007/978-3-319-25964-2_7

  • DOI: https://doi.org/10.1007/978-3-319-25964-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25962-8

  • Online ISBN: 978-3-319-25964-2

  • eBook Packages: Engineering, Engineering (R0)
