A Study on Software Effort Prediction Using Machine Learning Techniques

  • Conference paper
Evaluation of Novel Approaches to Software Engineering (ENASE 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 275)

Abstract

This paper presents a study of software effort prediction using machine learning techniques. Both supervised and unsupervised learning techniques are employed to predict software effort from historical datasets. For unsupervised learning, k-medoids clustering equipped with different similarity measures is used to cluster the projects in the historical datasets. For supervised learning, the J48 decision tree, back-propagation neural network (BPNN) and naïve Bayes are used to classify the software projects into different effort classes. We also impute the missing values in the historical datasets before applying the machine learning techniques to predict software effort. Experiments on the ISBSG and CSBSG datasets demonstrate that unsupervised learning with k-medoids clustering performed poorly. Among the similarity measures, the Kulzinsky coefficient performed best in measuring the similarities of projects. Supervised learning techniques outperformed unsupervised learning techniques in software effort prediction, and BPNN performed best among the three supervised learning techniques. Missing data imputation improved the performance of both unsupervised and supervised learning techniques in software effort prediction.
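As a rough illustration of the clustering step described in the abstract, the following minimal sketch runs k-medoids over binary-encoded project attributes with a pluggable dissimilarity function. It is not the authors' implementation: the names `mismatch_distance`, `assign`, `k_medoids` and the toy `projects` data are hypothetical, the deterministic initialisation (first k projects as medoids) is a simplification, and the simple mismatch fraction is only a placeholder for the similarity measures compared in the paper, whose formulas the abstract does not give.

```python
from typing import Callable, List, Sequence, Tuple

Project = Sequence[int]  # assumed encoding: one 0/1 flag per project attribute


def mismatch_distance(x: Project, y: Project) -> float:
    """Placeholder dissimilarity: fraction of attributes on which two projects differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def assign(points: Sequence[Project], medoids: List[int],
           dist: Callable[[Project, Project], float]) -> List[int]:
    """Label each point with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points: Sequence[Project], k: int,
              dist: Callable[[Project, Project], float],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(range(k))  # assumption: the first k points seed the clusters
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total distance
            # to all other members of its cluster.
            new_medoids.append(
                min(members, key=lambda i: sum(dist(points[i], points[j]) for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)


# Toy data (invented for illustration): two groups of similar projects.
projects = [
    (1, 1, 1, 0, 0, 0),
    (0, 0, 0, 1, 1, 1),
    (1, 1, 0, 0, 0, 0),
    (0, 0, 0, 0, 1, 1),
]
medoids, labels = k_medoids(projects, 2, mismatch_distance)
print(medoids, labels)
```

Because the dissimilarity is passed in as a function, a different measure can be swapped in without touching the clustering loop, which mirrors the paper's comparison of several similarity measures under the same algorithm.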




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, W., Yang, Y., Wang, Q. (2013). A Study on Software Effort Prediction Using Machine Learning Techniques. In: Maciaszek, L.A., Zhang, K. (eds) Evaluation of Novel Approaches to Software Engineering. ENASE 2011. Communications in Computer and Information Science, vol 275. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32341-6_1


  • DOI: https://doi.org/10.1007/978-3-642-32341-6_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32340-9

  • Online ISBN: 978-3-642-32341-6

  • eBook Packages: Computer Science, Computer Science (R0)
