
Missing Data Imputation Techniques for Software Effort Estimation: A Study of Recent Issues and Challenges

  • Conference paper
  • First Online:
Emerging Trends in Intelligent Computing and Informatics (IRICT 2019)

Abstract

Software effort estimation is one of the critical aspects of software engineering. It revolves around predicting the effort required to complete a software task. However, any estimation technique or model relies on input data from which it derives its predictions of future values. Missing values in such data are common in the software development industry, and they lead to inaccurate predictions or misleading results. Handling missing data is therefore an important aspect of effort estimation models that must be addressed. However, research on missing data is not without its own gaps and issues. This review aims to elaborate on the recent issues and gaps within the missing data and software effort estimation field. This may allow future researchers to gain a better understanding of the inner workings of missing data techniques and of the methods through which these challenges can be addressed.
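k-nearest-neighbour (kNN) imputation is one widely used family of techniques for filling gaps in project data before an effort model is trained. The Python below is a minimal sketch of that general technique, not the method of this paper or of any particular study. It assumes a small hypothetical project table with columns for size (KLOC), team size and effort (person-months), with `None` marking a missing value; the function name `knn_impute` and the data are illustrative only.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing cell (None) with the mean of that column over the
    k nearest other rows that have the column observed.

    Distance is plain Euclidean over the columns observed in both rows.
    No feature scaling is applied (a simplifying assumption).
    """
    imputed = []
    for i, row in enumerate(rows):
        new_row = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            # Collect (distance, donor value) pairs from rows that observed column j.
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[j]))
            if not candidates:
                raise ValueError(f"no donor for row {i}, column {j}")
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            new_row.append(sum(v for _, v in nearest) / len(nearest))
        imputed.append(new_row)
    return imputed


# Hypothetical projects: [size_kloc, team_size, effort_person_months]
projects: List[Row] = [
    [10.0, 4.0, 24.0],
    [12.0, 5.0, 30.0],
    [50.0, 10.0, 120.0],
    [11.0, 4.0, None],   # missing effort
    [48.0, None, 115.0],  # missing team size
]

print(knn_impute(projects, k=2))
```

Here the missing effort of the fourth project is filled from its two most similar projects (24 and 30 person-months, giving 27.0). Because no normalization is applied and distances may be computed over different subsets of features, a real study would typically scale features and choose `k` empirically.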



Acknowledgments

The authors fully acknowledge Universiti Teknologi Malaysia for UTM-TDR Grant Vot No. 06G23, and Ministry of Higher Education (MOHE) for FRGS Grant Vot No. 5F117, which have made this research endeavor possible.

Author information


Correspondence to Ayman Jalal Hassan Almutlaq or Dayang N. A. Jawawi.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Almutlaq, A.J.H., Jawawi, D.N.A. (2020). Missing Data Imputation Techniques for Software Effort Estimation: A Study of Recent Issues and Challenges. In: Saeed, F., Mohammed, F., Gazem, N. (eds) Emerging Trends in Intelligent Computing and Informatics. IRICT 2019. Advances in Intelligent Systems and Computing, vol 1073. Springer, Cham. https://doi.org/10.1007/978-3-030-33582-3_107
