Abstract
Software effort estimation is one of the critical aspects of software engineering. It revolves around predicting the effort needed to complete a software task. However, any estimation technique or model relies on input data from which it derives and predicts future values. Missing values in such data are a common occurrence in the software development industry, and they lead to inaccurate predictions or misleading results. Missing data is therefore an important aspect of effort estimation models that must be addressed. However, the handling of missing data is not without its own gaps and issues. This review elaborates on the recent issues and gaps that exist within the field of missing data and software effort estimation. It may allow future researchers to gain a better understanding of the inner workings of missing data and of the methods through which these challenges can be addressed.
Acknowledgments
The authors fully acknowledge Universiti Teknologi Malaysia for UTM-TDR Grant Vot No. 06G23, and Ministry of Higher Education (MOHE) for FRGS Grant Vot No. 5F117, which have made this research endeavor possible.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Almutlaq, A.J.H., Jawawi, D.N.A. (2020). Missing Data Imputation Techniques for Software Effort Estimation: A Study of Recent Issues and Challenges. In: Saeed, F., Mohammed, F., Gazem, N. (eds) Emerging Trends in Intelligent Computing and Informatics. IRICT 2019. Advances in Intelligent Systems and Computing, vol 1073. Springer, Cham. https://doi.org/10.1007/978-3-030-33582-3_107
DOI: https://doi.org/10.1007/978-3-030-33582-3_107
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33581-6
Online ISBN: 978-3-030-33582-3
eBook Packages: Intelligent Technologies and Robotics (R0)