Combining Data Preprocessing Methods With Imputation Techniques for Software Defect Prediction

Misha Kakkar, Sarika Jain, Abhay Bansal, P.S. Grover
Copyright: © 2018 | Volume: 9 | Issue: 1 | Pages: 19
ISSN: 1942-3926 | EISSN: 1942-3934 | EISBN13: 9781522543985 | DOI: 10.4018/IJOSSP.2018010101
MLA

Kakkar, Misha, et al. "Combining Data Preprocessing Methods With Imputation Techniques for Software Defect Prediction." IJOSSP vol.9, no.1 2018: pp.1-19. http://doi.org/10.4018/IJOSSP.2018010101

APA

Kakkar, M., Jain, S., Bansal, A., & Grover, P. (2018). Combining Data Preprocessing Methods With Imputation Techniques for Software Defect Prediction. International Journal of Open Source Software and Processes (IJOSSP), 9(1), 1-19. http://doi.org/10.4018/IJOSSP.2018010101

Chicago

Kakkar, Misha, et al. "Combining Data Preprocessing Methods With Imputation Techniques for Software Defect Prediction," International Journal of Open Source Software and Processes (IJOSSP) 9, no.1: 1-19. http://doi.org/10.4018/IJOSSP.2018010101

Abstract

Software Defect Prediction (SDP) models predict whether software is clean or buggy using historical data collected from various software repositories. The data collected from such repositories may contain missing values. Imputation techniques estimate these missing values from the complete observed values in the dataset. The objective of this study is to identify the best-suited imputation technique for handling missing values in SDP datasets. In addition, the authors investigate the most appropriate combination of imputation technique and data preprocessing method for building an SDP model. Four combinations of imputation technique and data preprocessing method are examined using the improved NASA datasets. These combinations are used with five different machine-learning algorithms to develop models, and the performance of the resulting SDP models is compared using traditional performance indicators. Experimental results show that, among the imputation techniques considered, linear regression gives the most accurate imputed values. The combination of linear regression with a correlation-based feature selector outperforms all other combinations. To validate the significance of combining data preprocessing methods with imputation, the findings are applied to open source projects. The results on these projects are consistent with the conclusions drawn from the NASA datasets.
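
As an illustration of the regression-based imputation mentioned in the abstract, the sketch below fills missing values in one metric column by fitting a least-squares line on the rows where that metric is observed. It is not the authors' implementation: the abstract gives no implementation details, so the function name `regression_impute`, the toy metrics, and the assumption that all predictor columns are fully observed are hypothetical choices made for this example.

```python
import numpy as np

def regression_impute(X, target_col):
    """Fill NaNs in one column using a least-squares fit on the other columns.

    Assumption for this sketch: every other column is fully observed.
    """
    X = np.asarray(X, dtype=float).copy()   # work on a copy, leave the input intact
    missing = np.isnan(X[:, target_col])
    if not missing.any():
        return X
    predictors = np.delete(X, target_col, axis=1)
    # Add an intercept column so the fit is not forced through the origin.
    A = np.column_stack([np.ones(len(X)), predictors])
    # Fit only on rows where the target column is observed.
    coef, *_ = np.linalg.lstsq(A[~missing], X[~missing, target_col], rcond=None)
    # Predict the missing entries from the fitted coefficients.
    X[missing, target_col] = A[missing] @ coef
    return X

# Hypothetical toy data: columns are (lines of code, cyclomatic complexity).
data = np.array([
    [10.0, 2.0],
    [20.0, 4.0],
    [30.0, 6.0],
    [40.0, np.nan],
])
imputed = regression_impute(data, target_col=1)
```

In this toy data the observed complexity is exactly 0.2 times the lines of code, so the imputed value for the last row comes out close to 8.0. Real SDP datasets have many metrics and noisier relationships, and a feature-selection step such as the correlation-based selector named in the abstract would be applied separately.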
