Combining Data Preprocessing Methods With Imputation Techniques for Software Defect Prediction

Misha Kakkar, Sarika Jain, Abhay Bansal, P.S. Grover

Source Title: International Journal of Open Source Software and Processes (IJOSSP) 9(1)

ISSN: 1942-3926 | EISSN: 1942-3934 | EISBN13: 9781522543985 | DOI: 10.4018/IJOSSP.2018010101

MLA

Kakkar, Misha, et al. "Combining Data Preprocessing Methods With Imputation Techniques for Software Defect Prediction." IJOSSP vol.9, no.1 2018: pp.1-19. http://doi.org/10.4018/IJOSSP.2018010101

APA

Kakkar, M., Jain, S., Bansal, A., & Grover, P. (2018). Combining Data Preprocessing Methods With Imputation Techniques for Software Defect Prediction. International Journal of Open Source Software and Processes (IJOSSP), 9(1), 1-19. http://doi.org/10.4018/IJOSSP.2018010101

Chicago

Kakkar, Misha, et al. "Combining Data Preprocessing Methods With Imputation Techniques for Software Defect Prediction," International Journal of Open Source Software and Processes (IJOSSP) 9, no.1: 1-19. http://doi.org/10.4018/IJOSSP.2018010101


Abstract

Software Defect Prediction (SDP) models predict whether software is clean or buggy using historical data collected from various software repositories. The data collected from such repositories may contain missing values. Imputation techniques estimate these missing values from the complete observed values in the dataset. The objective of this study is to identify the best-suited imputation technique for handling missing values in SDP datasets. In addition, the authors investigate the most appropriate combination of imputation technique and data preprocessing method for building an SDP model. In this study, four combinations of imputation technique and data preprocessing method are examined using the improved NASA datasets. These combinations are used with five different machine-learning algorithms to develop models. The performance of these SDP models is then compared using traditional performance indicators. Experimental results show that, among the imputation techniques considered, linear regression gives the most accurate imputed values. The combination of linear regression with a correlation-based feature selector outperforms all other combinations. To validate the significance of combining data preprocessing methods with imputation, the findings are applied to open source projects. The results on these projects are consistent with the findings above.
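As a rough illustration of what imputing a missing value by linear regression can look like, the sketch below fits a least-squares line on the complete rows and uses it to fill the gaps. It is not the authors' implementation: the single-predictor setup, the function names `fit_line` and `impute_linear`, and the metric names `loc` and `complexity` are all assumptions made for the example, and the sample values are invented.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Predictor is constant: fall back to predicting the mean of y.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def impute_linear(rows, target, predictor):
    """Return copies of rows with missing (None) target values filled in.

    The line is fitted only on rows where both fields are observed.
    Assumes at least one such complete row exists.
    """
    complete = [r for r in rows
                if r[target] is not None and r[predictor] is not None]
    a, b = fit_line([r[predictor] for r in complete],
                    [r[target] for r in complete])
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left untouched
        if r[target] is None and r[predictor] is not None:
            r[target] = a + b * r[predictor]
        filled.append(r)
    return filled


# Hypothetical module metrics; the last module has no complexity value.
modules = [
    {"loc": 10, "complexity": 2.0},
    {"loc": 20, "complexity": 4.0},
    {"loc": 30, "complexity": 6.0},
    {"loc": 40, "complexity": None},
]
imputed = impute_linear(modules, target="complexity", predictor="loc")
```

A real SDP dataset has many metrics per module, so a practical version would regress the missing attribute on several observed attributes rather than one, and would run before any feature selection step.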
