A Study on Class Imbalancing Feature Selection and Ensembles on Software Reliability Prediction

Jhansi Lakshmi Potharlanka, Maruthi Padmaja Turumella, Radha Krishna P.
Copyright: © 2019 | Volume: 10 | Issue: 4 | Pages: 24
ISSN: 1942-3926 | EISSN: 1942-3934 | EISBN13: 9781522565550 | DOI: 10.4018/IJOSSP.2019100102

MLA

Potharlanka, Jhansi Lakshmi, et al. "A Study on Class Imbalancing Feature Selection and Ensembles on Software Reliability Prediction." IJOSSP vol.10, no.4 2019: pp.20-43. http://doi.org/10.4018/IJOSSP.2019100102

APA

Potharlanka, J. L., Turumella, M. P., & Radha Krishna P. (2019). A Study on Class Imbalancing Feature Selection and Ensembles on Software Reliability Prediction. International Journal of Open Source Software and Processes (IJOSSP), 10(4), 20-43. http://doi.org/10.4018/IJOSSP.2019100102

Chicago

Potharlanka, Jhansi Lakshmi, Maruthi Padmaja Turumella, and Radha Krishna P. "A Study on Class Imbalancing Feature Selection and Ensembles on Software Reliability Prediction," International Journal of Open Source Software and Processes (IJOSSP) 10, no.4: 20-43. http://doi.org/10.4018/IJOSSP.2019100102

Abstract

Early software defect prediction models can improve software quality. However, two major challenges hinder model performance: class imbalance caused by the underrepresentation of defects, and the irrelevant metrics used to predict them. This article presents a new two-stage framework that combines an Ensemble of Hybrid Feature selection (EHF) with Weighted Support Vector Machine Boosting (WSVMBoost) to further enhance model performance. The EHF is an ensemble feature ranking that combines feature selection models, such as filters and embedded models, to select the relevant metrics. The classification ensembles, namely Random Forest, RUSBoost, and WSVMBoost, and the base learners, namely Decision Tree and SVM, are also explored in this study using five software reliability datasets. In the statistical tests, EHF with WSVMBoost attained the best mean performance rank among the feature selection hybrids in predicting software defects. Additionally, this study shows that McCabe and Halstead method-level metrics are equally important in improving model performance.
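
To make the idea of an ensemble feature ranking concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state how the EHF combines the individual rankings, so mean-rank aggregation is an assumption here. The selector names, metric names, and scores are hypothetical toy values, not results from the study.

```python
from statistics import mean


def rank_scores(scores):
    """Return 1-based ranks (1 = highest score); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def ensemble_rank(score_table, names, k):
    """Average each feature's rank across selectors and keep the k best features.

    Assumption: mean-rank aggregation; the abstract does not specify the rule.
    """
    per_selector = [rank_scores(s) for s in score_table.values()]
    mean_ranks = [mean(r[i] for r in per_selector) for i in range(len(names))]
    ordered = sorted(range(len(names)), key=lambda i: (mean_ranks[i], i))
    return [names[i] for i in ordered[:k]]


# Hypothetical metric names and toy relevance scores (higher = more relevant).
demo_names = ["loc", "v(g)", "halstead_volume", "halstead_effort"]
demo_scores = {
    "filter_a": [0.30, 0.10, 0.25, 0.05],
    "filter_b": [0.20, 0.15, 0.40, 0.01],
    "embedded": [0.50, 0.05, 0.30, 0.15],
}
selected = ensemble_rank(demo_scores, demo_names, k=2)
print(selected)
```

In this sketch, a metric that ranks consistently high across the selectors is preferred over one that a single selector scores highly, which is the usual motivation for aggregating rankings. The selected metrics would then be passed to the second-stage classifier.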
