
Improving software quality estimation by combining feature selection strategies with sampled ensemble learning



Abstract:

The efficiency (prediction accuracy) of a classification model is affected by the quality of its training data. High dimensionality and class imbalance are two main problems that can lower the quality of a training dataset, making data preprocessing a very important step in classification. Feature (software metric) selection and data sampling are frequently used to overcome these problems. Feature selection (FS) is the process of selecting the most important attributes from the original dataset. Data sampling copes with class imbalance by adding instances to, or removing instances from, the training dataset. Another interesting method, boosting (building multiple models, with each model tuned to work better on instances misclassified by previous models), has also been found effective for addressing the class imbalance problem. In this study, we investigate two types of FS approaches: individual FS and repetitive sampled FS. Following feature selection, models are built either with a plain learner or with a boosting algorithm in which random undersampling is integrated with AdaBoost. We focus on the impact of the two FS methods (individual FS vs. repetitive sampled FS) and the two model-building processes (boosting vs. plain learner) on software quality prediction. Six feature ranking techniques are examined in the experiment. The results demonstrate that repetitive sampled FS generally performs better than individual FS when a plain learner is used for the subsequent learning process, and that boosting is more effective at improving classification performance than not using boosting.
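
To make the two ideas concrete, the sketch below (not the authors' code, and not tied to their datasets or parameter settings) illustrates repetitive sampled feature selection followed by a RUS-plus-AdaBoost model, assuming Python with scikit-learn and imbalanced-learn. The ANOVA F-score stands in for one of the six feature rankers, naive Bayes stands in for the plain learner, RUSBoostClassifier stands in for the boosting algorithm that integrates random undersampling with AdaBoost, and synthetic imbalanced data stands in for a software-metrics dataset; the helper repetitive_sampled_fs and all parameter values are illustrative choices, not the paper's.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score
from imblearn.ensemble import RUSBoostClassifier


def repetitive_sampled_fs(X, y, k=6, n_repeats=10, rng=None):
    """Rank features on several balanced subsamples of the training data
    and keep the top-k features by average rank (one possible aggregation)."""
    rng = np.random.default_rng(rng)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    ranks = []
    for _ in range(n_repeats):
        # Random undersampling: keep all minority instances and an
        # equally sized random subset of the majority class.
        sampled_majority = rng.choice(majority, size=minority.size, replace=False)
        idx = np.concatenate([minority, sampled_majority])
        scores, _ = f_classif(X[idx], y[idx])          # example feature ranker
        ranks.append(np.argsort(np.argsort(-scores)))  # rank 0 = best feature
    mean_rank = np.mean(ranks, axis=0)
    return np.argsort(mean_rank)[:k]                   # indices of selected features


# Imbalanced synthetic data standing in for a software-metrics dataset.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

selected = repetitive_sampled_fs(X_tr, y_tr, k=6, rng=0)

# Plain learner vs. boosting with random undersampling on the reduced feature set.
plain = GaussianNB().fit(X_tr[:, selected], y_tr)
boosted = RUSBoostClassifier(n_estimators=10, random_state=0).fit(X_tr[:, selected], y_tr)

for name, model in [("plain learner", plain), ("RUS + AdaBoost", boosted)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te[:, selected])[:, 1])
    print(f"{name}: AUC = {auc:.3f}")

Individual FS would instead rank features once on the full (imbalanced) training set; the repeated balanced subsampling above is what distinguishes repetitive sampled FS in the study's comparison.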
Date of Conference: 13-15 August 2014
Date Added to IEEE Xplore: 02 March 2015
Electronic ISBN: 978-1-4799-5880-1
Conference Location: Redwood City, CA, USA
