ABSTRACT
This paper proposes a novel method that generates synthetic project cases and adds them to a fit dataset in order to improve the performance of analogy-based software effort estimation. The proposed method extends the conventional over-sampling method, a preprocessing procedure for n-group classification problems, so that it can be applied to any imbalanced dataset used in an analogy-based system. We experimentally evaluated the effect of the over-sampling method on the performance of analogy-based software effort estimation using the Desharnais dataset. The results show a significant improvement in estimation accuracy with our approach.
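The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, assuming a SMOTE-style interpolation between a project and one of its nearest neighbours. The function names (`oversample`, `estimate_by_analogy`), the choice to interpolate the effort value along with the features, the unweighted Euclidean distance, and the toy project values are all assumptions made for illustration; they are not taken from the paper. A real setup would normally normalise the features first.

```python
import random
from math import sqrt


def euclidean(a, b):
    """Unweighted Euclidean distance between two feature tuples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def oversample(projects, n_synthetic, k=2, seed=0):
    """Return the fit dataset plus n_synthetic interpolated cases.

    projects is a list of (features, effort) pairs. Each synthetic case
    lies on the line segment between a randomly chosen project and one
    of its k nearest neighbours (assumption: effort is interpolated
    with the same ratio as the features).
    """
    if len(projects) < 2:
        raise ValueError("need at least two projects to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(projects))
        base_feats, base_effort = projects[i]
        # Rank all other projects by distance to the chosen base project.
        others = [p for j, p in enumerate(projects) if j != i]
        others.sort(key=lambda p: euclidean(p[0], base_feats))
        nb_feats, nb_effort = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        feats = tuple(b + gap * (n - b) for b, n in zip(base_feats, nb_feats))
        effort = base_effort + gap * (nb_effort - base_effort)
        synthetic.append((feats, effort))
    return list(projects) + synthetic


def estimate_by_analogy(fit, target_features, k=2):
    """Estimate effort as the mean effort of the k most similar cases."""
    nearest = sorted(fit, key=lambda p: euclidean(p[0], target_features))[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


# Toy fit dataset (made-up values): (size, team experience) -> effort
toy_projects = [((1.0, 2.0), 100.0), ((2.0, 3.0), 200.0), ((10.0, 12.0), 900.0)]
augmented = oversample(toy_projects, n_synthetic=4, k=1, seed=1)
estimate = estimate_by_analogy(augmented, (1.5, 2.5), k=2)
```

Here the synthetic cases fill in sparse regions of the fit dataset, so the analogy step has more neighbours to draw on for a target project that resembles under-represented cases.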
Index Terms
- An over-sampling method for analogy-based software effort estimation