Abstract
A common assumption in Pattern Recognition is that the class priors in the training set are representative of the true class distributions. However, this assumption does not always hold: the true class distributions may differ from those in the training set, and may in fact vary significantly. As a result, a given classifier may incur a higher cost than expected. In this paper we address this issue, presenting a theoretical framework and methodology for assessing the effect on cost of a classifier operating under imbalanced conditions. The methodology can be applied to many different types of costs. Experiments on artificial data show how the methodology can be used to assess and compare classifiers. We observe that classifiers that model the underlying distributions well are more resilient to changes in the true class distribution than weaker classifiers.
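The abstract does not spell out the framework itself. The sketch below is only an illustration of the general idea under our own assumptions, not the authors' method: a two-class problem, a classifier whose true-positive and false-positive rates stay fixed when the class priors change (the decision threshold is not re-tuned), and illustrative misclassification costs `c_fn` and `c_fp`. Under these assumptions the expected cost per sample is prior_pos * (1 - TPR) * c_fn + (1 - prior_pos) * FPR * c_fp, which can be traced over a range of true priors to compare classifiers. The names `OperatingPoint`, `expected_cost` and `cost_profile`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OperatingPoint:
    """Hypothetical fixed operating point of a two-class classifier."""
    tpr: float  # true-positive rate, estimated on the positive class
    fpr: float  # false-positive rate, estimated on the negative class


def expected_cost(op: OperatingPoint, prior_pos: float, c_fn: float, c_fp: float) -> float:
    """Expected misclassification cost per sample for a true positive-class prior.

    Assumes the class-conditional error rates (tpr, fpr) do not change when
    only the class priors shift.
    """
    if not 0.0 <= prior_pos <= 1.0:
        raise ValueError("prior_pos must lie in [0, 1]")
    fnr = 1.0 - op.tpr  # false-negative rate
    return prior_pos * fnr * c_fn + (1.0 - prior_pos) * op.fpr * c_fp


def cost_profile(op: OperatingPoint, priors: list[float], c_fn: float, c_fp: float) -> list[float]:
    """Expected cost of one classifier evaluated at each candidate true prior."""
    return [expected_cost(op, p, c_fn, c_fp) for p in priors]


if __name__ == "__main__":
    priors = [i / 10 for i in range(11)]  # true positive-class prior from 0.0 to 1.0
    # Two hypothetical classifiers; the rates are illustrative, not from the paper.
    candidates = {"A": OperatingPoint(tpr=0.9, fpr=0.05),
                  "B": OperatingPoint(tpr=0.7, fpr=0.02)}
    for name, op in candidates.items():
        costs = cost_profile(op, priors, c_fn=5.0, c_fp=1.0)
        print(name, [round(c, 3) for c in costs], "worst:", round(max(costs), 3))
```

For a fixed operating point the expected cost is linear in the prior, so its worst case always lies at one end of the prior range. Comparing whole profiles, rather than the cost at the training-set prior alone, shows which classifier degrades more gracefully if the true class distribution drifts.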
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Landgrebe, T., Paclík, P., Tax, D.M.J., Verzakov, S., Duin, R.P.W. (2004). Cost-Based Classifier Evaluation for Imbalanced Problems. In: Fred, A., Caelli, T.M., Duin, R.P.W., Campilho, A.C., de Ridder, D. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2004. Lecture Notes in Computer Science, vol 3138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27868-9_83
DOI: https://doi.org/10.1007/978-3-540-27868-9_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22570-6
Online ISBN: 978-3-540-27868-9
eBook Packages: Springer Book Archive