Learning from Imbalanced Sets through Resampling and Weighting

Barandela, R.; Sánchez, J. S.; García, V.; Ferri, F. J.

doi:10.1007/978-3-540-44871-6_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2652)

Included in the following conference series:

Iberian Conference on Pattern Recognition and Image Analysis

Abstract

The problem of imbalanced training sets in supervised pattern recognition is receiving growing attention. A training sample is imbalanced when one class is represented by a large number of examples while the other is represented by only a few. This situation arises in several practical domains and has been observed to cause a substantial deterioration in classification accuracy, particularly for patterns belonging to the less represented class. In this paper, we introduce a new approach to designing an instance-based classifier for such imbalanced environments.
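The abstract does not give the details of the proposed method, so the following is only a generic sketch of the two ideas named in the title, resampling and weighting, applied to a nearest-neighbour classifier. The function names, the random under-sampling step, and the class-frequency distance weight are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random
from collections import Counter


def undersample_majority(samples, labels, seed=0):
    """Randomly drop samples so every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        kept = xs if len(xs) == target else rng.sample(xs, target)
        out_x.extend(kept)
        out_y.extend([y] * target)
    return out_x, out_y


def weighted_1nn(train_x, train_y, query):
    """1-NN with each distance scaled by the class's share of the training set.

    Scaling by class frequency makes rare-class prototypes appear closer.
    This particular weight is an illustrative assumption, not the paper's formula.
    """
    counts = Counter(train_y)
    n = len(train_y)
    best_label, best_score = None, math.inf
    for x, y in zip(train_x, train_y):
        score = math.dist(x, query) * (counts[y] / n)
        if score < best_score:
            best_label, best_score = y, score
    return best_label


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly classified patterns, reported separately per class."""
    totals, hits = Counter(y_true), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}


# A classifier that always answers "maj" on a 95/5 split looks accurate overall
# but never recognises the minority class.
y_true = ["maj"] * 95 + ["min"] * 5
y_pred = ["maj"] * 100
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_class = per_class_accuracy(y_true, y_pred)
```

Here `overall` is 0.95 while `per_class["min"]` is 0.0, which is why results on imbalanced data are usually reported per class rather than as a single accuracy figure.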

Partially supported by grants 32016-A (Mexican CONACyT), TIC2000-1703-C03-03 (Spanish CICYT), and P1-1B2002-07 (Fundació Caixa Castelló-Bancaixa).

Author information

Authors and Affiliations

Instituto Tecnológico de Toluca, Av. Tecnológico s/n, 52140, Metepec, México
R. Barandela & V. García
Dept. Llenguatges i Sistemes Informàtics, U. Jaume I, 12071, Castelló, Spain
J. S. Sánchez
Dept. d’Informàtica, U. València, 46100, Burjassot (València), Spain
F. J. Ferri
Instituto de Geografía, Vedado, La Habana, Cuba
R. Barandela

Editor information

Editors and Affiliations

Unitat de Gràfics i Visió per Ordinador Departament de Ciències Matemàtiques i Informàtica, Universitat de les Illes Balears Edifici Anselm Turmeda, Ctra. de Valldemossa km 7,5, 07122, Palma de Mallorca, Spain
Francisco José Perales
FEUP - Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
Aurélio J. C. Campilho
Departamento de Ciencias de la Computación e I.A., Universidad de Granada, E.T.S. Ing. Informática, 18071, Granada, Spain
Nicolás Pérez de la Blanca
Dept. System Engineering and Automation, Universitat Politècnica de Catalunya (UPC) Barcelona, Spain
Alberto Sanfeliu

Cite this paper

Barandela, R., Sánchez, J.S., García, V., Ferri, F.J. (2003). Learning from Imbalanced Sets through Resampling and Weighting. In: Perales, F.J., Campilho, A.J.C., de la Blanca, N.P., Sanfeliu, A. (eds) Pattern Recognition and Image Analysis. IbPRIA 2003. Lecture Notes in Computer Science, vol 2652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44871-6_10

DOI: https://doi.org/10.1007/978-3-540-44871-6_10
Published: 18 September 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40217-6
Online ISBN: 978-3-540-44871-6
eBook Packages: Springer Book Archive