Abstract
Many real-world data mining applications involve learning from imbalanced data sets. Learning from data sets that contain very few instances of the minority (or interesting) class usually produces biased classifiers that have higher predictive accuracy on the majority class(es) but poorer predictive accuracy on the minority class. SMOTE (Synthetic Minority Over-sampling TEchnique) is specifically designed for learning from imbalanced data sets. This paper presents a novel approach for learning from imbalanced data sets, based on a combination of the SMOTE algorithm and the boosting procedure. Unlike standard boosting, where all misclassified examples are given equal weights, SMOTEBoost creates synthetic examples from the rare or minority class, thus indirectly changing the updating weights and compensating for skewed distributions. Applied to several highly and moderately imbalanced data sets, SMOTEBoost improves prediction performance on the minority class and yields higher overall F-values.
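The core idea the abstract describes is interpolation: SMOTE creates each synthetic minority example by picking a real minority instance and moving a random fraction of the way toward one of its k nearest minority neighbours; SMOTEBoost injects such examples before each boosting round so the weight updates favour the minority class. The following is a minimal, simplified sketch of that interpolation step, not the authors' implementation; the function name `smote` and all parameters are illustrative.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Sketch of SMOTE's synthetic-example generation: for each new
    point, pick a random minority instance and interpolate between it
    and one of its k nearest minority neighbours.

    minority : list of numeric feature tuples (the minority class)
    n_new    : number of synthetic examples to generate
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        # squared Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself)
        neighbours = sorted((p for p in minority if p != x),
                            key=lambda p: sq_dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position along the line segment x -> nb
        synthetic.append(tuple(xi + gap * (ni - xi)
                               for xi, ni in zip(x, nb)))
    return synthetic

# Illustrative use: four minority points at the corners of the unit square;
# synthetic points fall on segments between neighbouring corners.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote(minority, 5, k=2, seed=1))
```

In SMOTEBoost itself, a call like this would be made once per boosting iteration, with the synthetic examples added to the training set for that round's weak learner, so the minority class receives a larger share of the reweighted distribution than standard boosting would give it.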
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Chawla, N.V., Lazarevic, A., Hall, L.O., Bowyer, K.W. (2003). SMOTEBoost: Improving Prediction of the Minority Class in Boosting. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds) Knowledge Discovery in Databases: PKDD 2003. Lecture Notes in Computer Science, vol. 2838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39804-2_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20085-7
Online ISBN: 978-3-540-39804-2