A New Over-Sampling Approach: Random-SMOTE for Learning from Imbalanced Data Sets

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7091)

Abstract

In imbalanced data sets, minority-class examples are sparsely distributed in the sample space compared with the overwhelming number of majority-class examples, which makes learning from the minority class particularly difficult. Inspired by SMOTE, a new over-sampling method, Random-SMOTE, is proposed; it generates synthetic examples at random within the sample space of the minority class. Experiments on real data sets show that Random-SMOTE is more effective than other random sampling approaches.
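
The abstract does not spell out the generation procedure; as a rough illustration of "generating examples randomly in the sample space of the minority class", the Python sketch below interpolates among randomly chosen minority samples. The function name random_smote, its parameters, and the two-step interpolation scheme are assumptions made for illustration, not details taken from the paper.

import numpy as np

def random_smote(X_min, n_new, seed=None):
    # Sketch only: create n_new synthetic minority-class points by
    # interpolating among randomly chosen minority samples. The exact
    # sampling scheme is an assumption; the abstract only says points
    # are generated randomly in the minority sample space.
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    new = np.empty((n_new, X_min.shape[1]))
    for k in range(n_new):
        # pick one base minority sample and two further minority samples
        i, j1, j2 = rng.choice(n, size=3, replace=False)
        x, y1, y2 = X_min[i], X_min[j1], X_min[j2]
        t = y1 + rng.random() * (y2 - y1)    # temporary point on the segment y1-y2
        new[k] = x + rng.random() * (t - x)  # synthetic point between x and t
    return new

# Toy usage: add 40 synthetic points to a 20-point minority class.
rng = np.random.default_rng(0)
X_minority = rng.normal(size=(20, 4))
X_synthetic = random_smote(X_minority, n_new=40, seed=0)
print(X_synthetic.shape)  # (40, 4)

In practice the synthetic rows would be appended to the minority class before training the classifier of interest.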



References

  1. Burez, J., Van den Poel, D.: Handling class imbalance in customer churn prediction. Expert Systems with Applications 36(3 PART 1), 4626–4636 (2009)

  2. Chan, P.K., Stolfo, S.J.: Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection. In: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pp. 164–168 (2001)

  3. Kubat, M., Holte, R.C., Matwin, S.: Machine learning for the detection of oil spills in satellite radar images. Machine Learning 30(2), 195–215 (1998)

  4. Woods, K., Doss, C., Bowyer, K.W., Solka, J., Priebe, C., Kegelmeyer, W.P.: Comparative evaluation of pattern recognition techniques for detection of microcalcifications in mammography. International Journal of Pattern Recognition and Artificial Intelligence 7, 1417–1436 (1993)

  5. Estabrooks, A., Jo, T., Japkowicz, N.: A Multiple Resampling Method for Learning from Imbalanced Data Sets. Computational Intelligence 20(1), 18–36 (2004)

  6. Weiss, S., Kapouleas, I.: An empirical comparison of pattern recognition, neural nets and machine learning methods. Readings in Machine Learning (1990)

  7. Weiss, G.M., Provost, F.: Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction. JAIR 19, 315–354 (2003)

  8. Estabrooks, A., Japkowicz, N.: A Mixture-of-Experts Framework for Learning from Imbalanced Data Sets. In: Hoffmann, F., Adams, N., Fisher, D., Guimarães, G., Hand, D.J. (eds.) IDA 2001. LNCS, vol. 2189, p. 34. Springer, Heidelberg (2001)

  9. Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: One-sided selection. In: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 179–186. Morgan Kaufmann, San Francisco (1997)

  10. Chan, P., Stolfo, S.: Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection. In: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, Menlo Park, pp. 164–168 (1998)

  11. Visa, S., Ralescu, A.: Experiments in guided class rebalance based on class structure. In: Proc. of the MAICS Conference, pp. 8–14 (2004a)

  12. Nickerson, A.S., Japkowicz, N., Milios, E.: Using unsupervised learning to guide re-sampling in imbalanced data sets, pp. 261–265 (2001)

  13. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: Synthetic Minority Over-Sampling Technique. Journal of Artificial Intelligence Research 16, 321–357 (2002)

  14. Han, H., Wang, W.-Y., Mao, B.-H.: Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. In: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 878–887. Springer, Heidelberg (2005)

  15. Blake, C., Merz, C.: UCI Repository of Machine Learning Databases (1998), http://www.ics.uci.edu/~mlearn/~MLRepository.html

  16. Weiss, G.M.: Mining with Rarity: A Unifying Framework. SIGKDD Explorations 6(1), 7–19 (2004)

  17. Wu, G., Chang, E.Y.: Class-Boundary Alignment for Imbalanced Dataset Learning. In: Workshop on Learning from Imbalanced Datasets II, ICML, Washington DC (2003)

  18. Guo, H., Viktor, H.L.: Learning from Imbalanced Data Sets with Boosting and Data Generation: The DataBoost-IM Approach. SIGKDD Explorations 6(1), 30–39 (2004)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dong, Y., Wang, X. (2011). A New Over-Sampling Approach: Random-SMOTE for Learning from Imbalanced Data Sets. In: Xiong, H., Lee, W.B. (eds) Knowledge Science, Engineering and Management. KSEM 2011. Lecture Notes in Computer Science, vol 7091. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25975-3_30

  • DOI: https://doi.org/10.1007/978-3-642-25975-3_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25974-6

  • Online ISBN: 978-3-642-25975-3

  • eBook Packages: Computer Science, Computer Science (R0)
