Abstract
For imbalanced data sets, examples of minority class are sparsely distributed in sample space compared with the overwhelming amount of majority class. This presents a great challenge for learning from the minority class. Enlightened by SMOTE, a new over-sampling method, Random-SMOTE, which generates examples randomly in the sample space of minority class is proposed. According to the experiments on real data sets, Random-SMOTE is more effective compared with other random sampling approaches.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Burez, J., Van den Poel, D.: Handling class imbalance in customer churn prediction. Expert Systems with Applications 36(3 PART 1), 4626–4636 (2009)
Chan, P.K., Stolfo, S.J.: Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection. In: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pp. 164–168 (2001)
Kubat, M., Holte, R.C., Matwin, S.: Machine learning for the detection of oil spills in satellite radar images. Machine Learning 30(2), 195–215 (1998)
Woods, K., Doss, C., Bowyer, K.W., Solka, J., Priebe, C., Kegelmeyer, W.P.: Comparative evaluation of pattern recognition techniques for detection of microcalcifications in mammography. Pattern Recognition and Artificial Intelligence 7, 1417–1436 (1993)
Estabrooks, A., Jo, T., Japkowicz, N.: A Multiple Resampling Method for Learning from Imbalanced Data Sets. Computational Intelligence 20(1), 18–36 (2004)
Weiss, S., Kapouleas, I.: An empirical comparison of pattern recognition, neural nets and machine learning methods. Readings in Machine Learning (1990)
Weiss, G.M., Provost, F.: Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction. JAIR 19, 315–354 (2003)
Estabrooks, A., Japkowicz, N.: A Mixture-of-Experts Framework for Learning from Imbalanced Data Sets. In: Hoffmann, F., Adams, N., Fisher, D., Guimarães, G., Hand, D.J. (eds.) IDA 2001. LNCS, vol. 2189, p. 34. Springer, Heidelberg (2001)
Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: One-sided selection. In: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 179–186. Morgan Kaufmann, San Francisco (1997)
Chan, P., Stolfo, S.: Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection. In: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, Menlo Park, pp. 164–168 (1998)
Visa, S., Ralescu, A.: Experiments in guided class rebalance based on class structure. In: Proc. of the MAICS Conference, pp. 8–14 (2004a)
Nickerson, A.S., Japkowicz, N., Milios, E.: Using unsupervised learning to guide re-sampling in imbalanced data sets, pp. 261–265 (2001)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: Synthetic Minority Over-Sampling Technique. Journal of Artificial Intelligence Research 16, 321–357 (2002)
Han, H., Wang, W.-Y., Mao, B.-H.: Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. In: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 878–887. Springer, Heidelberg (2005)
Blake, C., Merz, C.: UCI Repository of Machine Learning Databases (1998), http://www.ics.uci.edu/~mlearn/~MLRepository.html
Weiss, G.M.: Mining with Rarity: A Unifying Framework. SIGKDD Explorations 6(1), 7–19 (2004)
Wu, G., Chang, E.Y.: Class-Boundary Alignment for Imbalanced Dataset Learning. In: Workshop on Learning from Imbalanced Datasets II, ICML, Washington DC (2003)
Guo, H., Viktor, H.L.: Learning from Imbalanced Data Sets with Boosting and Data Generation: The DataBoost-IM Approach. Sigkdd Explorations 6(1), 30–39 (2004)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dong, Y., Wang, X. (2011). A New Over-Sampling Approach: Random-SMOTE for Learning from Imbalanced Data Sets. In: Xiong, H., Lee, W.B. (eds) Knowledge Science, Engineering and Management. KSEM 2011. Lecture Notes in Computer Science(), vol 7091. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25975-3_30
Download citation
DOI: https://doi.org/10.1007/978-3-642-25975-3_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25974-6
Online ISBN: 978-3-642-25975-3
eBook Packages: Computer ScienceComputer Science (R0)