Adaptive Oversampling for Imbalanced Data Classification

  • Conference paper
Information Sciences and Systems 2013

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 264)

Abstract

Data imbalance is known to significantly hinder the generalization performance of supervised learning algorithms. A common strategy to overcome this challenge is synthetic oversampling, in which synthetic minority class examples are generated to balance the distribution between the majority and minority classes. We present a novel adaptive oversampling algorithm, Virtual, that combines the benefits of oversampling and active learning. Unlike traditional resampling methods, which require preprocessing of the data, Virtual generates synthetic examples for the minority class during the training process and thus removes the need for an extra preprocessing stage. In the context of learning with Support Vector Machines, we demonstrate that Virtual outperforms competitive oversampling techniques in terms of both generalization performance and computational complexity.
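To make the idea of synthetic oversampling concrete, the following is a minimal sketch of the traditional, preprocessing-style approach that the abstract contrasts with Virtual. It interpolates between a minority example and one of its nearest minority neighbours, in the spirit of SMOTE. It is not the Virtual algorithm, which generates examples during training; the function name `oversample_minority`, its parameters, and the toy data are assumptions made purely for illustration.

```python
import random


def oversample_minority(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority example and one of its k nearest minority neighbours.

    Illustrative SMOTE-style sketch only; not the Virtual algorithm.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority examples by distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): 4 minority examples against 10 majority examples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority_size = 10
new_points = oversample_minority(minority, majority_size - len(minority))
balanced_minority = minority + new_points
```

In this sketch the synthetic points are produced once, before any classifier is trained, which is exactly the extra preprocessing stage that the abstract says Virtual avoids.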

Author information

Correspondence to Şeyda Ertekin.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Ertekin, Ş. (2013). Adaptive Oversampling for Imbalanced Data Classification. In: Gelenbe, E., Lent, R. (eds) Information Sciences and Systems 2013. Lecture Notes in Electrical Engineering, vol 264. Springer, Cham. https://doi.org/10.1007/978-3-319-01604-7_26

  • DOI: https://doi.org/10.1007/978-3-319-01604-7_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01603-0

  • Online ISBN: 978-3-319-01604-7

  • eBook Packages: Computer Science (R0)
