Adaptive Oversampling for Imbalanced Data Classification

Ertekin, Şeyda

doi:10.1007/978-3-319-01604-7_26

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 264)

Abstract

Data imbalance is known to significantly hinder the generalization performance of supervised learning algorithms. A common strategy to overcome this challenge is synthetic oversampling, where synthetic minority class examples are generated to balance the distribution between the examples of the majority and minority classes. We present a novel adaptive oversampling algorithm, Virtual, that combines the benefits of oversampling and active learning. Unlike traditional resampling methods, which require preprocessing of the data, Virtual generates synthetic examples for the minority class during the training process and therefore removes the need for an extra preprocessing stage. In the context of learning with Support Vector Machines, we demonstrate that Virtual outperforms competitive oversampling techniques both in terms of generalization performance and computational complexity.
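
To make the idea of synthetic oversampling concrete, the following is a minimal sketch of the traditional preprocessing-style approach, using SMOTE-style interpolation between minority examples. It is not the Virtual algorithm, whose details are not given in the abstract. The function name `oversample_minority`, the parameter `k`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def oversample_minority(minority, n_new, k=3, seed=0):
    """Hypothetical helper: create n_new synthetic minority points by
    interpolating between a minority example and one of its k nearest
    minority-class neighbors (SMOTE-style, not the Virtual algorithm)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbor.
        t = rng.random()
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy data (assumed): 20 majority examples and 4 minority examples in 2-D.
majority = [(float(i), float(i % 5)) for i in range(20)]
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (2.5, 2.5)]

# Generate just enough synthetic points to equalize the class counts.
new_points = oversample_minority(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Here the balancing runs as a separate step before any classifier is trained. That is the extra preprocessing stage the abstract says Virtual avoids by generating synthetic examples during training.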

Author information

Authors and Affiliations

Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
Şeyda Ertekin

Corresponding author

Correspondence to Şeyda Ertekin.

Editor information

Editors and Affiliations

Dept of Electrical and Electronics Eng, Imperial College, London, United Kingdom
Erol Gelenbe
Imperial College, London, United Kingdom
Ricardo Lent

About this paper

Cite this paper

Ertekin, Ş. (2013). Adaptive Oversampling for Imbalanced Data Classification. In: Gelenbe, E., Lent, R. (eds) Information Sciences and Systems 2013. Lecture Notes in Electrical Engineering, vol 264. Springer, Cham. https://doi.org/10.1007/978-3-319-01604-7_26

DOI: https://doi.org/10.1007/978-3-319-01604-7_26
Published: 24 September 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01603-0
Online ISBN: 978-3-319-01604-7
eBook Packages: Computer Science, Computer Science (R0)
