Performance Comparison of Multi-class SVM with Oversampling Methods for Imbalanced Data Classification

Park, Seunghyun; Park, Hyunhee

doi:10.1007/978-3-030-61108-8_11

Conference paper
First Online: 08 October 2020

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 159)

Abstract

Network traffic data generally comprise a large amount of normal traffic and a small amount of attack traffic. This imbalance between the two types of data lowers prediction performance: estimation is biased toward the majority class, so minority (attack) data and anomalies are frequently misclassified. To address this problem, several minority data synthesis models based on the synthetic minority oversampling technique (SMOTE) have been developed. In recent years, however, studies have actively pursued synthesizing minority data with the newly developed generative adversarial network (GAN) model. In this paper, we examine a GAN-based oversampling model for the data imbalance problem in intrusion detection data and compare its performance with that of other oversampling models. The GAN-based oversampling model generates additional samples for classes with few instances, which mitigates the problem caused by the imbalanced class distribution and can improve classification performance. Simulation results on the KDD Cup 99 dataset show that the GAN-based oversampling method is effective and outperforms existing oversampling methods.
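The abstract contrasts GAN-based oversampling with SMOTE-style minority data synthesis. The paper's own implementation is not part of this preview, so what follows is only a minimal Python sketch of the SMOTE-style baseline, assuming purely numeric features: each class is oversampled to the size of the largest class by interpolating between a randomly chosen sample and one of its k nearest same-class neighbours. The function names `smote_balance` and `k_nearest` and the toy data are hypothetical illustrations; this is a simplified variant, not the reference SMOTE implementation and not the GAN model evaluated in the paper.

```python
import math
import random
from collections import Counter


def k_nearest(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean), excluding point itself."""
    others = [c for c in candidates if c is not point]
    others.sort(key=lambda c: math.dist(point, c))
    return others[:k]


def smote_balance(X, y, k=5, seed=0):
    """Hypothetical helper: oversample every class up to the largest class size.

    Each synthetic sample lies on the segment between a random sample x of the
    class and one of its k nearest same-class neighbours x_nn:
        x_new = x + lam * (x_nn - x),  lam ~ U(0, 1)
    Assumes all features are numeric.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out = [list(x) for x in X]  # original samples are kept unchanged
    y_out = list(y)
    for label, n in counts.items():
        if n >= target or n < 2:
            continue  # largest class, or a single sample with no neighbour to interpolate toward
        members = [list(x) for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            x = rng.choice(members)
            x_nn = rng.choice(k_nearest(x, members, min(k, n - 1)))
            lam = rng.random()
            X_out.append([a + lam * (b - a) for a, b in zip(x, x_nn)])
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    # Toy, made-up data: 8 "normal", 3 "dos" and 2 "probe" records with 2 features.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8], [0.8, 0.2], [0.3, 0.3],
         [5, 5], [5, 6], [6, 5],
         [9, 0], [9, 1]]
    y = ["normal"] * 8 + ["dos"] * 3 + ["probe"] * 2
    X_bal, y_bal = smote_balance(X, y, k=2)
    print(Counter(y_bal))  # each class now has 8 samples
```

In a pipeline like the one named in the title, the balanced set would then be used to train a multi-class SVM. KDD Cup 99 records would first need their categorical fields (such as protocol type, service and flag) encoded numerically, because interpolation is only meaningful for numeric features.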

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1F1A1060742).

Author information

Authors and Affiliations

Korea University, Seoul, South Korea
Seunghyun Park
Myongji University, Yongin-si, Gyeonggi-do, South Korea
Hyunhee Park

Corresponding author

Correspondence to Hyunhee Park.

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, Faculty of Information Engineering, Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli
Department of Advanced Sciences, Faculty of Science and Engineering, Hosei University, Tokyo, Japan
Makoto Takizawa
Faculty of Business Administration, Rissho University, Tokyo, Japan
Tomoya Enokido
Department of Computer Science and Information Engineering, Asian University, Taichung, Taiwan
Hsing-Chung Chen
Department of Information and Communication Engineering, Faculty of Information Engineering, Fukuoka Institute of Technology, Fukuoka, Japan
Keita Matsuo

About this paper

Cite this paper

Park, S., Park, H. (2021). Performance Comparison of Multi-class SVM with Oversampling Methods for Imbalanced Data Classification. In: Barolli, L., Takizawa, M., Enokido, T., Chen, HC., Matsuo, K. (eds) Advances on Broad-Band Wireless Computing, Communication and Applications. BWCCA 2020. Lecture Notes in Networks and Systems, vol 159. Springer, Cham. https://doi.org/10.1007/978-3-030-61108-8_11

DOI: https://doi.org/10.1007/978-3-030-61108-8_11
Published: 08 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61107-1
Online ISBN: 978-3-030-61108-8
eBook Packages: Engineering, Engineering (R0)