
Performance Comparison of Multi-class SVM with Oversampling Methods for Imbalanced Data Classification

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 159)

Abstract

Network traffic data generally consist of a large amount of normal traffic and a small amount of attack traffic. This imbalance between the two types of data leads to low prediction performance, including misclassifications caused by estimation bias against minority data and anomalies. To address this problem, several minority data synthesis models based on the synthetic minority oversampling technique (SMOTE) algorithm have been developed. In recent years, studies have also been actively conducted on synthesizing minority data using the generative adversarial network (GAN) model. In this paper, we examine a GAN-based oversampling model to address the data imbalance problem associated with intrusion detection data and compare its performance with that of other oversampling models. The GAN-based oversampling model generates data for classes that have few samples, so that the problems induced by the imbalanced class distribution can be mitigated and classification performance can be improved. Simulation results using the KDD Cup 99 dataset show that the oversampling method using the GAN algorithm is effective and that it is superior to existing oversampling methods.
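As a rough illustration of the interpolation idea behind the synthetic minority oversampling technique mentioned in the abstract, the following Python sketch creates synthetic minority rows by moving a random fraction of the way from a minority sample toward one of its nearest minority neighbours. It is not the authors' implementation and does not cover the GAN-based model; the function name `smote_like_oversample`, its parameters, and the toy data are assumptions made only for this example.

```python
import numpy as np


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Hypothetical helper: return n_new synthetic rows, each interpolated
    between a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority samples.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest neighbours of every sample.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each synthetic row.
    base = rng.integers(0, n, size=n_new)
    pick = rng.integers(0, k, size=n_new)
    partner = neighbours[base, pick]

    # Place each new point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[partner] - minority[base])


# Toy minority class (assumed data, not taken from KDD Cup 99).
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_like_oversample(X_min, n_new=6)
print(X_new.shape)  # (6, 2)
```

Because every synthetic row lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.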



Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1F1A1060742).

Author information


Corresponding author

Correspondence to Hyunhee Park.


Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Park, S., Park, H. (2021). Performance Comparison of Multi-class SVM with Oversampling Methods for Imbalanced Data Classification. In: Barolli, L., Takizawa, M., Enokido, T., Chen, HC., Matsuo, K. (eds) Advances on Broad-Band Wireless Computing, Communication and Applications. BWCCA 2020. Lecture Notes in Networks and Systems, vol 159. Springer, Cham. https://doi.org/10.1007/978-3-030-61108-8_11


  • DOI: https://doi.org/10.1007/978-3-030-61108-8_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61107-1

  • Online ISBN: 978-3-030-61108-8

  • eBook Packages: Engineering, Engineering (R0)
