
Comprehensive Analysis of Different Techniques for Data Augmentation and Proposal of New Variants of BOSME and GAN

  • Conference paper
Hybrid Artificial Intelligent Systems (HAIS 2023)

Abstract

In many settings where detecting minority-class instances is critical, the data available for training Machine Learning models is poorly distributed. This class imbalance usually degrades the trained model, which generalises towards the majority class and predicts minority-class instances as majority-class instances. To avoid this, different techniques have been adopted in the literature to expand the original database, such as Generative Adversarial Networks (GANs) or the Bayesian network-based over-sampling method (BOSME). Starting from these two methods, in this work we propose three new variants of data augmentation to address the data imbalance issue. We use traffic event data from three different areas of California, divided into two subgroups according to their severity. Experiments show that the top-performing cases were reached after using our variants. The importance of data augmentation techniques as a preprocessing tool is also shown by the performance drop of systems trained on the original, imbalanced databases.
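As a rough illustration of what expanding an imbalanced database means, the sketch below balances a toy two-class dataset by naive random over-sampling. It is not BOSME, a GAN, or any of the variants proposed in the paper; the function name `random_oversample`, the toy rows, and the "low"/"high" severity labels are all assumptions made up for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate rows of smaller classes at random until all classes are equal in size.

    Naive baseline for illustration only; it copies existing rows rather than
    generating new synthetic ones as GAN- or Bayesian-network-based methods do.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: 8 low-severity events and 2 high-severity events.
rows = [[i, i % 3] for i in range(10)]
labels = ["low"] * 8 + ["high"] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

After the call, both classes contain the same number of rows, and the original rows are kept unchanged at the front of the output. Generative approaches such as those discussed in the abstract aim to produce new minority-class samples instead of exact copies.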



Acknowledgments

The authors received research funds from the Basque Government as the head of the Grupo de Inteligencia Computacional, Universidad del Pais Vasco, UPV/EHU, from 2007 until 2025. The current code for the grant is IT1689-22. Additionally, the authors participate in Elkartek projects KK-2022/00051 and KK-2021/00070. The Spanish MCIN has also granted the authors a research project under code PID2020-116346GB-I00.

Corresponding author

Correspondence to Asier Garmendia-Orbegozo.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Garmendia-Orbegozo, A., Nuñez-Gonzalez, J.D., Anton Gonzalez, M.A., Graña, M. (2023). Comprehensive Analysis of Different Techniques for Data Augmentation and Proposal of New Variants of BOSME and GAN. In: García Bringas, P., et al. Hybrid Artificial Intelligent Systems. HAIS 2023. Lecture Notes in Computer Science, vol 14001. Springer, Cham. https://doi.org/10.1007/978-3-031-40725-3_13

  • DOI: https://doi.org/10.1007/978-3-031-40725-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40724-6

  • Online ISBN: 978-3-031-40725-3

  • eBook Packages: Computer Science, Computer Science (R0)
