Abstract
In many environments where detecting minority-class instances is critical, the data available for training Machine Learning models is poorly distributed. This imbalance usually degrades the trained model, which tends to misclassify minority-class instances as majority-class instances. To avoid this, the literature has adopted different techniques that expand the original database, such as Generative Adversarial Networks (GANs) or the Bayesian network-based over-sampling method (BOSME). Starting from these two methods, in this work we propose three new variants of data augmentation to address the data imbalance issue. We use traffic event data from three different areas of California, divided into two subgroups according to their severity. Experiments show that the top-performing cases were reached using our variants. The importance of data augmentation techniques as a preprocessing tool is also shown by the performance drop of systems trained on the original imbalanced databases.
Acknowledgments
Authors received research funds from the Basque Government as the head of the Grupo de Inteligencia Computacional, Universidad del Pais Vasco, UPV/EHU, from 2007 until 2025. The current code for the grant is IT1689-22. Additionally, authors participate in Elkartek projects KK-2022/00051 and KK-2021/00070. The Spanish MCIN has also granted the authors a research project under code PID2020-116346GB-I00.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Garmendia-Orbegozo, A., Nuñez-Gonzalez, J.D., Anton Gonzalez, M.A., Graña, M. (2023). Comprehensive Analysis of Different Techniques for Data Augmentation and Proposal of New Variants of BOSME and GAN. In: García Bringas, P., et al. Hybrid Artificial Intelligent Systems. HAIS 2023. Lecture Notes in Computer Science, vol 14001. Springer, Cham. https://doi.org/10.1007/978-3-031-40725-3_13
DOI: https://doi.org/10.1007/978-3-031-40725-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40724-6
Online ISBN: 978-3-031-40725-3
eBook Packages: Computer Science (R0)