Abstract
Class imbalance is a serious issue in classification because a traditional classifier is generally biased towards the majority class. Accuracy can degrade further when, in addition to the class imbalance, data instances from different classes overlap. Data sparsity has also been shown to be a possible issue that may lead to non-invariance and poor generalisation. Data augmentation is a technique that can address the generalisation issue and improve the regularisation of a Deep Neural Network (DNN). This paper proposes a method that handles both class overlap and class imbalance while also incorporating regularisation. The imbalanced dataset is first balanced using SMOTETomek, and the non-categorical attributes are then fuzzified. Fuzzifying the attributes serves two purposes: it handles the overlap in the data, and it provides a form of data augmentation that acts as a regularisation technique. Invariance is thus achieved because the augmented data are generated from the fuzzy concept. A DNN classifier is then trained on the balanced, augmented dataset. The datasets used in the experiments were selected from the UCI and KEEL data repositories. The experiments show that the proposed fuzzy data augmentation can address the overlapped and imbalanced data issues and provide regularisation through data augmentation for numerical data, improving the performance of a DNN classifier.
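The fuzzification step can be illustrated with a minimal sketch. The abstract does not state which membership functions the authors use, so the three triangular sets ("low", "medium", "high") spanning each attribute's observed range are an assumption made for illustration, and the function names `fuzzify_value` and `fuzzify_rows` are hypothetical. The SMOTETomek balancing step and the DNN training are not shown.

```python
from typing import List, Sequence


def fuzzify_value(x: float, lo: float, hi: float) -> List[float]:
    """Return [low, medium, high] membership degrees of x on [lo, hi].

    Assumption: three triangular fuzzy sets peaking at lo, the midpoint,
    and hi. The paper's actual membership functions may differ.
    """
    half = (hi - lo) / 2.0
    if half == 0.0:
        # Constant attribute: treat every value as fully "medium".
        return [0.0, 1.0, 0.0]
    mid = lo + half
    x = min(max(x, lo), hi)  # clamp values outside the observed range
    low = max(0.0, (mid - x) / half)
    high = max(0.0, (x - mid) / half)
    medium = 1.0 - abs(x - mid) / half
    return [low, medium, high]


def fuzzify_rows(rows: Sequence[Sequence], numeric_idx: Sequence[int]) -> List[list]:
    """Replace each numerical column with its three membership degrees.

    Categorical columns (indices not in numeric_idx) are copied unchanged.
    """
    numeric = set(numeric_idx)
    # Observed range of every numerical column.
    bounds = {
        j: (min(r[j] for r in rows), max(r[j] for r in rows)) for j in numeric
    }
    out = []
    for r in rows:
        new_row = []
        for j, v in enumerate(r):
            if j in numeric:
                new_row.extend(fuzzify_value(v, *bounds[j]))
            else:
                new_row.append(v)
        out.append(new_row)
    return out


if __name__ == "__main__":
    data = [[1.0, "a"], [3.0, "b"], [5.0, "a"]]
    for row in fuzzify_rows(data, numeric_idx=[0]):
        print(row)
```

In this sketch each numerical attribute becomes three graded features, so a value lying between two regions belongs partly to both sets rather than falling on one side of a hard boundary, which is the intuition behind using fuzzification for overlapped data.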
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Dabare, R., Wong, K.W., Shiratuddin, M.F., Koutsakis, P. (2021). Fuzzy Data Augmentation for Handling Overlapped and Imbalanced Data. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1516. Springer, Cham. https://doi.org/10.1007/978-3-030-92307-5_73
DOI: https://doi.org/10.1007/978-3-030-92307-5_73
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92306-8
Online ISBN: 978-3-030-92307-5
eBook Packages: Computer Science (R0)