Abstract
Class imbalance is a common problem in data mining and has recently attracted considerable research attention. Approaches proposed to address it include undersampling the majority class, oversampling the minority class, the Synthetic Minority Oversampling Technique (SMOTE), and Proximity Weighted Random Affine Shadowsampling (ProWRAS). This work proposes a new method for imbalanced data classification, called Augmentation Based Synthetic Sampling (ABS), which concatenates data to predict features for imbalanced problems and integrates sampling with the concatenated features to generate synthetic data. Experiments, evaluated by average AUC (area under the curve) and compared against the previous study, show that the proposed method is able to generate good data samples. In addition, the proposed method is combined with boosting to create a technique called ABSBoost. The experimental outcomes show that both ABS and ABSBoost are effective on the given datasets.
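As background for the oversampling baselines named above, the following is a minimal sketch of SMOTE-style interpolation: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. It is not the proposed ABS method, whose details are not given in this abstract. The function name `smote_like_oversample`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative SMOTE-style sketch only; not the ABS method of the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (hypothetical data).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority samples, the generated points stay inside the bounding box of the minority class. In practice they would be appended to the training set before fitting a classifier.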
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 62176200, 61773304, and 61871306; the Natural Science Basic Research Program of Shaanxi under Grant Nos. 2022JC-45 and 2022JQ-616; the Open Research Projects of Zhejiang Lab under Grant No. 2021KG0AB03; the 111 Project; the National Key R&D Program of China; the Guangdong Provincial Key Laboratory under Grant No. 2020B121201001; and the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2021A1515110686.
Copyright information
© 2022 IFIP International Federation for Information Processing
About this paper
Cite this paper
Asefaw, W.M., Shang, R., Okoth, M.A., Jiao, L. (2022). Augmentation Based Synthetic Sampling and Ensemble Techniques for Imbalanced Data Classification. In: Shi, Z., Jin, Y., Zhang, X. (eds) Intelligence Science IV. ICIS 2022. IFIP Advances in Information and Communication Technology, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-031-14903-0_15
DOI: https://doi.org/10.1007/978-3-031-14903-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14902-3
Online ISBN: 978-3-031-14903-0
eBook Packages: Computer Science, Computer Science (R0)