Augmentation Based Synthetic Sampling and Ensemble Techniques for Imbalanced Data Classification

  • Conference paper
Intelligence Science IV (ICIS 2022)

Part of the book series: IFIP Advances in Information and Communication Technology ((IFIPAICT,volume 659))

Abstract

The imbalanced data problem arises throughout data mining and has recently attracted considerable attention from researchers. To address it, scholars have proposed various approaches, such as undersampling the majority class, oversampling the minority class, the Synthetic Minority Oversampling Technique (SMOTE), and Proximity Weighted Random Affine Shadowsampling (ProWRAS). This work proposes a new method for imbalanced data classification, Augmentation Based Synthetic Sampling (ABS), which concatenates data to predict features for imbalanced problems. The method integrates sampling with concatenated features to generate synthetic data. Experiments comparing ABS with previous work, evaluated by average AUC (area under the curve), indicate that the method generates good data samples. In addition, ABS is combined with boosting to create a technique called ABSBoost. The experimental outcomes show that both ABS and ABSBoost are effective on the given datasets.
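For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of SMOTE-style minority oversampling together with a rank-based AUC computation. It is not the authors' ABS or ABSBoost method, whose details are not given here. The function names `smote_like` and `auc`, the parameter `k`, and the toy data are all assumptions chosen for illustration.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Assumes at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

def auc(labels, scores):
    """AUC as the probability that a random positive sample is scored
    above a random negative sample (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: four minority samples in two dimensions.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=6)
print(len(new_points))                            # 6
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))    # 0.75
```

Because each synthetic point lies on a segment between two existing minority samples, the new data stays inside the region the minority class already occupies. Averaging the AUC over several datasets, as the abstract describes, would amount to calling `auc` once per dataset and taking the mean.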

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 62176200, 61773304, and 61871306; the Natural Science Basic Research Program of Shaanxi under Grant Nos. 2022JC-45 and 2022JQ-616; the Open Research Projects of Zhejiang Lab under Grant No. 2021KG0AB03; the 111 Project; the National Key R&D Program of China; the Guangdong Provincial Key Laboratory under Grant No. 2020B121201001; and the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2021A1515110686.

Corresponding author

Correspondence to Ronghua Shang.

Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper

Cite this paper

Asefaw, W.M., Shang, R., Okoth, M.A., Jiao, L. (2022). Augmentation Based Synthetic Sampling and Ensemble Techniques for Imbalanced Data Classification. In: Shi, Z., Jin, Y., Zhang, X. (eds) Intelligence Science IV. ICIS 2022. IFIP Advances in Information and Communication Technology, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-031-14903-0_15

  • DOI: https://doi.org/10.1007/978-3-031-14903-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-14902-3

  • Online ISBN: 978-3-031-14903-0

  • eBook Packages: Computer Science (R0)
