
Effective Imbalance Learning Utilizing Informative Data

  • Conference paper
Data Mining (AusDM 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1741)

Abstract

The class imbalance problem, caused by unequal data distribution across classes, usually results in poor classifier performance and has attracted increasing attention in the research community. The core challenge is extracting sufficient information from the minority class; when this fails, the classifier converges to a sub-optimal state. Resampling and reweighting the cost of different classes are the common strategies for addressing the problem, but both still have issues: under-sampling may remove necessary information, over-sampling may introduce noise, and reweighting may result in an inappropriate cost matrix.
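For concreteness, the following is a minimal sketch (in NumPy, not from the paper) of the two baseline strategies discussed above: random oversampling of the minority class and inverse-frequency cost reweighting. The function names and the assumption that the minority class is the smaller one are illustrative only.

```python
import numpy as np


def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate minority-class rows until both classes are equally sized.

    A naive resampling baseline: it adds no new information, and the
    duplicated minority samples can encourage overfitting (one of the
    issues noted in the abstract).
    """
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    # Assumes the minority class is indeed the smaller one.
    n_extra = len(majority_idx) - len(minority_idx)
    extra = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([majority_idx, minority_idx, extra])
    return X[keep], y[keep]


def inverse_frequency_weights(y):
    """Per-sample weights proportional to 1 / class frequency.

    A simple cost-reweighting baseline; a poorly chosen cost matrix can
    over- or under-penalise one class, as the abstract points out.
    """
    classes, counts = np.unique(y, return_counts=True)
    freq = dict(zip(classes, counts / len(y)))
    return np.array([1.0 / freq[label] for label in y])
```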

To address the above shortcomings, this paper proposes an enhanced approach based on informative samples. In our approach, the classifier indicates which class a sample is closer to by comparing it with boundary samples; the informative samples are the positive and negative samples located around the decision boundary. Our experiments show that the proposed method outperforms state-of-the-art algorithms by 18% on \(F_1\) score.
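The abstract does not spell out how the boundary samples are identified or used, so the sketch below is only one plausible reading: a nearest-opposite-class-neighbour heuristic for selecting informative samples, followed by a nearest-boundary-sample rule for classification. All names (boundary_samples, predict_by_boundary) and the NumPy implementation are hypothetical illustrations, not the authors' method.

```python
import numpy as np


def boundary_samples(X, y, k=5):
    """Select, for each class, the k samples closest to the opposite class.

    Hypothetical stand-in for "informative samples": points whose nearest
    opposite-class neighbour is closest are treated as lying near the
    decision boundary.
    """
    pos, neg = X[y == 1], X[y == 0]
    # Distance of each sample to its nearest opposite-class neighbour.
    d_pos = np.min(np.linalg.norm(pos[:, None, :] - neg[None, :, :], axis=-1), axis=1)
    d_neg = np.min(np.linalg.norm(neg[:, None, :] - pos[None, :, :], axis=-1), axis=1)
    return pos[np.argsort(d_pos)[:k]], neg[np.argsort(d_neg)[:k]]


def predict_by_boundary(x, pos_border, neg_border):
    """Label a query by whichever set of boundary samples it is closer to."""
    d_to_pos = np.min(np.linalg.norm(pos_border - x, axis=1))
    d_to_neg = np.min(np.linalg.norm(neg_border - x, axis=1))
    return 1 if d_to_pos < d_to_neg else 0
```

In this reading, the comparison with boundary samples replaces resampling and cost matrices entirely, which matches the motivation stated in the abstract but should not be taken as the exact procedure evaluated in the paper.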

Author information

Corresponding author

Correspondence to Raymond Wong.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Tai, H., Wong, R., Li, B. (2022). Effective Imbalance Learning Utilizing Informative Data. In: Park, L.A.F., et al. Data Mining. AusDM 2022. Communications in Computer and Information Science, vol 1741. Springer, Singapore. https://doi.org/10.1007/978-981-19-8746-5_8

  • DOI: https://doi.org/10.1007/978-981-19-8746-5_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-8745-8

  • Online ISBN: 978-981-19-8746-5

  • eBook Packages: Computer Science, Computer Science (R0)
