Abstract
The class imbalance problem, caused by unequal data distribution, usually results in poor performance and has attracted increasing attention in the research community. The core challenge is the difficulty of extracting sufficient information from the minority class, which causes the classifier to converge to a sub-optimal state. Resampling and reweighting the costs of different classes are the common strategies for addressing the problem, but both have well-known shortcomings: under-sampling may remove necessary information, over-sampling may introduce noise, and reweighting may produce an inappropriate cost matrix.
To address these shortcomings, this paper proposes an enhanced approach based on informative samples: samples from both the positive and negative classes located around the decision boundary. By comparing a sample with these boundary samples, the classifier can indicate which class the sample is closer to. Our experiments show that the proposed method outperforms state-of-the-art algorithms by 18% in \(F_1\) score.
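The abstract does not give the exact rule for selecting informative samples. As a minimal sketch, assuming a k-nearest-neighbour criterion in the spirit of Borderline-SMOTE (a sample counts as "informative" when its neighbourhood contains points from the other class), boundary samples from both classes could be identified like this:

```python
import numpy as np

def informative_samples(X, y, k=5):
    """Return indices of samples near the class boundary.

    A sample is treated as 'informative' if at least one of its k
    nearest neighbours belongs to the other class. This neighbourhood
    criterion is an assumption for illustration; it is not necessarily
    the paper's exact selection rule.
    """
    # pairwise squared Euclidean distances
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)            # exclude each sample itself
    nn = np.argsort(d, axis=1)[:, :k]      # indices of k nearest neighbours
    mixed = (y[nn] != y[:, None]).any(1)   # any neighbour from the other class?
    return np.where(mixed)[0]

# Toy imbalanced data: two clusters whose innermost points straddle the boundary.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],               # negative class
              [2.2, 0.0], [3.5, 0.0], [3.5, 1.0], [4.5, 0.0]])  # positive class
y = np.array([0, 0, 0, 1, 1, 1, 1])
idx = informative_samples(X, y, k=2)
```

On this toy set, only the two points closest to the gap between the clusters (one from each class) are selected; interior points, whose neighbourhoods are pure, are not. In practice a library neighbour search (e.g. scikit-learn's `NearestNeighbors`) would replace the O(n²) distance matrix.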
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Tai, H., Wong, R., Li, B. (2022). Effective Imbalance Learning Utilizing Informative Data. In: Park, L.A.F., et al. Data Mining. AusDM 2022. Communications in Computer and Information Science, vol 1741. Springer, Singapore. https://doi.org/10.1007/978-981-19-8746-5_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8745-8
Online ISBN: 978-981-19-8746-5