Abstract
The class imbalance problem, caused by unequal data distribution, usually results in poor performance and has attracted increasing attention in the research community. The core challenge is the difficulty of extracting sufficient information from the minority class, which causes the classifier to converge to a sub-optimal state. Resampling and reweighting the costs of different classes are the common strategies for addressing the problem, but both have well-known shortcomings: under-sampling may remove necessary information, over-sampling may introduce noise, and reweighting may produce an inappropriate cost matrix.
To address these shortcomings, this paper proposes an enhanced approach based on informative samples: samples from both the positive and negative classes located around the decision boundary. By comparing a sample with these boundary samples, the classifier can indicate which class the sample is closer to. Our experiments show that the proposed method outperforms state-of-the-art algorithms by 18% in \(F_1\) score.
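The abstract does not give the exact rule for selecting informative samples. As a minimal sketch, assuming a k-nearest-neighbour criterion in the spirit of Borderline-SMOTE (a sample counts as "informative" when its neighbourhood contains points from the other class), boundary samples from both classes could be identified like this:

```python
import numpy as np

def informative_samples(X, y, k=5):
    """Return indices of samples near the class boundary.

    A sample is treated as 'informative' if at least one of its k
    nearest neighbours belongs to the other class. This neighbourhood
    criterion is an assumption for illustration; it is not necessarily
    the paper's exact selection rule.
    """
    # pairwise squared Euclidean distances
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)            # exclude each sample itself
    nn = np.argsort(d, axis=1)[:, :k]      # indices of k nearest neighbours
    mixed = (y[nn] != y[:, None]).any(1)   # any neighbour from the other class?
    return np.where(mixed)[0]

# Toy imbalanced data: two clusters whose innermost points straddle the boundary.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],               # negative class
              [2.2, 0.0], [3.5, 0.0], [3.5, 1.0], [4.5, 0.0]])  # positive class
y = np.array([0, 0, 0, 1, 1, 1, 1])
idx = informative_samples(X, y, k=2)
```

On this toy set, only the two points closest to the gap between the clusters (one from each class) are selected; interior points, whose neighbourhoods are pure, are not. In practice a library neighbour search (e.g. scikit-learn's `NearestNeighbors`) would replace the O(n²) distance matrix.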
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Tai, H., Wong, R., Li, B. (2022). Effective Imbalance Learning Utilizing Informative Data. In: Park, L.A.F., et al. Data Mining. AusDM 2022. Communications in Computer and Information Science, vol 1741. Springer, Singapore. https://doi.org/10.1007/978-981-19-8746-5_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8745-8
Online ISBN: 978-981-19-8746-5