
Output Layer Multiplication for Class Imbalance Problem in Convolutional Neural Networks

Published in Neural Processing Letters

Abstract

Convolutional neural networks (CNNs) have demonstrated remarkable performance in the field of computer vision. However, they are prone to the class imbalance problem, in which the number of examples in some classes is significantly higher or lower than in others. There are two main strategies for handling the problem: dataset-level methods, which resample the data, and algorithm-level methods, which modify the existing learning framework. Most of these methods, however, require extra data resampling or elaborate algorithm design. In this work we present an effective yet extremely simple approach to tackling the imbalance problem in CNNs trained with the cross-entropy loss. Specifically, we multiply the output of the last layer of a CNN by a coefficient \( \alpha > 1 \). With this modification, the final loss function dynamically adjusts the contributions of examples from different classes during imbalanced training. Because of its simplicity, the proposed method can be applied to off-the-shelf models with little change. To demonstrate its effectiveness on the imbalance problem, we design three classification experiments of increasing complexity. The experimental results show that our approach can improve the convergence rate during training and/or increase test accuracy.
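The modification can be summarized in a few lines. Below is a minimal sketch (our illustration, not the authors' released code), assuming a generic PyTorch classifier `model` and a hypothetical constant `ALPHA`; it simply scales the logits by \( \alpha \) before the standard softmax cross-entropy, i.e. it computes \( -\log ( e^{\alpha o_{k}} / \sum_{j} e^{\alpha o_{j}} ) \) for the true class \( k \):

```python
import torch
import torch.nn as nn

# Minimal sketch of the proposed modification (our illustration,
# not the authors' code): multiply the last layer's output by a
# constant alpha > 1 before the usual softmax cross-entropy loss.
ALPHA = 2.0  # hypothetical value; the paper only requires alpha > 1

criterion = nn.CrossEntropyLoss()

def imbalance_aware_loss(model: nn.Module,
                         images: torch.Tensor,
                         targets: torch.Tensor) -> torch.Tensor:
    logits = model(images)                     # raw last-layer outputs, shape (N, C)
    return criterion(ALPHA * logits, targets)  # cross-entropy on the scaled logits
```

Because the change is a single scalar multiplication of the logits, it adds no parameters and leaves the rest of the training pipeline untouched, which is what makes it easy to drop into off-the-shelf models.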

Acknowledgements

This research was supported by NSFC (No. 61501177, 61772455, U1713213, 41601394, 61902084), Guangzhou University’s training program for excellent new-recruited doctors (No. YB201712), Major Science and Technology Project of Precious Metal Materials Genetic Engineering in Yunnan Province (No. 2019ZE001-1, 202002AB080001), Yunnan Natural Science Funds (No. 2018FY001(-013), 2019FA-045), Yunnan University Natural Science Funds (No. 2018YDJQ004), the Project of Innovative Research Team of Yunnan Province (No. 2018HC019), Guangdong Natural Science Foundation (No. 2017A030310639), and Featured Innovation Project of Guangdong Education Department (No. 2018KTSCX174).

Author information

Corresponding author

Correspondence to Dapeng Tao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof that Eq. (10) is negative for \( \alpha > 1 \).

Consider the function

$$ f(x) = \frac{\log(x)}{1 - x}, \quad x \in (0, 1). $$
(15)

Differentiating with respect to \( x \), we get

$$ \frac{\partial f(x)}{\partial x} = \frac{\frac{1}{x} - 1 + \log(x)}{(1 - x)^{2}}. $$
(16)

Let \( g(x) \) denote the numerator of Eq. (16):

$$ g(x) = \frac{1}{x} - 1 + \log(x), \quad x \in (0, 1). $$
(17)

Differentiating again with respect to \( x \), we have

$$ \frac{\partial g(x)}{\partial x} = \frac{1}{x}\left(1 - \frac{1}{x}\right). $$
(18)

Since Eq. (18) is always negative for \( x \in (0, 1) \), \( g(x) \) is a decreasing function on this interval. Its infimum is zero, approached as \( x \to 1 \), since \( g(1) = 0 \); thus \( g(x) \) is always positive for \( x \in (0, 1) \). It follows that \( f(x) \) is an increasing function on \( (0, 1) \).

For brevity, let \( p_{1} = e^{o_{k}} / \sum_{j=1}^{C} e^{o_{j}} \) and \( p_{\alpha} = e^{\alpha o_{k}} / \sum_{j=1}^{C} e^{\alpha o_{j}} \). Then \( p_{1}, p_{\alpha} \in (0, 1) \) and \( p_{1} < p_{\alpha} \) for \( \alpha > 1 \), as can be inferred from the proof of Theorem 2. By the monotonicity of \( f(x) \), we have

$$ \frac{\log(p_{1})}{1 - p_{1}} < \frac{\log(p_{\alpha})}{1 - p_{\alpha}}, $$
(19)

which, since \( 1 - p_{1} \) and \( 1 - p_{\alpha} \) are both positive, can be rearranged as

$$ (1 - p_{\alpha})\log(p_{1}) < (1 - p_{1})\log(p_{\alpha}). $$
(20)

Because both sides of Eq. (20) are negative, multiplying the smaller left-hand side by \( \alpha > 1 \) makes it smaller still, so we obtain

$$ \alpha(1 - p_{\alpha})\log(p_{1}) < (1 - p_{1})\log(p_{\alpha}), \quad \alpha > 1. $$
(21)

We can therefore conclude that Eq. (10) is negative for \( \alpha > 1 \).
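As a quick numerical sanity check of this chain of inequalities (our own sketch, not part of the paper), the snippet below draws random logits, takes \( o_{k} \) to be the largest logit so that the precondition \( p_{1} < p_{\alpha} \) holds, and verifies Eqs. (19)–(21):

```python
import numpy as np

# Sanity check (our illustration, not from the paper): verify
#   log(p1)/(1-p1) < log(pa)/(1-pa)                       (Eq. 19)
#   (1-pa)*log(p1) < (1-p1)*log(pa)                       (Eq. 20)
#   alpha*(1-pa)*log(p1) < (1-p1)*log(pa)                 (Eq. 21)
# on random logits, with o_k chosen as the largest logit.
rng = np.random.default_rng(0)

def softmax(o):
    e = np.exp(o - o.max())  # subtract the max for numerical stability
    return e / e.sum()

for _ in range(10_000):
    logits = rng.normal(size=10)       # C = 10 hypothetical classes
    alpha = 1.0 + 4.0 * rng.random()   # alpha drawn from (1, 5)
    k = int(np.argmax(logits))         # p1 < pa needs o_k to be the max logit
    p1 = softmax(logits)[k]
    pa = softmax(alpha * logits)[k]
    assert p1 < pa
    assert np.log(p1) / (1 - p1) < np.log(pa) / (1 - pa)          # Eq. (19)
    assert (1 - pa) * np.log(p1) < (1 - p1) * np.log(pa)          # Eq. (20)
    assert alpha * (1 - pa) * np.log(p1) < (1 - p1) * np.log(pa)  # Eq. (21)

print("all 10,000 random checks passed")
```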


About this article

Cite this article

Yang, Z., Zhu, Y., Liu, T. et al. Output Layer Multiplication for Class Imbalance Problem in Convolutional Neural Networks. Neural Process Lett 52, 2637–2653 (2020). https://doi.org/10.1007/s11063-020-10366-w
