Imbalanced driving scene recognition with class focal loss and data augmentation

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Driving scene recognition based on visual features is essential for developing intelligent transportation systems. However, real-world driving scene data is class-imbalanced by nature, so the majority and minority classes follow different distribution patterns. Specifically, some classes have sufficient samples, while for many other classes only very few samples are available. With this distribution, deep neural networks have been found to perform poorly on minority classes. To handle the class Imbalance of Driving Scene Recognition (IDSR), this paper presents a novel class focal loss that improves recognition performance on minority scenes. It incorporates the quantity distribution of categories into focal loss, which better balances sample quantity and difficulty during training. In addition, this paper explores a data augmentation method for imbalanced driving scenes to improve performance. To evaluate the proposed method, comprehensive experiments were conducted on real-world driving scene datasets. The results show that the proposed method can substantially outperform state-of-the-art methods in class-imbalanced driving scene recognition.
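The abstract does not give the formula for the class focal loss, so the following is only a minimal sketch of the general idea: the standard focal loss, -(1 - p_t)^gamma * log(p_t), scaled by a per-class weight derived from class counts. The inverse-frequency weighting w_c = N / (C * n_c), the function names, and the example counts are all assumptions made for illustration and are not taken from the paper.

```python
import math


def inverse_frequency_weights(class_counts):
    """Hypothetical weighting (an assumption, not the paper's formula):
    w_c = N / (C * n_c), so classes with fewer samples get larger weights."""
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * n) for n in class_counts]


def log_softmax_at(logits, target):
    """Log-probability of the target class, computed with log-sum-exp for stability."""
    m = max(logits)
    log_sum = math.log(sum(math.exp(z - m) for z in logits))
    return logits[target] - m - log_sum


def class_weighted_focal_loss(logits, target, class_weights, gamma=2.0):
    """Standard focal loss for one sample, scaled by the weight of its true class.

    (1 - p_t)^gamma down-weights easy samples (difficulty);
    class_weights[target] up-weights rare classes (quantity).
    """
    log_p_t = log_softmax_at(logits, target)
    p_t = math.exp(log_p_t)
    return -class_weights[target] * (1.0 - p_t) ** gamma * log_p_t


# Illustrative counts for three scene classes: one frequent, two rare.
counts = [900, 90, 10]
weights = inverse_frequency_weights(counts)

# The same prediction is penalised far more when the true class is the rare one.
logits = [4.0, 0.0, 0.0]
loss_majority = class_weighted_focal_loss(logits, 0, weights)
loss_minority = class_weighted_focal_loss(logits, 2, weights)
print(loss_majority < loss_minority)
```

With gamma set to 0 the focusing term disappears and the function reduces to class-weighted cross-entropy, which makes it easy to compare the two terms separately.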

Notes

  1. The driving scene data were collected by China Automotive Technology and Research Center Company Ltd.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62076179 and Grant 61732011, and in part by the Beijing Natural Science Foundation under Grant Z180006.

Author information

Corresponding author

Correspondence to Liu Yang.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (RAR 15,681 KB)

About this article

Cite this article

Zhu, X., Men, J., Yang, L. et al. Imbalanced driving scene recognition with class focal loss and data augmentation. Int. J. Mach. Learn. & Cyber. 13, 2957–2975 (2022). https://doi.org/10.1007/s13042-022-01575-x

  • DOI: https://doi.org/10.1007/s13042-022-01575-x
