Abstract
Driving scene recognition based on visual features is essential for developing intelligent transportation systems. However, real-world driving scene data is class-imbalanced by nature, so majority and minority classes follow different distribution patterns. Specifically, some classes have sufficient samples, while many other classes have only a few. Under such a distribution, deep neural networks have been found to perform poorly on minority classes. To handle Imbalanced Driving Scene Recognition (IDSR), this paper presents a novel class focal loss that improves recognition performance on minority scenes. It incorporates the quantity distribution of categories into focal loss, which better balances class quantity and sample difficulty during training. In addition, this paper explores a data augmentation method for imbalanced driving scenes to further improve performance. To evaluate the proposed method, comprehensive experiments were conducted on real-world driving scene datasets. The results show that the proposed method substantially outperforms state-of-the-art methods in class-imbalanced driving scene recognition.
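The abstract does not give the exact form of the class focal loss, so the following is only a minimal sketch of the general idea of combining focal loss with class quantity information. It assumes a hypothetical inverse-frequency weight per class (normalized to mean 1) multiplied into the standard focal term; the function names, the weighting scheme, the class counts, and the logits are all illustrative assumptions, not the authors' formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_weights(class_counts):
    """Hypothetical weighting (an assumption, not the paper's scheme):
    inverse class frequency, rescaled so the weights average to 1."""
    inv = [1.0 / c for c in class_counts]
    scale = len(inv) / sum(inv)
    return [w * scale for w in inv]


def class_weighted_focal_loss(logits, target, weights, gamma=2.0):
    """Standard focal loss for one sample, scaled by a per-class weight.

    (1 - p_t) ** gamma down-weights easy samples (the difficulty term);
    weights[target] up-weights rare classes (the quantity term).
    """
    p_t = softmax(logits)[target]
    return -weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Made-up class counts for three scene classes, from frequent to rare.
counts = [900, 90, 10]
weights = class_weights(counts)

# The same prediction scored against a majority label and a minority label.
logits = [2.0, 0.5, 0.1]
loss_major = class_weighted_focal_loss(logits, 0, weights)
loss_minor = class_weighted_focal_loss(logits, 2, weights)
```

With `gamma=0` and uniform weights the function reduces to ordinary cross-entropy, which makes it easy to check the two terms separately before combining them.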
Notes
The driving scene data were collected by China Automotive Technology and Research Center Company Ltd.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 62076179 and Grant 61732011, and in part by the Beijing Natural Science Foundation under Grant Z180006.
About this article
Cite this article
Zhu, X., Men, J., Yang, L. et al. Imbalanced driving scene recognition with class focal loss and data augmentation. Int. J. Mach. Learn. & Cyber. 13, 2957–2975 (2022). https://doi.org/10.1007/s13042-022-01575-x
DOI: https://doi.org/10.1007/s13042-022-01575-x