Abstract
Vehicle Color Recognition (VCR) plays a vital role in intelligent traffic management and criminal investigation assistance. However, existing vehicle color datasets cover only 13 classes, which cannot meet current practical demand. In addition, although considerable effort has been devoted to VCR, existing methods suffer from class imbalance in the datasets. To address these challenges, we propose a novel VCR method based on a Smooth Modulation Neural Network with Multi-Scale Feature Fusion (SMNN-MSFF). Specifically, to build a benchmark for model training and evaluation, we first present a new VCR dataset with 24 vehicle color classes, Vehicle Color-24, consisting of 10091 vehicle images taken from 100 hours of urban road surveillance video. Then, to tackle the long-tail distribution and improve recognition performance, we propose the SMNN-MSFF model, which combines multi-scale feature fusion with smooth modulation. The former extracts feature information from local to global scales, and the latter increases the loss contributed by images of tail-class instances when training under class imbalance. Finally, a comprehensive experimental evaluation on Vehicle Color-24 and three previous representative datasets demonstrates that the proposed SMNN-MSFF outperforms state-of-the-art VCR methods. Extensive ablation studies further show that each module of our method is effective; in particular, smooth modulation efficiently helps feature learning for the minority, or tail, classes. Vehicle Color-24 and the code of SMNN-MSFF are publicly available and can be obtained by contacting the author.
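The abstract states only that smooth modulation increases the loss of tail-class instances; it does not give the formula. As a rough illustration of that general idea, the sketch below uses a different, commonly used technique, smoothed inverse-frequency class weighting of the cross-entropy loss. It is not the SMNN-MSFF loss. The function names, the exponent `gamma`, and the class counts are all hypothetical and chosen for illustration.

```python
import math

def class_weights(counts, gamma=0.5):
    """Smoothed inverse-frequency weights: (n_max / n_c) ** gamma.

    gamma is a hypothetical smoothing exponent: gamma = 0 gives uniform
    weights, gamma = 1 gives plain inverse-frequency weights.
    """
    n_max = max(counts)
    return [(n_max / n) ** gamma for n in counts]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(logits, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    probs = softmax(logits)
    return -weights[label] * math.log(probs[label])

# Hypothetical long-tailed class counts (not taken from Vehicle Color-24).
counts = [4000, 900, 100]
weights = class_weights(counts, gamma=0.5)

# Identical, uninformative logits: only the class weight differs.
logits = [1.0, 1.0, 1.0]
head_loss = weighted_cross_entropy(logits, 0, weights)
tail_loss = weighted_cross_entropy(logits, 2, weights)
print(f"head-class loss: {head_loss:.4f}, tail-class loss: {tail_loss:.4f}")
```

With identical predictions, the sample from the rarest class contributes a larger loss than the sample from the most frequent class, which is the qualitative effect the abstract attributes to smooth modulation.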
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 62071378), the Shaanxi Province International Science and Technology Cooperation Program (2022KW-04), and the Xi’an Science and Technology Plan Project (21XJZZ0072).
Author information
Additional information
Mingdi Hu is an associate professor. She received her Doctor of Science degree from the School of Mathematics and Statistics at Shaanxi Normal University, China. Her research interests include image recognition, target retrieval and classification, data enhancement, machine learning, artificial intelligence, and fuzzy information processing.
Long Bai is a master's student at Xi’an University of Posts and Telecommunications, China. He received his BS degree in communication engineering from Xi’an University of Science and Technology, China. His research interests include machine learning, deep neural networks, image target recognition, and artificial intelligence.
Jiulun Fan is a professor. He graduated from Xidian University, China, majoring in signal and information processing, and received a doctoral degree in engineering. His research interests include pattern recognition and image processing, fuzzy information processing theory and applications, and image security technology.
Sirui Zhao is a doctoral student at the University of Science and Technology of China, China. His research interests include human-computer interaction, affective computing, computer vision, and knowledge representation. He has published several papers in refereed conferences and journals, such as ACM MM 2021 and Neural Networks.
Enhong Chen is a professor at the University of Science and Technology of China, China, and holds a doctoral degree. He is a CCF Fellow and an IEEE Senior Member. His research interests include data mining and machine learning, especially social network analysis and recommender systems. He has published more than 200 papers in refereed conferences and journals, such as TKDE, KDD, ICDM, and NIPS.
About this article
Cite this article
Hu, M., Bai, L., Fan, J. et al. Vehicle color recognition based on smooth modulation neural network with multi-scale feature fusion. Front. Comput. Sci. 17, 173321 (2023). https://doi.org/10.1007/s11704-022-1389-x
DOI: https://doi.org/10.1007/s11704-022-1389-x