Abstract
Vehicle make and model recognition (VMMR) is a pivotal task in building automatic vehicle recognition systems, and in recent decades it has attracted significant attention from the computer vision and artificial intelligence communities. Previous research has largely sought to improve recognition by introducing different types of attention mechanisms, which have proven effective at weighting features and uncovering distinctive local and global details. However, these mechanisms add model complexity, which brings a costly and often unnecessary computational burden. To address this, we introduce a deep neural network model, called the simple sequential attention network (SimSANet), which uses a sequential multi-kernel approach to achieve a trade-off between complexity and performance. The method reduces the computational load and speeds up processing while efficiently capturing essential information from feature maps at different scales, from local to global. To demonstrate its effectiveness, we conduct experiments on several publicly accessible VMMR datasets: Stanford Cars, Comprehensive Cars (CompCars), Comprehensive Cars Surveillance Nature (CompCarsSV), and the Vehicle Images dataset. With 94.47% accuracy on Stanford Cars, 98.34% on CompCars, 99.20% on CompCarsSV, and 97.20% on the Vehicle Images dataset, the proposed model outperforms state-of-the-art models on the VMMR task. The implementation details and source code are available at: https://github.com/JUVCSE/SIMSANET.
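The abstract does not spell out the architecture, so the following is only a rough illustration of what attention applied sequentially with several kernel sizes could look like. It is not the authors' implementation (that is in the linked repository). The function names `box_filter` and `sequential_multi_kernel_attention`, the kernel sizes (3, 5, 7), and the use of fixed averaging kernels in place of learned convolution weights are all assumptions made for this sketch.

```python
import numpy as np


def box_filter(x, k):
    """Average a 2-D map over a k x k window (k odd) with zero padding."""
    pad = k // 2
    padded = np.pad(x, pad, mode="constant")
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    # Sum the k*k shifted copies of the padded map, then normalise.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sequential_multi_kernel_attention(features, kernel_sizes=(3, 5, 7)):
    """Gate a (C, H, W) feature map with spatial attention, one kernel at a time.

    Each stage pools over channels, gathers context at one kernel size,
    and rescales the features; the next stage sees the rescaled output.
    Small kernels look at local context, larger ones at wider context.
    Assumption: averaging kernels stand in for learned weights.
    """
    out = features.astype(float)
    for k in kernel_sizes:
        descriptor = out.mean(axis=0)       # (H, W) channel-pooled map
        context = box_filter(descriptor, k) # context at scale k
        gate = sigmoid(context)             # values in (0, 1)
        out = out * gate[None, :, :]        # broadcast gate over channels
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 8, 8))
    attended = sequential_multi_kernel_attention(feats)
    print(attended.shape)
```

Because the stages run one after another on a single channel-pooled map, the cost grows with the number of kernels rather than with a set of parallel attention branches, which is one plausible reading of the complexity trade-off the abstract describes.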

Data availability
All datasets used in this work are publicly available.
Code availability
The implementation details and source code for this work are available via a GitHub link: https://github.com/JUVCSE/SIMSANET.
Acknowledgements
We are grateful to the CMATER Research Laboratory of the Department of Computer Science and Engineering, Jadavpur University, India, for providing the essential infrastructure support for this work.
Funding
Funding information is not applicable.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval and consent to participate
This study does not involve any data that require ethical approval or informed consent.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gayen, S., Maity, S., Singh, P.K. et al. SimSANet: a simple sequential attention-aided deep neural network for vehicle make and model recognition. Neural Comput & Applic 37, 319–339 (2025). https://doi.org/10.1007/s00521-024-10480-z
DOI: https://doi.org/10.1007/s00521-024-10480-z