Abstract
Intelligent transport systems (ITSs) have emerged as a groundbreaking solution to the challenges associated with road traffic, namely enhancing road utilization efficiency, providing convenient and safe transportation, and reducing energy consumption. ITSs leverage advanced technologies to collect, store, and deliver real-time road traffic information, enabling intelligent decision-making and optimizing various aspects of transportation systems. As a contribution in this area, we propose a novel and efficient macroscopic approach, based on the multihead self-attention vision transformer (MSViT), for categorizing road traffic congestion in nighttime videos into three classes: light, medium, and heavy. To assess the performance of our approach, we conducted experiments on the nighttime UCSD (University of California San Diego) dataset, which includes various weather conditions (clear, overcast, and rainy) and traffic scenarios (light, medium, and heavy). The classification accuracy reached 94.24%. By incorporating a support vector machine (SVM) classifier into this method, we raised the accuracy to 98.92%, outperforming the existing state-of-the-art methods evaluated on the same UCSD dataset. Furthermore, the execution time was optimized.
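The multihead self-attention operation named in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' MSViT implementation: the function names, the toy dimensions (4 patch embeddings of width 8, 2 heads), and the random projection matrices standing in for learned weights are all assumptions made for illustration only.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for a learned weight matrix."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def multihead_self_attention(tokens, num_heads, rng):
    """Return (output, attention) for a token matrix of shape [n, d_model]."""
    d_model = len(tokens[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    head_outputs, attention = [], []
    for _ in range(num_heads):
        # Per-head query, key, and value projections (random here, learned in practice).
        w_q = random_matrix(d_model, d_head, rng)
        w_k = random_matrix(d_model, d_head, rng)
        w_v = random_matrix(d_model, d_head, rng)
        q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
        k_t = [list(col) for col in zip(*k)]
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_head)) V.
        scores = matmul(q, k_t)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
        attention.append(weights)
    # Concatenate the heads token by token, then apply the output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(tokens))]
    w_o = random_matrix(d_model, d_model, rng)
    return matmul(concat, w_o), attention

rng = random.Random(0)
patches = random_matrix(4, 8, rng)  # toy input: 4 patch embeddings of width 8
output, attention = multihead_self_attention(patches, num_heads=2, rng=rng)
```

Each head produces one attention distribution per token over all tokens, so every patch can draw on information from the whole frame. In a vision transformer, the output tokens would then pass through further layers before a classification stage, which the abstract reports pairing with an SVM.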




ACKNOWLEDGMENTS
The authors would like to thank Dr. Antoni B. Chan from the City University of Hong Kong for providing the UCSD nighttime video traffic dataset.
Funding
This work was supported by ongoing institutional funding. No additional grants to carry out or direct this particular research were obtained.
Ethics declarations
The authors of this work declare that they have no conflicts of interest.
Additional information
Publisher’s Note. Allerton Press remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
AI tools may have been used in the translation or editing of this article.
About this article
Cite this article
Khalladi, S.A., Ouessai, A. & Keche, M. Road Traffic Classification from Nighttime Videos Using the Multihead Self-Attention Vision Transformer Model and the SVM. Aut. Control Comp. Sci. 58, 544–554 (2024). https://doi.org/10.3103/S0146411624700652
DOI: https://doi.org/10.3103/S0146411624700652