
Road Traffic Classification from Nighttime Videos Using the Multihead Self-Attention Vision Transformer Model and the SVM

Automatic Control and Computer Sciences

Abstract

Intelligent transport systems (ITSs) have emerged as a groundbreaking solution to the challenges associated with road traffic, namely enhancing road utilization efficiency, providing convenient and safe transportation, and reducing energy consumption. ITSs leverage advanced technologies to collect, store, and deliver real-time road traffic information, enabling intelligent decision-making and optimizing various aspects of transportation systems. As a contribution to this field, we propose in this paper a novel and efficient macroscopic approach, based on the multihead self-attention vision transformer (MSViT), for categorizing road traffic congestion from nighttime videos into three classes: light, medium, and heavy. To assess the performance of our approach, we conducted experiments using the nighttime UCSD (University of California San Diego) dataset, which includes various weather conditions (clear, overcast, and rainy) and traffic scenarios (light, medium, and heavy). The classification accuracy reached 94.24%. By incorporating a support vector machine (SVM) classifier into this method, we raised the accuracy to 98.92%, thus outperforming the existing state-of-the-art methods evaluated on the same UCSD dataset. Furthermore, the execution time was optimized.
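For readers unfamiliar with the mechanism the MSViT is named after, the following is a minimal sketch of multihead self-attention over a sequence of patch embeddings, written in plain Python. It is not the authors' model: the projection weights are random rather than trained, the helper names (`attention_head`, `multihead_self_attention`, `random_matrix`) are hypothetical, and patch extraction, positional encoding, and the SVM stage are omitted. It only illustrates the standard scaled dot-product formulation, softmax(QK^T / sqrt(d_k)) V, applied per head and then concatenated.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights (assumption: untrained, random values)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def attention_head(tokens, w_q, w_k, w_v):
    """One head: softmax(Q K^T / sqrt(d_k)) V. Returns (output, weights)."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    d_k = len(q[0])
    # scores[i][j] measures how strongly token i attends to token j.
    scores = [
        [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        for qi in q
    ]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v), weights


def multihead_self_attention(tokens, num_heads, rng):
    """Run several heads in parallel, concatenate, and project back."""
    d_model = len(tokens[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    head_outputs = []
    for _ in range(num_heads):
        w_q = random_matrix(d_model, d_head, rng)
        w_k = random_matrix(d_model, d_head, rng)
        w_v = random_matrix(d_model, d_head, rng)
        out, _ = attention_head(tokens, w_q, w_k, w_v)
        head_outputs.append(out)
    # Concatenate the per-head outputs along the feature axis.
    concat = [
        [value for head in head_outputs for value in head[i]]
        for i in range(len(tokens))
    ]
    w_o = random_matrix(d_model, d_model, rng)
    return matmul(concat, w_o)


if __name__ == "__main__":
    rng = random.Random(0)
    patches = random_matrix(5, 8, rng)  # 5 patch embeddings of size 8
    out = multihead_self_attention(patches, num_heads=2, rng=rng)
    print(len(out), len(out[0]))
```

In a hybrid pipeline of the kind the abstract describes, the features produced by the transformer would then be passed to a separate SVM classifier to assign one of the three congestion classes; that step is not shown here.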




ACKNOWLEDGMENTS

The authors would like to thank Dr. Antoni B. Chan from the City University of Hong Kong for providing the UCSD nighttime video traffic dataset.

Funding

This work was supported by ongoing institutional funding. No additional grants to carry out or direct this particular research were obtained.

Author information

Corresponding author

Correspondence to Sofiane Abdelkrim Khalladi.

Ethics declarations

The authors of this work declare that they have no conflicts of interest.

Additional information

Publisher’s Note.

Allerton Press remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

AI tools may have been used in the translation or editing of this article.

About this article


Cite this article

Khalladi, S.A., Ouessai, A. & Keche, M. Road Traffic Classification from Nighttime Videos Using the Multihead Self-Attention Vision Transformer Model and the SVM. Aut. Control Comp. Sci. 58, 544–554 (2024). https://doi.org/10.3103/S0146411624700652


  • DOI: https://doi.org/10.3103/S0146411624700652