An attentive convolutional transformer-based network for road safety

The Journal of Supercomputing

Abstract

The sharp rise in the number of vehicles on the road has led to numerous traffic violations (Ministry of Road Transport and Highways, 2021, https://morth.nic.in/road-accident-in-india). Detecting traffic violations in a dynamic environment is a complex task. This paper focuses on one particular violation: riding a motorcycle without a helmet. We treat the problem as an object detection task and propose a novel convolutional encoder–transformer decoder (CETD) architecture for it. The architecture comprises two primary modules: a convolutional neural network (CNN)-based encoder that extracts high-level features from input images, and a transformer-based decoder that uses attention mechanisms to identify relevant components in the image, such as helmets or missing helmets. It is designed to achieve accurate object detection and localization by combining advanced feature extraction techniques with state-of-the-art attention mechanisms. A layer normalization module acts as an intermediate bias stabilizer for the encoder–decoder network. The design also includes a standard backbone feature extractor and a fused backbone feature extractor. The model detects occluded objects with higher confidence than other state-of-the-art models, and the detector works end to end with fewer handcrafted features. We studied the applicability of the model on a miniature version of the COCO dataset, where it performs competitively with models such as the Faster region-based convolutional neural network (Faster R-CNN) and the Mask region-based convolutional neural network (Mask R-CNN). The proposed model is also fine-tuned on traffic data with occlusion for helmet detection.

The model’s performance on helmet detection from traffic data is comparable with that of state-of-the-art real-time detectors such as EspiNet (a modified Faster R-CNN network), the single-shot multibox detector (SSD), and the You Only Look Once version 5 (YOLO v5) detector. Specifically, the model outperforms EspiNet V2 by 0.47, SSD by 3.9, and YOLO v5 by 0.37 in terms of mean average precision. Object-aware copy–paste augmentation improves the model’s mean average precision by a further 0.87. The model’s average occlusion detection confidence is 5.1 percent higher than that of YOLO v5. Experimental results show that the proposed model adapts better to both specific object (helmet) detection and generic object detection tasks.
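The encoder, layer normalization, and attention-based decoder described in the abstract can be illustrated with a small sketch. This is not the authors' CETD implementation: it assumes random numbers in place of real CNN features, a single attention head with no learned projections, and hypothetical names (`layer_norm`, `cross_attention`, `object_queries`). It only shows how normalized encoder features might be consumed by decoder queries through attention.

```python
import math
import random


def layer_norm(vec, eps=1e-5):
    """Normalize one feature vector to zero mean and (near) unit variance.

    Learned gain and bias terms are omitted in this sketch.
    """
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, features):
    """Scaled dot-product attention: each query attends over all feature vectors."""
    dim = len(features[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * fi for qi, fi in zip(q, f)) / math.sqrt(dim)
            for f in features
        ]
        w = softmax(scores)
        # Weighted sum of the feature vectors, one output vector per query.
        out = [
            sum(w[j] * features[j][k] for j in range(len(features)))
            for k in range(dim)
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


random.seed(0)
# Assumption: stand-in for CNN encoder output, 6 spatial locations x 8 channels.
encoder_features = [[random.gauss(3.0, 2.0) for _ in range(8)] for _ in range(6)]
# Intermediate layer normalization between encoder and decoder.
normalized = [layer_norm(f) for f in encoder_features]
# Assumption: object queries are learned in a real detector; random here.
object_queries = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(2)]
decoded, attn = cross_attention(object_queries, normalized)
```

In a full detector, each decoded vector would typically pass through prediction heads that output a class (for example, helmet or no helmet) and a bounding box; those heads, multi-head attention, and positional encodings are left out here for brevity.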

Availability of data and materials

Data will be available on request.

References

  1. Ahmed M, Hashmi KA, Pagani A et al (2021) Survey and performance analysis of deep learning based object detection in challenging environments. Sensors. https://doi.org/10.3390/s21155116

  2. Anwer M, Shareef S, Ali A (2021) Accident vehicle types classification: a comparative study between different deep learning models. Indones J Electr Eng Comput Sci 21:1474–1484. https://doi.org/10.11591/ijeecs.v21.i3.pp1474-1484

  3. Arman M, Hasan M, Sadia F et al (2020) Detection and classification of road damage using R-CNN and faster R-CNN: a deep learning approach, pp 730–741. https://doi.org/10.1007/978-3-030-52856-0_58

  4. Ba J, Kiros J, Hinton GE (2016) Layer normalization. arXiv:1607.06450

  5. Buch N, Velastin SA, Orwell J (2011) A review of computer vision techniques for the analysis of urban traffic. IEEE Trans Intell Transp Syst 12(3):920–939

  6. Carion N, Massa F, Synnaeve G et al (2020) End-to-end object detection with transformers. arXiv:2005.12872

  7. Carion N, Massa F, Synnaeve G et al (2020) End-to-end object detection with transformers. In: Vedaldi A, Bischof H, Brox T et al (eds) Computer vision—ECCV 2020. Springer, Cham, pp 213–229

  8. Caron M, Misra I, Mairal J et al (2020) Unsupervised learning of visual features by contrasting cluster assignments. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS)

  9. Chalavadi V, Singh D, Mohan CK et al (2017) Detection of motorcyclists without helmet in videos using convolutional neural network. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp 3036–3041

  10. Chen Q, Wang Y, Yang T et al (2021) You only look one-level feature. arXiv:2103.09460

  11. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol 1, pp 886–893

  12. Dequito C, Dichaves I, Juan R et al (2021) Vision-based bicycle and motorcycle detection using a yolo-based network. J Phys Conf Ser 1922:012003. https://doi.org/10.1088/1742-6596/1922/1/012003

  13. Dhyanjith G, Manohar N, Raj AV (2021) Helmet detection using yolo v3 and single shot detector. In: 2021 6th International Conference on Communication and Electronics Systems (ICCES), pp 1844–1848. https://doi.org/10.1109/ICCES51350.2021.9489194

  14. Espinosa JE, Velastín SA, Branch JW (2020) Detection of motorcycles in urban traffic using video analysis: a review. IEEE Trans Intell Transp Syst 22:6115–6130

  15. Farhadi A, Redmon J (2018) Yolov3: an incremental improvement. arXiv:1804.02767

  16. Felzenszwalb PF, Girshick RB, McAllester DA et al (2009) Object detection with discriminatively trained part based models. IEEE Trans Pattern Anal Mach Intell 32:1627–1645

  17. Ghiasi G, Cui Y, Srinivas A et al (2020) Simple copy-paste is a strong data augmentation method for instance segmentation. arXiv:2012.07177

  18. Ghiasi G, Cui Y, Srinivas A et al (2021) Simple copy-paste is a strong data augmentation method for instance segmentation. In: CVPR

  19. Girshick R, Donahue J, Darrell T et al (2016) Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans Pattern Anal Mach Intell 38(1):142–158. https://doi.org/10.1109/TPAMI.2015.2437384

  20. Girshick RB (2015) Fast r-cnn. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp 1440–1448

  21. Girshick RB, Felzenszwalb PF, McAllester D (2012) Discriminatively trained deformable part models, release 5. http://people.cs.uchicago.edu/~rbg/latent-release5/

  22. Guindel C, Martín D, Armingol JM (2017) Joint object detection and viewpoint estimation using cnn features. In: 2017 IEEE International Conference on Vehicular Electronics and Safety (ICVES), pp 145–150

  23. Guo M, Xue D, Li P et al (2020) Vehicle pedestrian detection method based on spatial pyramid pooling and attention mechanism. Information. https://doi.org/10.3390/info11120583

  24. Han C, Gao G, Zhang Y (2018) Real-time small traffic sign detection with revised faster-rcnn. Multimed Tools Appl 78:13263–13278

  25. Haris M, Glowacz A (2021) Road object detection: a comparative study of deep learning-based algorithms. Electronics. https://doi.org/10.3390/electronics10161932

  26. He K, Zhang X, Ren S et al (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37:1904–1916

  27. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167

  28. Jia W, Xu S, Liang Z et al (2021) Real-time automatic helmet detection of motorcyclists in urban traffic using improved yolov5 detector. IET Image Process. https://doi.org/10.1049/ipr2.12295

  29. Khan S, Rahmani H, Shah SAA et al (2018) A guide to convolutional neural networks for computer vision

  30. Kumar A, Zhang ZJ, Lyu H (2020) Object detection in real time based on improved single shot multi-box detector algorithm. EURASIP J Wirel Commun Netw 2020(1):204. https://doi.org/10.1186/s13638-020-01826-x

  31. Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: 15th European Conference on Computer Vision, ECCV 2018. Springer, pp 765–781

  32. Li Y, Wei H, Han Z et al (2020) Deep learning-based safety helmet detection in engineering management based on convolutional neural networks. Adv Civ Eng 2020:9703560

  33. Lin H, Deng JD, Albers D et al (2020) Helmet use detection of tracked motorcycles using cnn-based multi-task learning. IEEE Access 8:162073–162084. https://doi.org/10.1109/ACCESS.2020.3021357

  34. Lin TY, Goyal P, Girshick RB et al (2017) Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 2999–3007

  35. Liu L, Ouyang W, Wang X et al (2020) Deep learning for generic object detection: a survey. Int J Comput Vis 128(2):261–318. https://doi.org/10.1007/s11263-019-01247-4

  36. Liu W, Anguelov D, Erhan D et al (2016) Ssd: single shot multibox detector. In: ECCV

  37. Mao L (2022) Layer normalization explained. https://leimao.github.io/blog/Layer-Normalization/

  38. Messelodi S, Modena C, Zanin M (2005) A computer vision system for the detection and classification of vehicles at urban road intersections. Pattern Anal Appl 8:17–31. https://doi.org/10.1007/s10044-004-0239-9

  39. Miller D (2020) Probabilistic object detection with an ensemble of experts. In: European Conference on Computer Vision, Springer, pp 46–55

  40. Nguyen M, Dang TH, Nguyen TT et al (2022) Improve object detection performance with efficient task-alignment module. In: 2022 IEEE 11th Global Conference on Consumer Electronics (GCCE). IEEE, pp 930–933

  41. World Health Organization (2018) Global status report on road safety 2018: summary. Technical report, World Health Organization

  42. Oviedo JEE, Velastín SA, Bedoya JWB (2019) Espinet v2: a region based deep learning model for detecting motorcycles in urban scenarios. DYNA

  43. Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 6517–6525

  44. Redmon J, Divvala S, Girshick RB et al (2016) You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 779–788

  45. Ren S, He K, Girshick RB et al (2015) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39:1137–1149

  46. Road Transport and Highways (2021) Ministry of road transport and highways. https://morth.nic.in/road-accident-in-india

  47. Samet N, Hicsonmez S, Akbas E (2020) Houghnet: integrating near and long-range evidence for bottom-up object detection. In: European Conference on Computer Vision (ECCV)

  48. Silva R, Aires K, Santos T et al (2013) Automatic detection of motorcyclists without helmet. In: Proceedings of the 2013 39th Latin American Computing Conference, CLEI 2013, pp 1–7. https://doi.org/10.1109/CLEI.2013.6670613

  49. Singh B, Davis L (2018) An analysis of scale invariance in object detection—snip. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3578–3587

  50. Sivasankaran SK, Rangam H, Balasubramanian V (2021) Investigation of factors contributing to injury severity in single vehicle motorcycle crashes in India. Int J Inj Control Saf Promot 28(2):243–254. https://doi.org/10.1080/17457300.2021.1908367

  51. Song S, Que Z, Hou J et al (2019) An efficient convolutional neural network for small traffic sign detection. J Syst Archit 97:269–277

  52. Soviany P, Ionescu RT (2018) Optimizing the trade-off between single-stage and two-stage deep object detectors using image difficulty prediction. In: 2018 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), pp 209–214. https://doi.org/10.1109/SYNASC.2018.00041

  53. Sun P, Jiang Y, Xie E et al (2021) What makes for end-to-end object detection? arXiv:2012.05780

  54. Tan M, Pang R, Le QV (2020) Efficientdet: scalable and efficient object detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 10778–10787

  55. Tian Z, Shen C, Chen H et al (2019) Fcos: fully convolutional one-stage object detection. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp 9626–9635

  56. Ulyanov D, Vedaldi A, Lempitsky V (2016) Instance normalization: the missing ingredient for fast stylization. arXiv:1607.08022

  57. Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. In: Guyon I, Luxburg UV, Bengio S, et al (eds) Advances in Neural Information Processing Systems, vol 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

  58. Velastin S, Fernandez R, Espinosa Oviedo J et al (2020) Detecting, tracking and counting people getting on/off a metropolitan train using a standard video camera. Sensors 2020:6251. https://doi.org/10.3390/s20216251

  59. Viola PA, Jones M (2001) Robust real-time object detection

  60. Wang J, Song L, Li Z et al (2020) End-to-end object detection with fully convolutional network. arXiv preprint arXiv:2012.03544

  61. Wang W, Wu B, Yang S et al (2018) Road damage detection and classification with faster r-cnn. In: 2018 IEEE International Conference on Big Data (Big Data), pp 5220–5223. https://doi.org/10.1109/BigData.2018.8622354

  62. Wen X, Yuan H, Song C et al (2007) An algorithm based on svm ensembles for motorcycle recognition. In: 2007 IEEE International Conference on Vehicular Electronics and Safety, pp 1–5

  63. Wu X, Sahoo D, Hoi S (2020) Recent advances in deep learning for object detection. Neurocomputing 396:39–64

  64. Wu Z, Shen C, Hengel AV (2019) Wider or deeper: revisiting the resnet model for visual recognition. Pattern Recognit 90:119–133

  65. Xiong R, Yang Y, He D et al (2020) On layer normalization in the transformer architecture. arXiv:2002.04745

  66. Xu J, Sun X, Zhang Z et al (2019) Understanding and improving layer normalization. arXiv:1911.07013

  67. Yang H, Fan B, Guo L (2020) Anchor-free object detection with mask attention. EURASIP J Image Video Process 2020(1):29. https://doi.org/10.1186/s13640-020-00517-3

  68. Zhao ZQ, Zheng P, Xu ST et al (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232

  69. Zou Z, Shi Z, Guo Y et al (2019) Object detection in 20 years: a survey. arXiv:1905.05055

Acknowledgements

The authors express their deep gratitude to the Centre of Excellence Artificial Intelligence Lab, NIT, Tiruchirappalli, for providing GPU time for training.

Funding

No funding was provided to carry out this research work.

Author information

Contributions

Both authors contributed equally to this work.

Corresponding author

Correspondence to K. S. Jayanthan.

Ethics declarations

Conflict of interest

The authors have no conflict of interest to disclose.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jayanthan, K.S., Domnic, S. An attentive convolutional transformer-based network for road safety. J Supercomput 79, 16351–16377 (2023). https://doi.org/10.1007/s11227-023-05293-1

  • DOI: https://doi.org/10.1007/s11227-023-05293-1
