An attentive convolutional transformer-based network for road safety

Jayanthan, K. S.; Domnic, S.

doi:10.1007/s11227-023-05293-1


Published: 27 April 2023

Volume 79, pages 16351–16377, (2023)

The Journal of Supercomputing


Abstract

The sharp rise in the number of vehicles on the road has led to numerous traffic violations (Road Transport and Highways, Ministry of Road Transport and Highways, 2021. https://morth.nic.in/road-accident-in-india). Detecting traffic violations in a dynamic environment is a complex task. This paper focuses on detecting one particular traffic violation: riding a motorcycle without a helmet. We treat the problem as an object detection task and propose a novel convolutional encoder–transformer decoder (CETD) architecture for it. The proposed architecture comprises two primary modules: a convolutional neural network (CNN)-based encoder that extracts high-level features from input images, and a transformer-based decoder that uses attention mechanisms to identify relevant components in the image, such as helmets or missing helmets. The architecture is designed to achieve accurate object detection and localization by combining advanced feature extraction techniques with state-of-the-art attention mechanisms. The layer normalization module of the proposed architecture acts as an intermediate bias stabilizer for the encoder–decoder network. The design also includes a standard backbone feature extractor and a fused backbone feature extractor. The model detects occluded objects with higher confidence than other state-of-the-art models. The detector works in an end-to-end fashion with fewer handcrafted features. We studied the applicability of the model on a miniature version of the COCO dataset, where it performs competitively with models such as Faster region-based convolutional neural network (Faster R-CNN) and Mask region-based convolutional neural network (Mask R-CNN). The proposed model is also fine-tuned on traffic data with occlusion for helmet detection.
The model’s performance on helmet detection from traffic data is comparable with state-of-the-art real-time detectors such as EspiNet (a modified Faster R-CNN network), the single-shot multibox detector (SSD), and the You Only Look Once version 5 (YOLO v5) detector. Specifically, the model outperforms EspiNet V2 by 0.47, SSD by 3.9, and YOLO v5 by 0.37 in terms of mean average precision. Moreover, the model’s mean average precision is further improved by 0.87 using object-aware copy–paste augmentation. The model’s average occlusion detection confidence is 5.1 percent higher than that of YOLO v5. Experimental results show that the proposed model adapts better to both specific object (helmet) detection and generic object detection tasks.
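
The abstract describes a layer normalization module placed between the encoder and the decoder. The following is a minimal sketch of the standard layer normalization formula (Ba et al. 2016), written in plain Python to show what the operation computes. It is not the authors' implementation: the function name `layer_norm`, the `eps` value, and the sample `encoder_tokens` are assumptions made for illustration only.

```python
import math


def layer_norm(features, gamma=None, beta=None, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance,
    then apply an element-wise scale (gamma) and shift (beta)."""
    n = len(features)
    mean = sum(features) / n
    # Biased variance over the feature dimension, as in the standard formula.
    var = sum((x - mean) ** 2 for x in features) / n
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n
    return [
        g * (x - mean) / math.sqrt(var + eps) + b
        for x, g, b in zip(features, gamma, beta)
    ]


# Hypothetical encoder outputs: two feature vectors on very different scales.
encoder_tokens = [
    [2.0, 4.0, 6.0, 8.0],
    [100.0, 200.0, 300.0, 400.0],
]

# Each vector is normalized independently over its own feature dimension.
normalized = [layer_norm(token) for token in encoder_tokens]

for row in normalized:
    print([round(v, 3) for v in row])
```

Although the two sample vectors differ in scale by a factor of 50, they normalize to nearly identical values. This illustrates, under the assumptions above, how such a layer could keep the statistics of the features passed to a decoder in a consistent range.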


Availability of data and materials

Data will be available on request.

References

Ahmed M, Hashmi KA, Pagani A et al (2021) Survey and performance analysis of deep learning based object detection in challenging environments. Sensors. https://doi.org/10.3390/s21155116
Anwer M, Shareef S, Ali A (2021) Accident vehicle types classification: a comparative study between different deep learning models. Indones J Electr Eng Comput Sci 21:1474–1484. https://doi.org/10.11591/ijeecs.v21.i3.pp1474-1484
Arman M, Hasan M, Sadia F et al (2020) Detection and classification of road damage using R-CNN and faster R-CNN: a deep learning approach, pp 730–741. https://doi.org/10.1007/978-3-030-52856-0_58
Ba J, Kiros J, Hinton GE (2016) Layer normalization. arXiv:1607.06450
Buch N, Velastin SA, Orwell J (2011) A review of computer vision techniques for the analysis of urban traffic. IEEE Trans Intell Transp Syst 12(3):920–939
Carion N, Massa F, Synnaeve G et al (2020) End-to-end object detection with transformers. arXiv:2005.12872
Carion N, Massa F, Synnaeve G et al (2020) End-to-end object detection with transformers. In: Vedaldi A, Bischof H, Brox T et al (eds) Computer vision—ECCV 2020. Springer, Cham, pp 213–229
Caron M, Misra I, Mairal J et al (2020) Unsupervised learning of visual features by contrasting cluster assignments. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS)
Chalavadi V, Singh D, Mohan CK et al (2017) Detection of motorcyclists without helmet in videos using convolutional neural network. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp 3036–3041
Chen Q, Wang Y, Yang T et al (2021) You only look one-level feature. arXiv:2103.09460
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol 1, pp 886–893
Dequito C, Dichaves I, Juan R et al (2021) Vision-based bicycle and motorcycle detection using a yolo-based network. J Phys Conf Ser 1922:012003. https://doi.org/10.1088/1742-6596/1922/1/012003
Dhyanjith G, Manohar N, Raj AV (2021) Helmet detection using yolo v3 and single shot detector. In: 2021 6th International Conference on Communication and Electronics Systems (ICCES), pp 1844–1848. https://doi.org/10.1109/ICCES51350.2021.9489194
Espinosa JE, Velastín SA, Branch JW (2020) Detection of motorcycles in urban traffic using video analysis: a review. IEEE Trans Intell Transp Syst 22:6115–6130
Farhadi A, Redmon J (2018) Yolov3: an incremental improvement. arXiv:1804.02767
Felzenszwalb PF, Girshick RB, McAllester DA et al (2009) Object detection with discriminatively trained part based models. IEEE Trans Pattern Anal Mach Intell 32:1627–1645
Ghiasi G, Cui Y, Srinivas A et al (2020) Simple copy-paste is a strong data augmentation method for instance segmentation. arXiv:2012.07177
Ghiasi G, Cui Y, Srinivas A et al (2021) Simple copy-paste is a strong data augmentation method for instance segmentation. In: CVPR
Girshick R, Donahue J, Darrell T et al (2016) Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans Pattern Anal Mach Intell 38(1):142–158. https://doi.org/10.1109/TPAMI.2015.2437384
Girshick RB (2015) Fast r-cnn. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp 1440–1448
Girshick RB, Felzenszwalb PF, McAllester D (2012) Discriminatively trained deformable part models, release 5. http://people.cs.uchicago.edu/~rbg/latent-release5/
Guindel C, Martín D, Armingol JM (2017) Joint object detection and viewpoint estimation using cnn features. In: 2017 IEEE International Conference on Vehicular Electronics and Safety (ICVES), pp 145–150
Guo M, Xue D, Li P et al (2020) Vehicle pedestrian detection method based on spatial pyramid pooling and attention mechanism. Information. https://doi.org/10.3390/info11120583
Han C, Gao G, Zhang Y (2018) Real-time small traffic sign detection with revised faster-rcnn. Multimed Tools Appl 78:13263–13278
Haris M, Glowacz A (2021) Road object detection: a comparative study of deep learning-based algorithms. Electronics. https://doi.org/10.3390/electronics10161932
He K, Zhang X, Ren S et al (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37:1904–1916
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167
Jia W, Xu S, Liang Z et al (2021) Real-time automatic helmet detection of motorcyclists in urban traffic using improved yolov5 detector. IET Image Process. https://doi.org/10.1049/ipr2.12295
Khan S, Rahmani H, Shah SAA et al (2018) A guide to convolutional neural networks for computer vision. In: A Guide to Convolutional Neural Networks for Computer Vision
Kumar A, Zhang ZJ, Lyu H (2020) Object detection in real time based on improved single shot multi-box detector algorithm. EURASIP J Wirel Commun Netw 2020(1):204. https://doi.org/10.1186/s13638-020-01826-x
Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: 15th European Conference on Computer Vision, ECCV 2018. Springer, pp 765–781
Li Y, Wei H, Han Z et al (2020) Deep learning-based safety helmet detection in engineering management based on convolutional neural networks. Adv Civ Eng 2020:9703560
Lin H, Deng JD, Albers D et al (2020) Helmet use detection of tracked motorcycles using cnn-based multi-task learning. IEEE Access 8:162073–162084. https://doi.org/10.1109/ACCESS.2020.3021357
Lin TY, Goyal P, Girshick RB et al (2017) Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 2999–3007
Liu L, Ouyang W, Wang X et al (2020) Deep learning for generic object detection: a survey. Int J Comput Vis 128(2):261–318. https://doi.org/10.1007/s11263-019-01247-4
Liu W, Anguelov D, Erhan D et al (2016) Ssd: single shot multibox detector. In: ECCV
Mao L (2022) Layer normalization explained. https://leimao.github.io/blog/Layer-Normalization/
Messelodi S, Modena C, Zanin M (2005) A computer vision system for the detection and classification of vehicles at urban road intersections. Pattern Anal Appl 8:17–31. https://doi.org/10.1007/s10044-004-0239-9
Miller D (2020) Probabilistic object detection with an ensemble of experts. In: European Conference on Computer Vision, Springer, pp 46–55
Nguyen M, Dang TH, Nguyen TT et al (2022) Improve object detection performance with efficient task-alignment module. In: 2022 IEEE 11th Global Conference on Consumer Electronics (GCCE). IEEE, pp 930–933
World Health Organization (2018) Global status report on road safety 2018: summary. Technical report, World Health Organization
Oviedo JEE, Velastín SA, Bedoya JWB (2019) Espinet v2: a region based deep learning model for detecting motorcycles in urban scenarios. DYNA
Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 6517–6525
Redmon J, Divvala S, Girshick RB et al (2016) You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 779–788
Ren S, He K, Girshick RB et al (2015) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39:1137–1149
Road Transport and Highways (2021) Ministry of road transport and highways. https://morth.nic.in/road-accident-in-india
Samet N, Hicsonmez S, Akbas E (2020) Houghnet: integrating near and long-range evidence for bottom-up object detection. In: European Conference on Computer Vision (ECCV)
Silva R, Aires K, Santos T et al (2013) Automatic detection of motorcyclists without helmet. In: Proceedings of the 2013 39th Latin American Computing Conference, CLEI 2013, pp 1–7. https://doi.org/10.1109/CLEI.2013.6670613
Singh B, Davis L (2018) An analysis of scale invariance in object detection—snip. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3578–3587
Sivasankaran SK, Rangam H, Balasubramanian V (2021) Investigation of factors contributing to injury severity in single vehicle motorcycle crashes in India. Int J Inj Control Saf Promot 28(2):243–254. https://doi.org/10.1080/17457300.2021.1908367
Song S, Que Z, Hou J et al (2019) An efficient convolutional neural network for small traffic sign detection. J Syst Archit 97:269–277
Soviany P, Ionescu RT (2018) Optimizing the trade-off between single-stage and two-stage deep object detectors using image difficulty prediction. In: 2018 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), pp 209–214. https://doi.org/10.1109/SYNASC.2018.00041
Sun P, Jiang Y, Xie E et al (2021) What makes for end-to-end object detection? arXiv:2012.05780
Tan M, Pang R, Le QV (2020) Efficientdet: scalable and efficient object detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 10778–10787
Tian Z, Shen C, Chen H et al (2019) Fcos: fully convolutional one-stage object detection. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp 9626–9635
Ulyanov D, Vedaldi A, Lempitsky V (2016) Instance normalization: the missing ingredient for fast stylization. arXiv:1607.08022
Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. In: Guyon I, Luxburg UV, Bengio S et al (eds) Advances in Neural Information Processing Systems, vol 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Velastin S, Fernandez R, Espinosa Oviedo J et al (2020) Detecting, tracking and counting people getting on/off a metropolitan train using a standard video camera. Sensors 2020:6251. https://doi.org/10.3390/s20216251
Viola PA, Jones M (2001) Robust real-time object detection. In: Robust Real-time Object Detection
Wang J, Song L, Li Z et al (2020) End-to-end object detection with fully convolutional network. arXiv preprint arXiv:2012.03544
Wang W, Wu B, Yang S et al (2018) Road damage detection and classification with faster r-cnn. In: 2018 IEEE International Conference on Big Data (Big Data), pp 5220–5223. https://doi.org/10.1109/BigData.2018.8622354
Wen X, Yuan H, Song C et al (2007) An algorithm based on svm ensembles for motorcycle recognition. In: 2007 IEEE International Conference on Vehicular Electronics and Safety, pp 1–5
Wu X, Sahoo D, Hoi S (2020) Recent advances in deep learning for object detection. Neurocomputing 396:39–64
Wu Z, Shen C, Hengel AV (2019) Wider or deeper: revisiting the resnet model for visual recognition. Pattern Recognit 90:119–133
Xiong R, Yang Y, He D et al (2020) On layer normalization in the transformer architecture. arXiv:2002.04745
Xu J, Sun X, Zhang Z et al (2019) Understanding and improving layer normalization. arXiv:1911.07013
Yang H, Fan B, Guo L (2020) Anchor-free object detection with mask attention. EURASIP J Image Video Process 2020(1):29. https://doi.org/10.1186/s13640-020-00517-3
Zhao ZQ, Zheng P, Xu ST et al (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232
Zou Z, Shi Z, Guo Y et al (2019) Object detection in 20 years: a survey. arXiv:1905.05055


Acknowledgements

The authors express their deep gratitude to the Centre of Excellence Artificial Intelligence Lab, NIT, Tiruchirappalli, for providing GPU time for training.

Funding

No funding was provided to carry out this research work.

Author information

Authors and Affiliations

Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tiruchirappalli, Tamil Nadu, 620005, India
K. S. Jayanthan & S. Domnic


Contributions

Both authors contributed equally to this work.

Corresponding author

Correspondence to K. S. Jayanthan.

Ethics declarations

Conflict of interest

The authors have no other conflict of interest to disclose.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Jayanthan, K.S., Domnic, S. An attentive convolutional transformer-based network for road safety. J Supercomput 79, 16351–16377 (2023). https://doi.org/10.1007/s11227-023-05293-1


Accepted: 10 April 2023
Published: 27 April 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s11227-023-05293-1
