
Detection of helmetless motorcycle riders by video captioning using deep recurrent neural network

Published in: Multimedia Tools and Applications

Abstract

In a densely populated country like India, the motorcycle is one of the most common and affordable modes of transport. Many motorcyclists ride without helmets, which contributes to fatal road accidents every year. On crowded roads and highways, it is difficult for the police to identify such violations and take timely action. These traffic-rule violators can be detected by analysing surveillance-camera videos. The main objective of this work is to detect helmetless motorcyclists (including pillion riders) and to generate appropriate video captions that help the traffic authority act quickly against rule violators. The system can also detect multiple-rider and child-rider cases without helmets from the video captions. A deep-neural-network-based approach is proposed to generate video captions for motorcycle riders from surveillance video analysis. In the proposed encoder-decoder model, a Convolutional Neural Network (CNN) combined with an optical-flow-guided approach extracts visual features in the encoder. In the decoder, a Recurrent Neural Network (RNN) based on Long Short-Term Memory (LSTM) with Soft Attention (SA) generates the video captions. The effectiveness of the proposed approach is evaluated using the BiLingual Evaluation Understudy (BLEU) and Metric for Evaluation of Translation with Explicit Ordering (METEOR) metrics. Extensive experimental results show that the proposed method outperforms other state-of-the-art methods.
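The soft-attention step in the decoder can be sketched as follows. This is an illustrative NumPy sketch of Bahdanau-style additive attention over per-frame visual features, not the authors' implementation; the feature dimensions, the projection matrices `W_f`, `W_h`, and the score vector `v` are all assumed for the example (in the real model they are learned jointly with the LSTM).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_attention(frame_feats, hidden, W_f, W_h, v):
    """Context vector as an attention-weighted sum of per-frame CNN features.

    frame_feats: (T, D) visual features, one row per sampled frame
    hidden:      (H,)   previous decoder (LSTM) hidden state
    W_f, W_h, v: learned projections (random stand-ins here)
    """
    # Unnormalised relevance score of each frame for the next word
    scores = np.tanh(frame_feats @ W_f.T + hidden @ W_h.T) @ v   # (T,)
    weights = softmax(scores)            # attention weights, sum to 1
    context = weights @ frame_feats      # (D,) context fed to the LSTM
    return context, weights

# Toy dimensions: 16 frames, 512-d CNN features, 256-d LSTM state
rng = np.random.default_rng(0)
T, D, H, A = 16, 512, 256, 128
feats = rng.standard_normal((T, D))
h = rng.standard_normal(H)
ctx, w = soft_attention(feats, h,
                        rng.standard_normal((A, D)),
                        rng.standard_normal((A, H)),
                        rng.standard_normal(A))
```

At each decoding step the weights re-focus on the frames most relevant to the next word, which is what lets the caption mention, e.g., the moment a helmetless rider is visible.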

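For evaluation, the paper reports full BLEU-n and METEOR scores; the core idea behind BLEU can be illustrated with a simplified BLEU-1 (clipped unigram precision with a brevity penalty). The example captions below are made up for illustration.

```python
from collections import Counter
import math

def bleu1(candidate, reference):
    """Simplified BLEU-1: clipped unigram precision x brevity penalty.
    (Full BLEU combines n-gram precisions up to n=4; METEOR additionally
    aligns stems and synonyms, which this sketch omits.)"""
    cand, ref = candidate.split(), reference.split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # Each candidate word is credited at most as often as it occurs in the reference
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    precision = clipped / len(cand)
    # Penalise candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = bleu1("rider without helmet on motorcycle",
              "a rider without helmet is on a motorcycle")
```

Here every candidate word appears in the reference (precision 1.0), but the candidate is shorter (5 vs. 8 tokens), so the brevity penalty lowers the score.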


Funding

No funding was received for conducting this study.

Author information


Corresponding author

Correspondence to Oishila Bandyopadhyay.

Ethics declarations

Conflicts of interest/Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Dasgupta, M., Bandyopadhyay, O. & Chatterji, S. Detection of helmetless motorcycle riders by video captioning using deep recurrent neural network. Multimed Tools Appl 82, 5857–5877 (2023). https://doi.org/10.1007/s11042-022-13473-z

