
A Cross-modal Attention Model for Fine-Grained Incident Retrieval from Dashcam Videos

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13833)

Abstract

Dashcam video has become popular recently because it improves the safety of both individuals and communities. Individuals gain undeniable evidence for legal and insurance purposes, while communities benefit from sharing these videos for traffic education and criminal investigation. Moreover, building on recent advances in computer vision and AI, several companies have launched so-called AI dashcams that alert drivers to near-risk situations (e.g., following-distance detection, forward-collision warning) to improve driver safety. Unfortunately, even though dashcam videos form a driver's travel log (i.e., a traveling diary), little research has focused on building a useful and friendly tool that can find any incident or event from a few sketches described by users. Inspired by these observations, we introduce an interactive incident detection and retrieval system for first-view travel-log data that can retrieve fine-grained incidents, both predefined and undefined. Moreover, the system gives promising results when evaluated on several public datasets and compared against popular text-image retrieval methods. The source code is published at https://github.com/PDD0911-HCMUS/Cross_Model_Attention
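To make the retrieval idea concrete, below is a minimal, hypothetical sketch of cross-modal attention between a text query and pre-extracted dashcam frame features, written in PyTorch. The class name, feature dimensions, pooling, and cosine-similarity scoring are illustrative assumptions, not the authors' implementation; see the linked repository for the actual model.

```python
# A minimal sketch (not the authors' implementation) of cross-modal attention
# between a text query and dashcam frame features. Dimensions and names are
# illustrative assumptions; features are presumed pre-extracted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttention(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, shared_dim=512):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)    # project text tokens
        self.image_proj = nn.Linear(image_dim, shared_dim)  # project frame regions
        self.attn = nn.MultiheadAttention(shared_dim, num_heads=8, batch_first=True)

    def forward(self, text_feats, image_feats):
        # text_feats:  (batch, n_tokens, text_dim),  e.g. BERT token embeddings
        # image_feats: (batch, n_regions, image_dim), e.g. CNN region features
        q = self.text_proj(text_feats)
        kv = self.image_proj(image_feats)
        # Text tokens attend over image regions (cross-modal attention).
        attended, _ = self.attn(q, kv, kv)
        # Pool each modality to one vector and score by cosine similarity.
        text_vec = F.normalize(attended.mean(dim=1), dim=-1)
        image_vec = F.normalize(kv.mean(dim=1), dim=-1)
        return (text_vec * image_vec).sum(dim=-1)  # one similarity per pair

# Usage example: score a batch of text queries against candidate frames.
model = CrossModalAttention()
scores = model(torch.randn(4, 16, 768), torch.randn(4, 36, 2048))
print(scores.shape)  # torch.Size([4])
```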



Acknowledgement

The results of this study are based on the collaborative research project “Research and Development of Interactive Visual Lifelog Retrieval Method for Multimedia Sensing” between the National Institute of Information and Communications Technology, Japan, and the University of Science, Vietnam National University - Ho Chi Minh City, Vietnam, conducted from April 2020 to March 2022.

Author information


Corresponding author

Correspondence to Minh-Son Dao.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Pham, DD., Dao, MS., Nguyen, TB. (2023). A Cross-modal Attention Model for Fine-Grained Incident Retrieval from Dashcam Videos. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_32


  • DOI: https://doi.org/10.1007/978-3-031-27077-2_32


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27076-5

  • Online ISBN: 978-3-031-27077-2

  • eBook Packages: Computer Science, Computer Science (R0)
