
A Cross-modal Attention Model for Fine-Grained Incident Retrieval from Dashcam Videos

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13833)

Abstract

Dashcam video has become popular recently because it improves the safety of both individuals and communities. Individuals gain undeniable evidence for legal and insurance purposes, while communities benefit from sharing these videos for traffic education and criminal investigation. Moreover, building on recent advances in computer vision and AI, several companies have launched so-called AI dashcams that alert drivers to near-risk situations (e.g., following-distance detection, forward-collision warning) to improve driver safety. Unfortunately, even though dashcam videos form a driver's travel log (i.e., a traveling diary), little research has focused on building a useful and friendly tool that can find any incident or event from a few sketches described by users. Inspired by these observations, we introduce an interactive incident detection and retrieval system for first-view travel-log data that can retrieve fine-grained incidents, both predefined and undefined. Moreover, the system gives promising results when evaluated on several public datasets and compared against popular text-image retrieval methods. The source code is published at https://github.com/PDD0911-HCMUS/Cross_Model_Attention
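To make the retrieval idea concrete, below is a minimal, hypothetical sketch of cross-modal attention between a text query and pre-extracted dashcam frame features, written in PyTorch. The class name, feature dimensions, pooling, and cosine-similarity scoring are illustrative assumptions, not the authors' implementation; see the linked repository for the actual model.

```python
# A minimal sketch (not the authors' implementation) of cross-modal attention
# between a text query and dashcam frame features. Dimensions and names are
# illustrative assumptions; features are presumed pre-extracted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttention(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, shared_dim=512):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)    # project text tokens
        self.image_proj = nn.Linear(image_dim, shared_dim)  # project frame regions
        self.attn = nn.MultiheadAttention(shared_dim, num_heads=8, batch_first=True)

    def forward(self, text_feats, image_feats):
        # text_feats:  (batch, n_tokens, text_dim),  e.g. BERT token embeddings
        # image_feats: (batch, n_regions, image_dim), e.g. CNN region features
        q = self.text_proj(text_feats)
        kv = self.image_proj(image_feats)
        # Text tokens attend over image regions (cross-modal attention).
        attended, _ = self.attn(q, kv, kv)
        # Pool each modality to one vector and score by cosine similarity.
        text_vec = F.normalize(attended.mean(dim=1), dim=-1)
        image_vec = F.normalize(kv.mean(dim=1), dim=-1)
        return (text_vec * image_vec).sum(dim=-1)  # one similarity per pair

# Usage example: score a batch of text queries against candidate frames.
model = CrossModalAttention()
scores = model(torch.randn(4, 16, 768), torch.randn(4, 36, 2048))
print(scores.shape)  # torch.Size([4])
```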



Acknowledgement

The results of this study are based on the collaborative research project “Research and Development of Interactive Visual Lifelog Retrieval Method for Multimedia Sensing” between the National Institute of Information and Communications Technology, Japan, and the University of Science, Vietnam National University - Ho Chi Minh City, Vietnam, conducted from April 2020 to March 2022.

Author information


Corresponding author

Correspondence to Minh-Son Dao.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Pham, DD., Dao, MS., Nguyen, TB. (2023). A Cross-modal Attention Model for Fine-Grained Incident Retrieval from Dashcam Videos. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_32


  • DOI: https://doi.org/10.1007/978-3-031-27077-2_32


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27076-5

  • Online ISBN: 978-3-031-27077-2

  • eBook Packages: Computer Science, Computer Science (R0)
