Abstract
Dashcam video has recently become popular because it supports the safety of both individuals and communities. Individuals gain undeniable evidence for legal and insurance purposes, while communities benefit from sharing dashcam videos for traffic education and criminal investigation. Moreover, building on recent advances in computer vision and AI, a few companies have launched so-called AI dashcams that alert drivers to near-accident risks (e.g., following distance detection, forward collision warning) to improve driver safety. Unfortunately, even though dashcam videos create a driver’s travel log (i.e., a traveling diary), little research focuses on building a useful, user-friendly tool for finding an incident or event from a few descriptive sketches provided by users. Inspired by these observations, we introduce an interactive incident detection and retrieval system for first-view travel-log data that can retrieve fine-grained incidents, both defined and undefined. The system gives promising results when evaluated on several public datasets and compared with popular text-image retrieval methods. The source code is published at https://github.com/PDD0911-HCMUS/Cross_Model_Attention
Acknowledgement
The results of this study are based on collaborative research on “Research and Development of Interactive Visual Lifelog Retrieval Method for Multimedia Sensing” between the National Institute of Information and Communications Technology, Japan, and the University of Science, Vietnam National University - Ho Chi Minh City, Vietnam, from April 2020 to March 2022.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pham, DD., Dao, MS., Nguyen, TB. (2023). A Cross-modal Attention Model for Fine-Grained Incident Retrieval from Dashcam Videos. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_32
DOI: https://doi.org/10.1007/978-3-031-27077-2_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27076-5
Online ISBN: 978-3-031-27077-2
eBook Packages: Computer Science (R0)