Anticipating Accidents in Dashcam Videos

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10114)

Abstract

We propose a Dynamic-Spatial-Attention (DSA) Recurrent Neural Network (RNN) for anticipating accidents in dashcam videos (Fig. 1). Our DSA-RNN learns to (1) dynamically distribute soft attention to candidate objects to gather subtle cues and (2) model the temporal dependencies of all cues to robustly anticipate an accident. Anticipating accidents has received far less attention than anticipating maneuvers such as changing lanes or making turns, since accidents are rarely observed and can happen suddenly in many different ways. To overcome these challenges, we (1) use a state-of-the-art object detector [3] to detect candidate objects, and (2) incorporate full-frame and object-based appearance and motion features in our model. We also harvest a diverse dataset of 678 dashcam accident videos from the web (Fig. 3). The dataset is unique in that a variety of accidents (e.g., a motorbike hitting a car, a car hitting another car) occur across the videos. We manually annotate the time and location of each accident and use these annotations as supervision to train and evaluate our method. We show that our method anticipates accidents about 2 s before they occur with 80% recall and 56.14% precision. Most importantly, it achieves the highest mean average precision (74.35%), outperforming baselines without attention or an RNN.
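The core mechanism described above (scoring candidate objects against the recurrent state, soft-attention-weighting their features, and feeding the result alongside full-frame features into the recurrent update) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the paper uses an LSTM with learned projections, whereas this uses a plain tanh RNN cell, and all variable names, shapes, and the parameter layout are assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def dsa_step(h, full_frame_feat, object_feats, params):
    """One sketch of a Dynamic-Spatial-Attention recurrent step.

    h               : previous hidden state, shape (hidden_dim,)
    full_frame_feat : full-frame feature, shape (feat_dim,)
    object_feats    : per-object features from a detector, shape (num_objects, feat_dim)
    params          : dict of weights (hypothetical layout for this sketch)
    """
    # Score each candidate object by its compatibility with the hidden state.
    scores = object_feats @ params["W_att"] @ h        # (num_objects,)
    alpha = softmax(scores)                            # soft-attention weights, sum to 1
    attended = alpha @ object_feats                    # attention-weighted object cue
    # Combine full-frame and attended object cues, then update the state.
    x = np.concatenate([full_frame_feat, attended])
    h_next = np.tanh(params["W_h"] @ h + params["W_x"] @ x + params["b"])
    return h_next, alpha
```

In use, an accident probability would be read out from `h_next` at every frame; the attention weights `alpha` indicate which detected objects the model is currently focusing on.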


Notes

  1. The subscript \(*\) denotes any symbol.

  2. \(\varvec{\alpha }_t\) is often omitted for conciseness.

  3. https://www.youtube.com/watch?v=YHFvSCAg4DE.

  4. Hence, we use the first 90 frames to anticipate accidents.

  5. IDT also includes Histogram of Oriented Gradient (HOG) [37] (an appearance feature) on the motion boundary.

  6. Human, bicycle, motorbike, car and bus.
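The abstract's "anticipates accidents about 2 s before they occur" can be made concrete with a small sketch: given per-frame accident scores over the frames preceding an accident, find the first frame whose score crosses a threshold and convert the lead time to seconds. This is an illustrative reading of the metric, not the paper's code; the frame rate and the exact definition here are assumptions.

```python
import numpy as np

def time_to_accident(frame_scores, accident_frame, threshold, fps=20):
    """Seconds of warning before the accident at a given score threshold.

    frame_scores   : per-frame accident probabilities, shape (num_frames,)
    accident_frame : index of the annotated accident frame
    threshold      : score above which an accident is declared
    fps            : assumed frame rate (hypothetical default)
    Returns None if the threshold is never crossed.
    """
    above = np.nonzero(frame_scores >= threshold)[0]
    if len(above) == 0:
        return None  # accident never anticipated at this threshold
    return (accident_frame - above[0]) / fps
```

Sweeping the threshold trades earlier warnings (higher recall) against more false alarms (lower precision), which is the trade-off summarized by the reported recall, precision, and mean average precision.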

References

  1. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)

  2. Wang, H., Schmid, C.: Action recognition with improved trajectories. In: ICCV (2013)

  3. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)

  4. Google Inc.: Google self-driving car project monthly report (2015)

  5. National Highway Traffic Safety Administration: 2012 motor vehicle crashes: overview (2013)

  6. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)

  7. Jain, A., Singh, A., Koppula, H.S., Soh, S., Saxena, A.: Recurrent neural networks for driver activity anticipation via sensory-fusion architecture. In: ICRA (2016)

  8. Ryoo, M.S.: Human activity prediction: early recognition of ongoing activities from streaming videos. In: ICCV (2011)

  9. Hoai, M., De la Torre, F.: Max-margin early event detectors. In: CVPR (2012)

  10. Lan, T., Chen, T.-C., Savarese, S.: A hierarchical representation for future action prediction. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 689–704. Springer, Cham (2014). doi:10.1007/978-3-319-10578-9_45

  11. Kitani, K.M., Ziebart, B.D., Bagnell, J.A., Hebert, M.: Activity forecasting. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 201–214. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33765-9_15

  12. Yuen, J., Torralba, A.: A data-driven approach for event prediction. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6312, pp. 707–720. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15552-9_51

  13. Walker, J., Gupta, A., Hebert, M.: Patch to the future: unsupervised visual prediction. In: CVPR (2014)

  14. Wang, Z., Deisenroth, M., Ben Amor, H., Vogt, D., Schölkopf, B., Peters, J.: Probabilistic modeling of human movements for intention inference. In: RSS (2012)

  15. Koppula, H.S., Saxena, A.: Anticipating human activities using object affordances for reactive robotic response. PAMI 38, 14–29 (2016)

  16. Koppula, H.S., Jain, A., Saxena, A.: Anticipatory planning for human-robot teams. In: ISER (2014)

  17. Mainprice, J., Berenson, D.: Human-robot collaborative manipulation planning using early prediction of human motion. In: IROS (2013)

  18. Berndt, H., Emmert, J., Dietmayer, K.: Continuous driver intention recognition with hidden Markov models. In: Intelligent Transportation Systems (2008)

  19. Frohlich, B., Enzweiler, M., Franke, U.: Will this car change the lane? - Turn signal recognition in the frequency domain. In: Intelligent Vehicles Symposium (IV) (2014)

  20. Kumar, P., Perrollaz, M., Lefévre, S., Laugier, C.: Learning-based approach for online lane change intention prediction. In: Intelligent Vehicles Symposium (IV) (2013)

  21. Liebner, M., Baumann, M., Klanner, F., Stiller, C.: Driver intent inference at urban intersections using the intelligent driver model. In: Intelligent Vehicles Symposium (IV) (2012)

  22. Morris, B., Doshi, A., Trivedi, M.: Lane change intent prediction for driver assistance: on-road design and evaluation. In: Intelligent Vehicles Symposium (IV) (2011)

  23. Doshi, A., Morris, B., Trivedi, M.: On-road prediction of driver’s intent with multimodal sensory cues. IEEE Pervasive Comput. 10, 22–34 (2011)

  24. Trivedi, M.M., Gandhi, T., McCall, J.: Looking-in and looking-out of a vehicle: computer-vision-based enhanced vehicle safety. IEEE Trans. Intell. Transp. Syst. 8, 108–120 (2007)

  25. Jain, A., Koppula, H.S., Raghavan, B., Soh, S., Saxena, A.: Car that knows before you do: anticipating maneuvers via learning temporal driving models. In: ICCV (2015)

  26. Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., Courville, A.: Describing videos by exploiting temporal structure. In: ICCV (2015)

  27. Xu, K., Ba, J., Kiros, R., Courville, A., Salakhutdinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. arXiv preprint (2015). arXiv:1502.03044

  28. Mnih, V., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent models of visual attention. In: NIPS (2014)

  29. Ba, J., Mnih, V., Kavukcuoglu, K.: Multiple object recognition with visual attention. In: ICLR (2015)

  30. Brostow, G.J., Shotton, J., Fauqueur, J., Cipolla, R.: Segmentation and recognition using structure from motion point clouds. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5302, pp. 44–57. Springer, Heidelberg (2008). doi:10.1007/978-3-540-88682-2_5

  31. Leibe, B., Cornelis, N., Cornelis, K., Gool, L.V.: Dynamic 3D scene analysis from a moving vehicle. In: CVPR (2007)

  32. Scharwächter, T., Enzweiler, M., Franke, U., Roth, S.: Efficient multi-cue scene segmentation. In: Weickert, J., Hein, M., Schiele, B. (eds.) GCPR 2013. LNCS, vol. 8142, pp. 435–445. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40602-7_46

  33. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR (2012)

  34. Cordts, M., Omran, M., Ramos, S., Scharwächter, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The Cityscapes dataset. In: CVPR Workshop on the Future of Datasets in Vision (2015)

  35. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. arXiv preprint (2012). arXiv:1211.5063

  36. Werbos, P.J.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78, 1550–1560 (1990)

  37. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)

  38. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). doi:10.1007/978-3-319-10602-1_48

  39. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). Software. tensorflow.org

Acknowledgements

We thank Industrial Technology Research Institute for their support.

Author information

Correspondence to Fu-Hsiang Chan.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 587 KB)


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Chan, F.-H., Chen, Y.-T., Xiang, Y., Sun, M. (2017). Anticipating Accidents in Dashcam Videos. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds.) Computer Vision – ACCV 2016. Lecture Notes in Computer Science, vol. 10114. Springer, Cham. https://doi.org/10.1007/978-3-319-54190-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54189-1

  • Online ISBN: 978-3-319-54190-7
