Anticipating Accidents in Dashcam Videos

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10114)

Abstract

We propose a Dynamic-Spatial-Attention (DSA) Recurrent Neural Network (RNN) for anticipating accidents in dashcam videos (Fig. 1). Our DSA-RNN learns to (1) dynamically distribute soft attention to candidate objects to gather subtle cues and (2) model the temporal dependencies of all cues to robustly anticipate an accident. Anticipating accidents has received far less attention than anticipating maneuvers such as changing lanes or making turns, since accidents are rarely observed and can happen suddenly in many different ways. To overcome these challenges, we (1) use a state-of-the-art object detector [3] to detect candidate objects, and (2) incorporate full-frame and object-based appearance and motion features in our model. We also harvest a diverse dataset of 678 dashcam accident videos from the web (Fig. 3). The dataset is unique in that a variety of accidents (e.g., a motorbike hitting a car, a car hitting another car) occur across the videos. We manually annotate the time and location of each accident and use these annotations as supervision to train and evaluate our method. We show that our method anticipates accidents about 2 s before they occur with 80% recall and 56.14% precision. Most importantly, it achieves the highest mean average precision (74.35%), outperforming baselines without attention or an RNN.
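The core mechanism described above (scoring candidate objects against the recurrent state, soft-attention-weighting their features, and feeding the result alongside full-frame features into the recurrent update) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the paper uses an LSTM with learned projections, whereas this uses a plain tanh RNN cell, and all variable names, shapes, and the parameter layout are assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def dsa_step(h, full_frame_feat, object_feats, params):
    """One sketch of a Dynamic-Spatial-Attention recurrent step.

    h               : previous hidden state, shape (hidden_dim,)
    full_frame_feat : full-frame feature, shape (feat_dim,)
    object_feats    : per-object features from a detector, shape (num_objects, feat_dim)
    params          : dict of weights (hypothetical layout for this sketch)
    """
    # Score each candidate object by its compatibility with the hidden state.
    scores = object_feats @ params["W_att"] @ h        # (num_objects,)
    alpha = softmax(scores)                            # soft-attention weights, sum to 1
    attended = alpha @ object_feats                    # attention-weighted object cue
    # Combine full-frame and attended object cues, then update the state.
    x = np.concatenate([full_frame_feat, attended])
    h_next = np.tanh(params["W_h"] @ h + params["W_x"] @ x + params["b"])
    return h_next, alpha
```

In use, an accident probability would be read out from `h_next` at every frame; the attention weights `alpha` indicate which detected objects the model is currently focusing on.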


Notes

  1. The subscript \(*\) denotes any symbol.

  2. \(\varvec{\alpha }_t\) is often omitted for conciseness.

  3. https://www.youtube.com/watch?v=YHFvSCAg4DE.

  4. Hence, we use the first 90 frames to anticipate accidents.

  5. IDT also includes Histogram of Oriented Gradient (HOG) [37] (an appearance feature) on the motion boundary.

  6. Human, bicycle, motorbike, car and bus.
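The abstract's "anticipates accidents about 2 s before they occur" can be made concrete with a small sketch: given per-frame accident scores over the frames preceding an accident, find the first frame whose score crosses a threshold and convert the lead time to seconds. This is an illustrative reading of the metric, not the paper's code; the frame rate and the exact definition here are assumptions.

```python
import numpy as np

def time_to_accident(frame_scores, accident_frame, threshold, fps=20):
    """Seconds of warning before the accident at a given score threshold.

    frame_scores   : per-frame accident probabilities, shape (num_frames,)
    accident_frame : index of the annotated accident frame
    threshold      : score above which an accident is declared
    fps            : assumed frame rate (hypothetical default)
    Returns None if the threshold is never crossed.
    """
    above = np.nonzero(frame_scores >= threshold)[0]
    if len(above) == 0:
        return None  # accident never anticipated at this threshold
    return (accident_frame - above[0]) / fps
```

Sweeping the threshold trades earlier warnings (higher recall) against more false alarms (lower precision), which is the trade-off summarized by the reported recall, precision, and mean average precision.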

References

  1. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)

  2. Wang, H., Schmid, C.: Action recognition with improved trajectories. In: ICCV (2013)

  3. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)

  4. Google Inc.: Google self-driving car project monthly report (2015)

  5. National Highway Traffic Safety Administration: 2012 motor vehicle crashes: overview (2013)

  6. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)

  7. Jain, A., Singh, A., Koppula, H.S., Soh, S., Saxena, A.: Recurrent neural networks for driver activity anticipation via sensory-fusion architecture. In: ICRA (2016)

  8. Ryoo, M.S.: Human activity prediction: early recognition of ongoing activities from streaming videos. In: ICCV (2011)

  9. Hoai, M., De la Torre, F.: Max-margin early event detectors. In: CVPR (2012)

  10. Lan, T., Chen, T.-C., Savarese, S.: A hierarchical representation for future action prediction. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 689–704. Springer, Cham (2014). doi:10.1007/978-3-319-10578-9_45

  11. Kitani, K.M., Ziebart, B.D., Bagnell, J.A., Hebert, M.: Activity forecasting. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 201–214. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33765-9_15

  12. Yuen, J., Torralba, A.: A data-driven approach for event prediction. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6312, pp. 707–720. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15552-9_51

  13. Walker, J., Gupta, A., Hebert, M.: Patch to the future: unsupervised visual prediction. In: CVPR (2014)

  14. Wang, Z., Deisenroth, M., Ben Amor, H., Vogt, D., Schölkopf, B., Peters, J.: Probabilistic modeling of human movements for intention inference. In: RSS (2012)

  15. Koppula, H.S., Saxena, A.: Anticipating human activities using object affordances for reactive robotic response. PAMI 38, 14–29 (2016)

  16. Koppula, H.S., Jain, A., Saxena, A.: Anticipatory planning for human-robot teams. In: ISER (2014)

  17. Mainprice, J., Berenson, D.: Human-robot collaborative manipulation planning using early prediction of human motion. In: IROS (2013)

  18. Berndt, H., Emmert, J., Dietmayer, K.: Continuous driver intention recognition with hidden Markov models. In: Intelligent Transportation Systems (2008)

  19. Frohlich, B., Enzweiler, M., Franke, U.: Will this car change the lane? - Turn signal recognition in the frequency domain. In: Intelligent Vehicles Symposium (IV) (2014)

  20. Kumar, P., Perrollaz, M., Lefévre, S., Laugier, C.: Learning-based approach for online lane change intention prediction. In: Intelligent Vehicles Symposium (IV) (2013)

  21. Liebner, M., Baumann, M., Klanner, F., Stiller, C.: Driver intent inference at urban intersections using the intelligent driver model. In: Intelligent Vehicles Symposium (IV) (2012)

  22. Morris, B., Doshi, A., Trivedi, M.: Lane change intent prediction for driver assistance: on-road design and evaluation. In: Intelligent Vehicles Symposium (IV) (2011)

  23. Doshi, A., Morris, B., Trivedi, M.: On-road prediction of driver’s intent with multimodal sensory cues. IEEE Pervasive Comput. 10, 22–34 (2011)

  24. Trivedi, M.M., Gandhi, T., McCall, J.: Looking-in and looking-out of a vehicle: computer-vision-based enhanced vehicle safety. IEEE Trans. Intell. Transp. Syst. 8, 108–120 (2007)

  25. Jain, A., Koppula, H.S., Raghavan, B., Soh, S., Saxena, A.: Car that knows before you do: anticipating maneuvers via learning temporal driving models. In: ICCV (2015)

  26. Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., Courville, A.: Describing videos by exploiting temporal structure. In: ICCV (2015)

  27. Xu, K., Ba, J., Kiros, R., Courville, A., Salakhutdinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. arXiv preprint (2015). arXiv:1502.03044

  28. Mnih, V., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent models of visual attention. In: NIPS (2014)

  29. Ba, J., Mnih, V., Kavukcuoglu, K.: Multiple object recognition with visual attention. In: ICLR (2015)

  30. Brostow, G.J., Shotton, J., Fauqueur, J., Cipolla, R.: Segmentation and recognition using structure from motion point clouds. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5302, pp. 44–57. Springer, Heidelberg (2008). doi:10.1007/978-3-540-88682-2_5

  31. Leibe, B., Cornelis, N., Cornelis, K., Gool, L.V.: Dynamic 3D scene analysis from a moving vehicle. In: CVPR (2007)

  32. Scharwächter, T., Enzweiler, M., Franke, U., Roth, S.: Efficient multi-cue scene segmentation. In: Weickert, J., Hein, M., Schiele, B. (eds.) GCPR 2013. LNCS, vol. 8142, pp. 435–445. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40602-7_46

  33. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR (2012)

  34. Cordts, M., Omran, M., Ramos, S., Scharwächter, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The Cityscapes dataset. In: CVPR Workshop on the Future of Datasets in Vision (2015)

  35. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. arXiv preprint (2012). arXiv:1211.5063

  36. Werbos, P.J.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78, 1550–1560 (1990)

  37. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)

  38. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). doi:10.1007/978-3-319-10602-1_48

  39. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). Software. tensorflow.org

Acknowledgements

We thank Industrial Technology Research Institute for their support.

Author information

Correspondence to Fu-Hsiang Chan.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 587 KB)


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Chan, F.-H., Chen, Y.-T., Xiang, Y., Sun, M. (2017). Anticipating Accidents in Dashcam Videos. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds.) Computer Vision – ACCV 2016. Lecture Notes in Computer Science, vol. 10114. Springer, Cham. https://doi.org/10.1007/978-3-319-54190-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54189-1

  • Online ISBN: 978-3-319-54190-7
