
Joint representation learning of appearance and motion for abnormal event detection

Original Paper
Machine Vision and Applications

Abstract

In this paper, we propose a joint spatio-temporal representation learning method based on a 3D deep convolutional neural network, which simultaneously represents appearance and motion information in 3D volumes extracted from multiple consecutive frames, together with an end-to-end learning framework for detecting abnormal events in surveillance scenes. Because appearance and motion are learned jointly, the proposed framework can detect abnormal events that exhibit diverse motion and appearance patterns. The framework detects abnormal events in each volume by analyzing the spatio-temporal representation learned by the joint learning method; this volume-level detection also makes it possible to localize an abnormal event. We evaluate the proposed joint learning method and framework on publicly available abnormal event datasets, namely the UMN, UCSD, and Subway datasets, and compare it with existing state-of-the-art methods. The experimental results demonstrate that the proposed method not only detects various abnormal events more efficiently but also localizes anomalous regions more accurately.
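The volume-level detection idea can be illustrated with a minimal sketch. It assumes grayscale clips stored as nested Python lists (`clip[t][y][x]`), non-overlapping t × h × w volumes, and a hypothetical `motion_energy` scorer (mean absolute frame difference) that merely stands in for the learned 3D CNN representation described in the paper. The function names, volume sizes, and threshold are illustrative assumptions, not the authors' settings.

```python
from typing import Callable, List, Tuple

# A clip is clip[t][y][x]: a stack of grayscale frames (assumed layout).
Clip = List[List[List[float]]]


def extract_volumes(clip: Clip, t: int, h: int, w: int) -> List[Tuple[int, int, int, Clip]]:
    """Split a clip into non-overlapping t x h x w spatio-temporal volumes.

    Returns (t0, y0, x0, volume) tuples; incomplete border volumes are dropped.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    volumes = []
    for t0 in range(0, T - t + 1, t):
        for y0 in range(0, H - h + 1, h):
            for x0 in range(0, W - w + 1, w):
                vol = [[row[x0:x0 + w] for row in clip[t0 + dt][y0:y0 + h]]
                       for dt in range(t)]
                volumes.append((t0, y0, x0, vol))
    return volumes


def motion_energy(vol: Clip) -> float:
    """Hypothetical stand-in scorer: mean absolute difference between consecutive frames.

    This is NOT the paper's learned 3D CNN; it only shows where a per-volume score plugs in.
    """
    diffs = [abs(vol[k + 1][y][x] - vol[k][y][x])
             for k in range(len(vol) - 1)
             for y in range(len(vol[0]))
             for x in range(len(vol[0][0]))]
    return sum(diffs) / len(diffs) if diffs else 0.0


def localize(clip: Clip, t: int, h: int, w: int, threshold: float,
             scorer: Callable[[Clip], float] = motion_energy) -> List[Tuple[int, int, int]]:
    """Return the (t0, y0, x0) origin of every volume whose score exceeds threshold."""
    return [(t0, y0, x0) for t0, y0, x0, vol in extract_volumes(clip, t, h, w)
            if scorer(vol) > threshold]


if __name__ == "__main__":
    # 4 frames of 4x4 pixels; a single pixel changes in the top-left corner.
    clip = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
    clip[1][0][0] = 1.0
    print(localize(clip, 2, 2, 2, threshold=0.1))  # → [(0, 0, 0)]
```

Because each score is tied to a volume origin, the flagged volumes directly give a coarse spatio-temporal localization. In the proposed framework, the hand-crafted scorer would be replaced by an analysis of the jointly learned appearance and motion representation.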


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11



Acknowledgements

This work was supported by an Institute for Information and Communications Technology Promotion (IITP) grant funded by the Korean government (MSIP) (No. B0101-15-0525, Development of global multi-target tracking and event prediction techniques based on real-time large-scale video analysis).

Author information


Correspondence to Moongu Jeon.


About this article


Cite this article

Yu, J., Yow, K.C. & Jeon, M. Joint representation learning of appearance and motion for abnormal event detection. Machine Vision and Applications 29, 1157–1170 (2018). https://doi.org/10.1007/s00138-018-0961-8



  • DOI: https://doi.org/10.1007/s00138-018-0961-8
