Joint representation learning of appearance and motion for abnormal event detection

Yu, Jongmin; Yow, Kin Choong; Jeon, Moongu

doi:10.1007/s00138-018-0961-8

Original Paper
Published: 26 July 2018

Volume 29, pages 1157–1170, (2018)

Machine Vision and Applications

Jongmin Yu¹,
Kin Choong Yow¹ &
Moongu Jeon¹

Abstract

In this paper, we propose a joint learning method for spatio-temporal representations, based on a 3D deep convolutional neural network, that simultaneously represents the appearance and motion information in 3D volumes extracted from multiple consecutive frames, together with an end-to-end learning framework for detecting abnormal events in surveillance scenes. With the joint learning approach, the proposed framework can detect various abnormal events that appear with diverse motion and appearance patterns. The framework detects abnormal events in each volume by analyzing the spatio-temporal representation trained by the joint learning method, and this volume-level detection makes it possible to localize an abnormal event. We verify the proposed joint learning method and framework on the publicly available UMN, UCSD, and Subway abnormal event datasets by comparing them with existing state-of-the-art methods. The experimental results demonstrate that the proposed joint learning and event detection method not only detects various abnormal events more efficiently but also localizes anomalous regions more accurately.
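
As a rough illustration of why volume-level detection gives localization, the sketch below cuts a clip into 3D volumes and maps per-volume scores back to frame ranges and image regions. It is not the authors' implementation: the function names, the non-overlapping grid, the volume sizes, the threshold, and the placeholder scores (which stand in for the output of a trained 3D CNN) are all assumptions made for this example.

```python
from itertools import product
from typing import Dict, List, Tuple

Origin = Tuple[int, int, int]  # (first frame, top row, left column) of a volume


def volume_origins(num_frames: int, height: int, width: int,
                   depth: int, size: int) -> List[Origin]:
    """Origins of non-overlapping depth x size x size volumes in a clip."""
    return list(product(
        range(0, num_frames - depth + 1, depth),
        range(0, height - size + 1, size),
        range(0, width - size + 1, size),
    ))


def localize(scores: Dict[Origin, float], threshold: float,
             depth: int, size: int) -> List[Tuple[int, int, int, int, int]]:
    """Map volume scores to (first_frame, last_frame, top, left, size) regions."""
    return [
        (t, t + depth - 1, y, x, size)
        for (t, y, x), score in sorted(scores.items())
        if score > threshold
    ]


# Hypothetical clip: 10 frames of 4 x 6 pixels, cut into 5 x 2 x 2 volumes.
origins = volume_origins(num_frames=10, height=4, width=6, depth=5, size=2)

# Placeholder scores; a trained model would supply these (assumption).
scores = {origin: 0.1 for origin in origins}
scores[(5, 2, 4)] = 0.9

regions = localize(scores, threshold=0.5, depth=5, size=2)
print(regions)
```

Because each score is tied to a known volume origin, any volume that exceeds the threshold identifies both when (frame range) and where (image patch) the abnormal event occurs.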

Acknowledgements

This work was supported by an Institute for Information and Communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. B0101-15-0525, Development of global multi-target tracking and event prediction techniques based on real-time large-scale video analysis).

Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, GIST, Gwangju, South Korea
Jongmin Yu, Kin Choong Yow & Moongu Jeon

Corresponding author

Correspondence to Moongu Jeon.

About this article

Cite this article

Yu, J., Yow, K.C. & Jeon, M. Joint representation learning of appearance and motion for abnormal event detection. Machine Vision and Applications 29, 1157–1170 (2018). https://doi.org/10.1007/s00138-018-0961-8

Received: 16 August 2017
Revised: 26 March 2018
Accepted: 25 June 2018
Published: 26 July 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s00138-018-0961-8
