Real-Time Action Recognition in Surveillance Videos Using ConvNets

Luo, Sheng; Yang, Haojin; Wang, Cheng; Che, Xiaoyin; Meinel, Christoph

doi:10.1007/978-3-319-46675-0_58

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9949)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

The explosive growth in the number of surveillance cameras, together with their round-the-clock (7 × 24) recording, produces massive amounts of surveillance video data. How to efficiently retrieve the rare but important event information inside these videos is therefore a pressing problem. Recently, deep convolutional networks have shown outstanding performance in event recognition on general videos. We therefore study the characteristics of the surveillance video context and propose a competitive ConvNets approach for real-time event recognition on surveillance videos. Our approach adopts two-stream ConvNets to recognize the spatial and temporal information of an action, respectively. In particular, we propose to use fast feature cascades and motion history images as the templates of the spatial and temporal streams. We conducted our experiments on the UCF-ARG and UT-Interaction datasets. The experimental results show that our approach achieves superior recognition accuracy and runs in real time.
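For readers unfamiliar with the temporal template named in the abstract, the following is a minimal sketch of how a motion history image (MHI) is commonly computed, following the standard definition attributed to Davis and Bobick: pixels where motion is detected are set to a maximum value tau, and all other pixels decay by one per frame toward zero. The abstract does not give the authors' parameters or implementation, so the frame-differencing threshold, the value of tau, and all function names here are illustrative assumptions, not the paper's method.

```python
# Sketch of the standard MHI update; names and parameter values are hypothetical.

def frame_diff_mask(prev_frame, curr_frame, threshold):
    """Binary motion mask: 1 where the absolute grey-level change exceeds threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def update_mhi(mhi, mask, tau):
    """One MHI step: moving pixels are set to tau, all others decay by 1 toward 0."""
    return [
        [tau if m else max(0, h - 1) for h, m in zip(mhi_row, mask_row)]
        for mhi_row, mask_row in zip(mhi, mask)
    ]


def motion_history_image(frames, tau=3, threshold=20):
    """Fold a list of greyscale frames (each a list of rows) into a single MHI."""
    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0] * width for _ in range(height)]
    for prev_frame, curr_frame in zip(frames, frames[1:]):
        mask = frame_diff_mask(prev_frame, curr_frame, threshold)
        mhi = update_mhi(mhi, mask, tau)
    return mhi


# Toy clip: a bright pixel moves left to right along a 1x4 strip.
clip = [
    [[255, 0, 0, 0]],
    [[0, 255, 0, 0]],
    [[0, 0, 255, 0]],
    [[0, 0, 0, 255]],
]
print(motion_history_image(clip, tau=3, threshold=20))  # [[1, 2, 3, 3]]
```

In the toy clip, the most recently moving pixels hold the largest values and older motion fades, so a single image encodes both where and how recently motion occurred. This is what makes the MHI a compact input for a temporal stream compared with a stack of per-frame motion fields.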

Notes

1. https://youtu.be/IwG5Q0zwOzU

Author information

Authors and Affiliations

Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Str. 2-3, 14482, Potsdam, Germany
Sheng Luo, Haojin Yang, Cheng Wang, Xiaoyin Che & Christoph Meinel

Corresponding author

Correspondence to Sheng Luo.

Editor information

Editors and Affiliations

The University of Tokyo, Tokyo, Japan
Akira Hirose
Kobe University, Kobe, Japan
Seiichi Ozawa
Okinawa Institute of Science and Technology Graduate University, Onna, Japan
Kenji Doya
Nara Institute of Science and Technology, Ikoma, Japan
Kazushi Ikeda
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Chinese Academy of Sciences, Beijing, China
Derong Liu

About this paper

Cite this paper

Luo, S., Yang, H., Wang, C., Che, X., Meinel, C. (2016). Real-Time Action Recognition in Surveillance Videos Using ConvNets. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9949. Springer, Cham. https://doi.org/10.1007/978-3-319-46675-0_58

DOI: https://doi.org/10.1007/978-3-319-46675-0_58
Published: 29 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46674-3
Online ISBN: 978-3-319-46675-0
eBook Packages: Computer Science, Computer Science (R0)
