Abstract
This paper presents a novel high-speed approach for human action recognition in the H.264/AVC compressed domain. The proposed algorithm extracts features from the quantization parameters (QP) and motion vectors (MV) available in the compressed video stream and classifies them using Support Vector Machines (SVM). The goal is an algorithm that is much faster than its pixel-domain counterparts, with comparable accuracy, using only the sparse information present in the compressed video. Because only partial decoding is needed, the method avoids the complexity of full decoding and reduces computational load and memory usage. This can lower hardware requirements and yield faster recognition. The approach handles illumination changes and variations in scale and appearance, and is robust in both indoor and outdoor scenarios. We evaluated the method on two benchmark action datasets and achieved more than 85% accuracy. The algorithm classifies actions at more than 2,000 frames per second (fps), approximately 100 times faster than existing state-of-the-art pixel-domain algorithms.
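The abstract outlines the pipeline only at a high level: compressed-domain cues (QP and MV) become features, and an SVM classifies them. The Python sketch below illustrates that kind of pipeline under explicit assumptions. It is not the authors' implementation.

It assumes each frame has already been partially decoded into hypothetical per-macroblock `(mv_x, mv_y, qp)` tuples. The feature is an illustrative choice: a magnitude-weighted histogram of motion-vector orientations, plus the fraction of "active" macroblocks. A macroblock counts as active if it moves or if its QP differs from an assumed base QP. All function names are hypothetical. For classification, the sketch swaps in a one-vs-rest linear SVM trained with a Pegasos-style subgradient solver, in place of the off-the-shelf SVM library a real system would more likely use.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-macroblock record: (mv_x, mv_y, qp). A real pipeline would
# read these from a partially decoded H.264/AVC stream; here they are supplied.
Macroblock = Tuple[float, float, int]


def frame_features(mbs: Sequence[Macroblock], base_qp: int,
                   n_bins: int = 8, mv_thresh: float = 0.5) -> List[float]:
    """Magnitude-weighted MV orientation histogram plus fraction of active MBs."""
    hist = [0.0] * n_bins
    active = 0
    half_bin = math.pi / n_bins  # shift so bin 0 is centred on the +x direction
    for mv_x, mv_y, qp in mbs:
        mag = math.hypot(mv_x, mv_y)
        # Assumed activity rule: the MB moves, or its QP departs from the base QP.
        if mag < mv_thresh and qp == base_qp:
            continue
        active += 1
        if mag >= mv_thresh:  # only moving MBs contribute orientation mass
            angle = (math.atan2(mv_y, mv_x) + half_bin) % (2 * math.pi)
            b = min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)
            hist[b] += mag
    total = sum(hist)
    if total > 0:
        hist = [h / total for h in hist]
    return hist + [active / max(len(mbs), 1)]


def clip_features(frames: Sequence[Sequence[Macroblock]], base_qp: int) -> List[float]:
    """Average the per-frame feature vectors over all frames of a clip."""
    feats = [frame_features(f, base_qp) for f in frames]
    return [sum(col) / len(feats) for col in zip(*feats)]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X: Sequence[List[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200) -> List[float]:
    """Pegasos-style subgradient descent on the hinge loss (labels in {-1, +1}).
    A constant 1 is appended for the bias, which is also regularized (a simplification)."""
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):  # fixed sample order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)
            xb = list(xi) + [1.0]
            violated = yi * dot(w, xb) < 1  # margin checked with current weights
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularization step)
            if violated:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xb)]
    return w


def train_one_vs_rest(X: Sequence[List[float]], labels: Sequence[str],
                      **kw) -> Dict[str, List[float]]:
    """One binary linear SVM per class: that class vs. all others."""
    classes = sorted(set(labels))
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kw)
            for c in classes}


def predict(models: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Return the class whose SVM gives the highest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: dot(models[c], xb))


def toy_clip(dx: float, dy: float, n_moving: int, base_qp: int = 28,
             n_mbs: int = 16, n_frames: int = 3) -> List[List[Macroblock]]:
    """Synthetic clip: n_moving MBs share one MV, the rest are static."""
    frame = [(dx, dy, base_qp)] * n_moving + [(0.0, 0.0, base_qp)] * (n_mbs - n_moving)
    return [frame] * n_frames


if __name__ == "__main__":
    train = [(toy_clip(2.0, 0.1, 4), "walk"), (toy_clip(1.5, -0.1, 6), "walk"),
             (toy_clip(0.1, 2.0, 4), "jump"), (toy_clip(-0.1, 1.5, 6), "jump")]
    X = [clip_features(c, base_qp=28) for c, _ in train]
    models = train_one_vs_rest(X, [lab for _, lab in train])
    print(predict(models, clip_features(toy_clip(2.5, 0.0, 5), base_qp=28)))
```

On these synthetic clips, rightward and upward motion land in histogram bins 0 and 2, so the two classes are linearly separable. Centring bin 0 on the +x axis keeps the cardinal directions away from bin boundaries. As a result, small jitter in the motion vectors does not split one direction across two bins.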

Acknowledgments
This work was supported by the CARS (CARS-25) project from the Centre for Artificial Intelligence and Robotics, Defence Research and Development Organization (DRDO), Govt. of India.
About this article
Cite this article
Tom, M., Babu, R.V. & Praveen, R.G. Compressed domain human action recognition in H.264/AVC video streams. Multimed Tools Appl 74, 9323–9338 (2015). https://doi.org/10.1007/s11042-014-2083-2
DOI: https://doi.org/10.1007/s11042-014-2083-2