ABSTRACT
Human activity recognition in unconstrained RGB-D videos has extensive applications in surveillance, multimedia data analytics, human-computer interaction, etc., but remains a challenging problem because of background clutter, camera motion, viewpoint changes, and other factors. We develop a novel RGB-D activity recognition approach that leverages the dense trajectory feature in RGB videos. By mapping the 2D positions of the dense trajectories from the RGB video to the corresponding positions in the depth video, we recover the 3D trajectories of the tracked interest points, which capture important motion information along the depth direction. To characterize these 3D trajectories, we apply the motion boundary histogram (MBH) along the depth direction and propose 3D trajectory shape descriptors. The proposed 3D trajectory feature is a good complement to the dense trajectory feature extracted from RGB video alone. Performance evaluation on a challenging unconstrained RGB-D activity recognition dataset, Hollywood 3D, shows that the proposed method significantly outperforms the baseline (STIP-based) methods and achieves state-of-the-art performance.
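The trajectory-lifting step can be illustrated with a minimal sketch. It assumes that the depth video is registered pixel-to-pixel with the RGB video and that each dense trajectory is a list of (x, y) positions, one per frame. The function names `lift_trajectory` and `shape_descriptor_3d` are hypothetical. The descriptor shown is an assumed 3D generalization of the 2D trajectory shape descriptor from dense trajectories, where displacement vectors are normalized by their total magnitude and a depth component is added. It is not necessarily the exact descriptor proposed in this work, and the depth-direction MBH is not shown.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def lift_trajectory(track: Sequence[Point2D],
                    depth_frames: Sequence[Sequence[Sequence[float]]]) -> List[Point3D]:
    """Attach a depth value to each 2D point of a tracked trajectory.

    Assumption: track[t] is the (x, y) pixel position in frame t, and
    depth_frames[t] is a depth map registered to RGB frame t, indexed [row][col].
    Depth is read at the nearest pixel, clamped to the image border.
    """
    lifted = []
    for (x, y), depth in zip(track, depth_frames):
        rows, cols = len(depth), len(depth[0])
        r = min(max(int(round(y)), 0), rows - 1)  # row index from y
        c = min(max(int(round(x)), 0), cols - 1)  # column index from x
        lifted.append((x, y, depth[r][c]))
    return lifted


def shape_descriptor_3d(traj: Sequence[Point3D]) -> List[float]:
    """Concatenate frame-to-frame 3D displacements, normalized by their total length.

    Assumed extension of the 2D dense-trajectory shape descriptor: each
    displacement gains a depth component (dz) next to (dx, dy).
    """
    disp = [tuple(b[i] - a[i] for i in range(3)) for a, b in zip(traj, traj[1:])]
    total = sum(sqrt(dx * dx + dy * dy + dz * dz) for dx, dy, dz in disp)
    if total == 0:
        # A static point carries no shape information; return a zero vector.
        return [0.0] * (3 * len(disp))
    return [v / total for d in disp for v in d]
```

This sketch mixes pixel coordinates with raw depth values, so the depth axis may dominate or vanish depending on the depth units. A fuller implementation might back-project points with the camera intrinsics, or rescale depth, before computing displacements.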
Index Terms
- Activity recognition in unconstrained RGB-D video using 3D trajectories