ABSTRACT
Human activity recognition has recently attracted increasing attention because of its wide range of potential applications. Much progress has been made on recognizing single actions in videos, but far less on collective and interactive activities. Recognizing human interaction is more challenging because multiple actors take part in each activity. In this paper, we use multi-scale dense trajectories and explore four advanced feature encoding methods on a human interaction dataset within a bag-of-features framework. Specifically, dense trajectories are described by shape, histogram of gradient orientation, histogram of flow orientation, and motion boundary histogram, all of which are computed with integral images. Experimental results on the UT-Interaction dataset show that our approach outperforms state-of-the-art methods by 7-14%. Additionally, we thoroughly analyse the finding that, with dense trajectories in videos, vector quantization performs on par with or even better than more sophisticated feature encoding methods.
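To make the vector quantization step of a bag-of-features pipeline concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function name `vq_encode`, the toy codebook, and the L1 normalization are assumptions chosen for illustration, and the codebook would normally be learned (for example with k-means) from training descriptors.

```python
import numpy as np


def vq_encode(descriptors, codebook):
    """Encode local descriptors as a bag-of-features histogram.

    Each descriptor is hard-assigned to its nearest codeword (vector
    quantization); the result is an L1-normalized histogram of counts.
    `descriptors` has shape (n, d) and `codebook` has shape (k, d).
    """
    # Squared Euclidean distance between every descriptor and codeword, shape (n, k).
    d2 = (
        (descriptors ** 2).sum(axis=1, keepdims=True)
        - 2.0 * descriptors @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    nearest = d2.argmin(axis=1)  # index of the closest codeword per descriptor
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    total = hist.sum()
    # L1 normalization is an assumption; other normalizations are also common.
    return hist / total if total > 0 else hist


# Toy data (hypothetical): a 3-word codebook and 4 two-dimensional descriptors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [3.9, 4.2]])
video_histogram = vq_encode(descriptors, codebook)
print(video_histogram)  # [0.25 0.5  0.25]
```

In a full pipeline, one such histogram per descriptor type would typically be computed for each video and passed to a classifier; those stages are outside this sketch.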
Index Terms
- Exploring dense trajectory feature and encoding methods for human interaction recognition