
Learning a discriminative mid-level feature for action recognition

  • Research Paper
  • Published in Science China Information Sciences

Abstract

In this paper, we address the problem of recognizing human actions from videos. Most existing approaches employ low-level features (e.g., local and global features) to represent an action video. However, algorithms based on low-level features are not robust to complex environments with cluttered backgrounds, camera movement, and illumination changes. Therefore, we propose a novel random forest learning framework to construct a discriminative and informative mid-level feature from the low-level features of densely sampled 3D cuboids. Each cuboid is classified by the corresponding random forests with a novel fusion scheme, and the cuboid’s posterior probabilities over all categories are normalized to generate a histogram. We then obtain our mid-level feature by concatenating the histograms of all the cuboids. Since a single low-level feature is not enough to capture the variations of human actions, multiple complementary low-level features (i.e., optical flow and histogram of gradient 3D features) are employed to describe the 3D cuboids. Moreover, the temporal context between local cuboids is exploited as another type of low-level feature. These three low-level features (i.e., optical flow, histogram of gradient 3D features, and temporal context) are effectively fused in the proposed learning framework. Finally, the mid-level feature is fed to a random forest classifier for robust action recognition. Experiments on the Weizmann, UCF sports, Ballet, and multi-view IXMAS datasets demonstrate that our mid-level feature learned from multiple low-level features achieves superior performance over state-of-the-art methods.
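The core construction described above — classify each densely sampled 3D cuboid with a random forest, normalize its class-posterior probabilities into a histogram, and concatenate the histograms of all cuboids into one mid-level feature vector — can be sketched as follows. This is a minimal illustration using scikit-learn's `RandomForestClassifier` on synthetic data; the descriptor dimension, cuboid count, and the `mid_level_feature` helper are hypothetical stand-ins, not the paper's actual implementation (which fuses multiple forests over optical flow, HOG3D, and temporal-context features).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_classes = 4   # number of action categories (hypothetical)
n_cuboids = 6   # cuboids densely sampled from one video (hypothetical)
dim = 32        # low-level descriptor dimension per cuboid (hypothetical)

# Training set: low-level descriptors of cuboids with their action labels.
X_train = rng.normal(size=(200, dim))
y_train = rng.integers(0, n_classes, size=200)

# A random forest trained on cuboid descriptors (stand-in for the
# paper's per-feature forests with their fusion scheme).
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

def mid_level_feature(cuboids, forest):
    """Concatenate per-cuboid class-posterior histograms into one vector."""
    probs = forest.predict_proba(cuboids)             # (n_cuboids, n_classes)
    probs = probs / probs.sum(axis=1, keepdims=True)  # normalize each histogram
    return probs.ravel()                              # concatenation

# Build the mid-level feature for one (synthetic) video.
video_cuboids = rng.normal(size=(n_cuboids, dim))
feat = mid_level_feature(video_cuboids, forest)
print(feat.shape)  # (24,) = n_cuboids * n_classes
```

The resulting vector could then be fed to a second random forest classifier for the final action decision, as the abstract describes.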



Author information

Corresponding author

Correspondence to MingTao Pei.

About this article

Cite this article

Liu, C., Pei, M., Wu, X. et al. Learning a discriminative mid-level feature for action recognition. Sci. China Inf. Sci. 57, 1–13 (2014). https://doi.org/10.1007/s11432-013-4938-y

