A Deep Model Combining Structural Features and Context Cues for Action Recognition in Static Images

Wang, Xinxin; Li, Kan; Li, Yang

doi:10.1007/978-3-319-70136-3_66

Conference paper
First Online: 26 October 2017


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10639)

Abstract

In this paper, we present a deep model for action recognition in static images that combines body structural information with context cues to build a more accurate classifier. To obtain more semantic and robust body structural features, we also propose a new body descriptor, the limb angle descriptor (LAD), which uses the relative angles between the limbs of a 2D skeleton. We evaluate our method on the PASCAL VOC 2012 Action dataset and compare it with previously published results. Our method achieves 90.6% mean AP, outperforming the previous state-of-the-art approaches in the field.
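The reported 90.6% mean AP is the average, over the action classes, of each class's average precision (AP). As a hedged illustration only, the sketch below computes AP as the area under the precision-recall curve after making precision monotonically non-increasing (the style of AP used by PASCAL VOC since 2010) and then averages it across classes. It is a simplified stand-in, not the official VOC development kit; the function names `average_precision` and `mean_average_precision` are hypothetical, and tied scores are simply broken by input order.

```python
from typing import Dict, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """VOC-style AP (sketch): area under the monotone precision envelope."""
    npos = sum(1 for y in labels if y)
    if npos == 0:
        raise ValueError("AP is undefined for a class with no positive examples")
    # Rank examples by descending confidence (stable sort: ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / npos)
        precisions.append(tp / (tp + fp))
    # Pad with sentinels, then make precision non-increasing from right to left.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    # Sum rectangle areas wherever recall increases.
    return sum(
        (mrec[k] - mrec[k - 1]) * mpre[k]
        for k in range(1, len(mrec))
        if mrec[k] != mrec[k - 1]
    )


def mean_average_precision(
    per_class: Dict[str, Tuple[Sequence[float], Sequence[int]]]
) -> float:
    """Mean of per-class AP values; per_class maps class name -> (scores, labels)."""
    aps = [average_precision(s, y) for s, y in per_class.values()]
    return sum(aps) / len(aps)
```

For example, a class whose two positives are ranked first and third among four examples gets AP = (1 + 2/3) / 2 = 5/6, and a class whose positives are all ranked above every negative gets AP = 1.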


Notes

1.
We refer to part pairs as limbs for clarity, despite the fact that some pairs are not human limbs (e.g., the torso).
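To make the limb angle descriptor (LAD) from the abstract concrete, here is a minimal sketch under stated assumptions: a 2D skeleton is a list of (x, y) keypoints, each "limb" is a pair of keypoint indices (a part pair, in the sense of the note above), and the descriptor is the list of signed angles between every pair of limb direction vectors. The abstract does not specify the paper's limb set, angle convention, ordering, or handling of missing joints, so `EXAMPLE_LIMBS`, `limb_angle_descriptor`, and the other names here are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Limb = Tuple[int, int]  # (start keypoint index, end keypoint index)

# Hypothetical limb set for illustration only; not the paper's actual part pairs.
EXAMPLE_LIMBS: List[Limb] = [(0, 1), (1, 2), (1, 3)]


def limb_vector(keypoints: Sequence[Point], limb: Limb) -> Point:
    """Direction vector pointing from the limb's start joint to its end joint."""
    a, b = limb
    return (keypoints[b][0] - keypoints[a][0], keypoints[b][1] - keypoints[a][1])


def relative_angle(u: Point, v: Point) -> float:
    """Signed angle (radians, in [-pi, pi]) that rotates u onto v."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return math.atan2(cross, dot)


def limb_angle_descriptor(
    keypoints: Sequence[Point], limbs: Sequence[Limb]
) -> List[float]:
    """One relative angle per unordered pair of limbs (length C(len(limbs), 2))."""
    vectors = [limb_vector(keypoints, limb) for limb in limbs]
    return [
        relative_angle(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    ]
```

Because only relative angles are kept, this sketch's output does not change when the whole skeleton is translated, uniformly scaled, or rotated in the image plane (up to wrap-around at ±π), which is one plausible reason an angle-based descriptor can be more robust than raw joint coordinates. Note that `math.atan2(0, 0)` returns 0.0, so a zero-length limb silently yields an angle of 0 rather than an error.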



Acknowledgements

This work is partly supported by the Beijing Natural Science Foundation (4172054).

Author information

Authors and Affiliations

Beijing Institute of Technology, Beijing, China
Xinxin Wang, Kan Li & Yang Li


Corresponding author

Correspondence to Kan Li.

Editor information

Editors and Affiliations

Guangdong University of Technology, Guangzhou, China
Derong Liu
Guangdong University of Technology, Guangzhou, China
Shengli Xie
South China University of Technology, Guangzhou, China
Yuanqing Li
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Dongbin Zhao
King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
El-Sayed M. El-Alfy


About this paper

Cite this paper

Wang, X., Li, K., Li, Y. (2017). A Deep Model Combining Structural Features and Context Cues for Action Recognition in Static Images. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10639. Springer, Cham. https://doi.org/10.1007/978-3-319-70136-3_66


DOI: https://doi.org/10.1007/978-3-319-70136-3_66
Published: 26 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70135-6
Online ISBN: 978-3-319-70136-3
eBook Packages: Computer Science, Computer Science (R0)
