Abstract
In this paper, we present a deep model for action recognition in static images that combines body structural information with context cues to build a more accurate classifier. To construct more semantic and robust body structural features, we propose a new body descriptor, the limb angle descriptor (LAD), which uses the relative angles between the limbs of the 2D skeleton. We evaluate our method on the PASCAL VOC 2012 Action dataset and compare it with published results. Our method achieves 90.6% mean AP, outperforming previous state-of-the-art approaches.
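The abstract does not give the exact definition of LAD, so the following is only a minimal sketch of one plausible reading: take 2D joint coordinates, form a vector for each limb, and record the signed angle between every pair of limb vectors. The joint names, the `LIMBS` list, and the function names are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations

# Hypothetical limb list (pairs of joint names); an assumption, not the paper's.
LIMBS = [
    ("neck", "r_shoulder"),
    ("r_shoulder", "r_elbow"),
    ("r_elbow", "r_wrist"),
]


def limb_vector(joints, limb):
    """Return the 2D vector pointing from the limb's first joint to its second."""
    (x0, y0), (x1, y1) = joints[limb[0]], joints[limb[1]]
    return (x1 - x0, y1 - y0)


def relative_angle(u, v):
    """Signed angle from vector u to vector v, in radians within [-pi, pi]."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return math.atan2(cross, dot)


def limb_angle_features(joints, limbs=LIMBS):
    """Relative angles between all pairs of limbs (assumed form of the descriptor)."""
    vectors = [limb_vector(joints, limb) for limb in limbs]
    return [relative_angle(u, v) for u, v in combinations(vectors, 2)]


# Toy skeleton in image coordinates (made-up values for illustration only).
joints = {
    "neck": (0.0, 0.0),
    "r_shoulder": (1.0, 0.0),
    "r_elbow": (1.0, 1.0),
    "r_wrist": (2.0, 1.0),
}
features = limb_angle_features(joints)
```

Under this reading, the features depend only on angles between limbs, so they do not change when the skeleton is translated or uniformly scaled in the image. That is one reason an angle-based descriptor could be more robust than raw joint coordinates.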
Notes
1. We refer to part pairs as limbs for clarity, even though some pairs are not human limbs (e.g., the torso).
Acknowledgements
This work is partly supported by the Beijing Natural Science Foundation (4172054).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wang, X., Li, K., Li, Y. (2017). A Deep Model Combining Structural Features and Context Cues for Action Recognition in Static Images. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10639. Springer, Cham. https://doi.org/10.1007/978-3-319-70136-3_66
DOI: https://doi.org/10.1007/978-3-319-70136-3_66
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70135-6
Online ISBN: 978-3-319-70136-3
eBook Packages: Computer Science, Computer Science (R0)