Pose-guided action recognition in static images using lie-group

Mi, Siya; Zhang, Yu

doi:10.1007/s10489-021-02760-1

Published: 16 September 2021

Volume 52, pages 6760–6768, (2022)

Applied Intelligence

Abstract

Recognizing actions in static images is challenging because a single image lacks the motion information that characterizes the relations between a human and the surrounding objects. Existing works either detect the human together with related objects or transfer motion from videos to images; in both cases, however, the interaction is depicted only implicitly. In this paper, we approach the problem from a different angle: we explicitly learn the interaction information from the poses of the human and the objects. Humans adopt different poses in different actions, and objects interact spatially with particular parts of the human body in ways that differ from action to action. These pose interactions can be represented naturally with Lie groups. The Lie-group method computes the orientation and distance between key points or joints, which reveal the relation between humans and objects in different actions. In experiments, the proposed method achieves competitive classification results on several still-image action datasets, which supports recognizing actions in still images from poses.
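The orientation-and-distance computation described in the abstract can be illustrated with a minimal sketch. It assumes 2D key points given as (x, y) tuples and packages each pair as an SE(2)-style element (a rotation angle plus a translation). The function names, the sample coordinates, and the all-pairs descriptor are illustrative assumptions, not the authors' implementation, whose details are not available in this preview.

```python
import math
from itertools import combinations

def pair_feature(p, q):
    """Orientation (radians) and Euclidean distance of the vector from p to q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.atan2(dy, dx), math.hypot(dx, dy)

def se2_matrix(theta, tx, ty):
    """3x3 homogeneous matrix for a planar rotation theta and translation (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def pose_descriptor(points):
    """Concatenate (orientation, distance) over all unordered key-point pairs."""
    feats = []
    for i, j in combinations(range(len(points)), 2):
        theta, dist = pair_feature(points[i], points[j])
        feats.extend([theta, dist])
    return feats

# Hypothetical key points: two human joints and one object key point.
wrist, elbow, racket = (0.0, 0.0), (3.0, 4.0), (0.0, 2.0)
descriptor = pose_descriptor([wrist, elbow, racket])

# One possible SE(2) packaging of the wrist-to-elbow relation (an assumption).
theta, _ = pair_feature(wrist, elbow)
transform = se2_matrix(theta, elbow[0] - wrist[0], elbow[1] - wrist[1])
```

A descriptor of this kind could be fed to any standard classifier. Using all pairs keeps the sketch simple; a real system would more plausibly restrict the pairs to skeleton bones and human-object links.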

Acknowledgments

This work is supported by the National Key R&D Program of China (2018AAA0100100), the National Natural Science Foundation of China (61702095), and the Natural Science Foundation of Jiangsu Province (BK20190341).

Author information

Authors and Affiliations

School of Cyber Science and Engineering, Southeast University, Nanjing, 211189, China
Siya Mi
Purple Mountain Laboratories, Nanjing, China
Siya Mi
School of Computer Science and Engineering, and the Key Lab of Computer Network and Information Integration (Ministry of Education), Southeast University, Nanjing, 211189, China
Yu Zhang

Corresponding author

Correspondence to Siya Mi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mi, S., Zhang, Y. Pose-guided action recognition in static images using lie-group. Appl Intell 52, 6760–6768 (2022). https://doi.org/10.1007/s10489-021-02760-1

Accepted: 12 July 2021
Published: 16 September 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s10489-021-02760-1
