research-article

Posture Guided Human Action Recognition for Fitness Applications

Authors:

Jayaprakash Akula,

B H Pawan Prasad,

Green Rosh

ICVGIP '22: Proceedings of the Thirteenth Indian Conference on Computer Vision, Graphics and Image Processing

Article No.: 11, Pages 1 - 9

https://doi.org/10.1145/3571600.3571612

Published: 12 May 2023

Abstract

Human action recognition has attracted considerable attention in recent years owing to new computer vision applications such as fitness tracking, augmented reality and virtual reality. Most existing deep-learning-based methods first use a deep neural network to estimate the human pose from a sequence of images, and then use a second network to classify the human action from all of the estimated poses. However, the pose estimators used in these methods typically fail to generalize to non-upright actions such as push-ups and planks, because the keypoints lie closer to each other than in upright postures such as jumps and deadlifts. As a result, the accuracy of these methods suffers on non-upright actions, which are common in fitness applications. In this paper, we propose a novel multi-stage deep-learning-based method for action recognition that predicts both upright and non-upright actions with high accuracy. We use a Light Weight Boundary Refinement Module (LWBRM) during pose estimation to distinguish closely spaced keypoints more effectively. We also introduce an intermediate frame-by-frame posture classification stage after pose estimation. We observed that this intermediate stage improves human action recognition accuracy by … while also improving computational efficiency by ∼2× compared to state-of-the-art methods. Our method runs at 104 frames per second on an Android smartphone and can therefore readily be deployed in consumer-oriented fitness applications.
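The abstract describes the intermediate posture stage only at a high level. As a rough illustration of how a per-frame posture label could narrow the action decision, the sketch below assumes 2D keypoints are already available for every frame and uses a simple torso-angle rule as a stand-in for the paper's learned posture classifier. The keypoint names (`l_shoulder`, `r_hip`, ...), the 45° threshold and the posture-to-action lists are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# (x, y) in image pixels; y grows downward, as in most image coordinate systems.
Point = Tuple[float, float]
# Hypothetical keypoint layout: a dict keyed by joint name.
Keypoints = Dict[str, Point]

# Hypothetical posture-to-action vocabularies, for illustration only.
ACTIONS_BY_POSTURE: Dict[str, List[str]] = {
    "upright": ["jump", "deadlift"],
    "non-upright": ["push-up", "plank"],
}


def _midpoint(a: Point, b: Point) -> Point:
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def torso_angle_deg(kp: Keypoints) -> float:
    """Angle between the hip-to-shoulder vector and the image's vertical axis.

    0 degrees means the torso is vertical with shoulders above hips;
    90 degrees means the torso is horizontal (e.g. a plank).
    """
    sx, sy = _midpoint(kp["l_shoulder"], kp["r_shoulder"])
    hx, hy = _midpoint(kp["l_hip"], kp["r_hip"])
    dx = sx - hx
    dy = hy - sy  # positive when shoulders are above hips (y grows downward)
    return math.degrees(math.atan2(abs(dx), dy))


def classify_posture(kp: Keypoints, threshold_deg: float = 45.0) -> str:
    """Per-frame posture label; a fixed rule standing in for a learned classifier."""
    return "non-upright" if torso_angle_deg(kp) > threshold_deg else "upright"


def clip_posture(frames: List[Keypoints]) -> str:
    """Majority vote of the per-frame posture labels over a clip."""
    labels = [classify_posture(kp) for kp in frames]
    return Counter(labels).most_common(1)[0][0]


def candidate_actions(frames: List[Keypoints]) -> List[str]:
    """Restrict the action classes to those consistent with the clip's posture."""
    return ACTIONS_BY_POSTURE[clip_posture(frames)]


if __name__ == "__main__":
    standing = {"l_shoulder": (95.0, 100.0), "r_shoulder": (105.0, 100.0),
                "l_hip": (95.0, 200.0), "r_hip": (105.0, 200.0)}
    plank = {"l_shoulder": (100.0, 195.0), "r_shoulder": (100.0, 205.0),
             "l_hip": (200.0, 195.0), "r_hip": (200.0, 205.0)}
    print(candidate_actions([plank, plank, standing]))  # ['push-up', 'plank']
```

Restricting the final action classifier to the classes consistent with the clip's dominant posture is one plausible way such a stage could save computation and reduce confusion between upright and non-upright exercises; the abstract does not state how the authors combine the two stages.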


Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Activity recognition and understanding


Published In

ACM Other conferences

ICVGIP '22: Proceedings of the Thirteenth Indian Conference on Computer Vision, Graphics and Image Processing

December 2022

506 pages

ISBN:9781450398220

DOI:10.1145/3571600

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article
Research
Refereed limited

Conference

ICVGIP'22

ICVGIP'22: Thirteenth Indian Conference on Computer Vision, Graphics and Image Processing

December 8 - 10, 2022

Gandhinagar, India

Acceptance Rates

Overall Acceptance Rate 95 of 286 submissions, 33%

Bibliometrics

Total Citations: 0
Total Downloads: 93
Downloads (Last 12 months): 28
Downloads (Last 6 weeks): 1

Reflects downloads up to 05 Mar 2025
