Research article. DOI: 10.1145/3571600.3571612

Posture Guided Human Action Recognition for Fitness Applications

Published: 12 May 2023

Abstract

Human action recognition has attracted considerable attention in recent years, driven by newer computer vision applications such as fitness tracking, augmented reality, and virtual reality. Most existing deep-learning-based methods first use a deep neural network to estimate the human pose from a sequence of images, then use a second network to classify the action from all the estimated poses. However, the pose estimation used in these methods typically fails to generalize to non-upright actions such as push-ups and planks, since the keypoints are closer to each other than in upright postures such as jumps and dead-lifts. Hence, the accuracy of these methods suffers for non-upright actions, which are typical of fitness applications. In this paper, we propose a novel multi-stage deep-learning-based method for action recognition that predicts both upright and non-upright actions with high accuracy. We use a Light Weight Boundary Refinement Module (LWBRM) during pose estimation to distinguish closely spaced keypoints more effectively. We also introduce an intermediate frame-by-frame posture classification stage after pose estimation. We observed that this intermediate stage improves human action recognition accuracy while improving computational efficiency by ∼2× compared to state-of-the-art methods. Our method runs at 104 frames per second on an Android smartphone and can therefore be readily deployed in consumer-oriented fitness applications.
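
The data flow described in the abstract (pose estimation, then frame-by-frame posture classification, then action recognition) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the networks or how the posture labels feed the final stage. Every function name below is hypothetical, each stage is a simple stand-in for a neural network, and the bounding-box heuristic and the mapping from posture to candidate actions are assumptions made only for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Keypoints = List[Tuple[float, float]]  # (x, y) per joint, image coordinates


def estimate_pose(frame: Dict) -> Keypoints:
    """Stage 1 stand-in: a real system would run a pose network here.

    This toy version assumes the frame already carries its keypoints.
    """
    return frame["keypoints"]


def classify_posture(keypoints: Keypoints) -> str:
    """Stage 2 stand-in: label a single frame as upright or non-upright.

    Hypothetical heuristic (not from the paper): compare the height and
    width of the bounding box around the keypoints.
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    height = max(ys) - min(ys)
    width = max(xs) - min(xs)
    return "upright" if height >= width else "non_upright"


# Hypothetical grouping; the action names are the examples from the abstract.
ACTIONS_BY_POSTURE = {
    "upright": ["jump", "dead-lift"],
    "non_upright": ["push-up", "plank"],
}


def recognize_action(frames: Sequence[Dict]) -> Tuple[str, List[str]]:
    """Run the three stages; return (dominant posture, candidate actions)."""
    if not frames:
        raise ValueError("at least one frame is required")
    postures = [classify_posture(estimate_pose(f)) for f in frames]
    dominant, _ = Counter(postures).most_common(1)[0]
    # Stage 3 stand-in: a real action classifier would choose one action;
    # here the posture vote only narrows the set of candidates.
    return dominant, ACTIONS_BY_POSTURE[dominant]


if __name__ == "__main__":
    standing = {"keypoints": [(0.5, 0.1), (0.5, 0.5), (0.45, 0.9), (0.55, 0.9)]}
    lying = {"keypoints": [(0.1, 0.6), (0.5, 0.62), (0.9, 0.65), (0.9, 0.6)]}
    print(recognize_action([lying, lying, standing]))
```

In this sketch the posture vote simply restricts the candidate actions, which is one plausible way an intermediate posture stage could reduce the work left for the final classifier. Whether the paper uses the posture labels this way is not stated in the abstract.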


Published In

ICVGIP '22: Proceedings of the Thirteenth Indian Conference on Computer Vision, Graphics and Image Processing
December 2022
506 pages
ISBN:9781450398220
DOI:10.1145/3571600
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Pose estimation
  2. action recognition
  3. deep learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICVGIP'22

Acceptance Rates

Overall Acceptance Rate 95 of 286 submissions, 33%
