Towards improvement of baseline performance for regression based human pose estimation

  • Original Paper
  • Evolving Systems

Abstract

Estimating and tracking human poses in images and videos is a challenging problem for robotic interaction and augmented reality. Recent approaches based on deep neural networks have shown encouraging results, but conventional pose estimation methods suffer from environmental sensitivity and high computational complexity. To address these issues, this paper proposes a novel approach that uses DenseNet and CNN-based transfer learning to learn by explicitly exploiting skeletal data. Other ImageNet pre-trained models, together with probabilistic and regression losses, are used for a comparative study. FLIC (Frames Labelled in Cinema), a widely accepted benchmark dataset for pose estimation, serves as the basis for our evaluation and comparison. Based on our experiments, which reach an \(R^2\) score of 0.948, we recommend probabilistic loss over regression loss as the new baseline for future downstream tasks and fine-tuning-based transfer learning techniques for pose estimation.
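For readers unfamiliar with the metric, the \(R^2\) score (coefficient of determination) quoted in the abstract can be illustrated with a minimal sketch. It uses the standard definition \(R^2 = 1 - SS_{res}/SS_{tot}\). The abstract does not say how scores are aggregated across keypoint coordinates, so the uniform average over output columns is an assumption, and the function names and sample values are hypothetical rather than taken from the paper.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination for one output: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean target.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


def mean_r2(true_rows, pred_rows) -> float:
    """Uniform average of R^2 over output columns.

    Each row is one image; each column is one keypoint coordinate.
    Averaging uniformly over columns is an assumption of this sketch.
    """
    true_cols = list(zip(*true_rows))
    pred_cols = list(zip(*pred_rows))
    scores = [r2_score(t, p) for t, p in zip(true_cols, pred_cols)]
    return sum(scores) / len(scores)


# Hypothetical normalized (x, y) targets for one joint over four images.
targets = [(0.10, 0.20), (0.30, 0.40), (0.50, 0.60), (0.70, 0.80)]
predictions = [(0.12, 0.18), (0.28, 0.43), (0.51, 0.57), (0.69, 0.82)]
score = mean_r2(targets, predictions)
print(round(score, 3))
```

A score of 1 means the predictions match the targets exactly, while a score of 0 means the model does no better than predicting the mean target for every image.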

Availability of data and materials

Not applicable.

Acknowledgements

Not applicable.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Pranjal Kumar.

Ethics declarations

Conflict of interest

No conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kumar, P., Chauhan, S. Towards improvement of baseline performance for regression based human pose estimation. Evolving Systems 15, 659–667 (2024). https://doi.org/10.1007/s12530-023-09508-x

  • DOI: https://doi.org/10.1007/s12530-023-09508-x
