
HQLC-Overlap: an adaptive low-cost binocular 3D human pose estimation model

Multimedia Tools and Applications

Abstract

In single-view 3D human pose modeling and analysis, occlusion and blind spots remain hard problems that a single view cannot fully resolve. Multi-view models, in turn, require multi-view training and complex view fusion, which greatly increase their training and deployment cost. We therefore propose a novel model, HQLC-Overlap, based on dynamic binocular 3D pose overlap. A view-filtering method first selects the two best pose-observation views; the model then uses these two views to simulate how the two human eyes cooperate to image an object in 3D with high precision. Compared with most current single-view or multi-view models, HQLC-Overlap retains the advantages of single-view models built on a high-quality view attention mechanism, while its binocular estimation mode addresses the inherent limitations of single-view models. Using the filtered views in the data, we also compiled statistics on, and visualized, the model's estimation errors over a large number of pose images, and corrected these errors. By design, HQLC-Overlap is fast, has low computational cost, and offers dynamic flexibility in handling multiple views. Experiments on two large-scale human pose datasets, including an ablation study of the model and comparisons with other models, show that it greatly improves 3D pose estimation accuracy.
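The abstract does not specify the view-filtering criterion or how the two selected views are fused, so the following is only a minimal sketch of the general binocular idea under loudly stated assumptions. Each view is scored by its mean 2D keypoint confidence (a hypothetical criterion), the two highest-scoring views are kept, and the 3D joints are recovered from them by classical linear (DLT) two-view triangulation. DLT is a standard stereo technique swapped in here for illustration; it is not HQLC-Overlap's own method. Error is reported as mean per-joint position error (MPJPE), a common 3D pose metric. All function names, the camera layout, and the pose are hypothetical and synthetic.

```python
import numpy as np


def select_two_views(confidences: np.ndarray) -> tuple[int, int]:
    """Return indices of the two views with the highest mean 2D keypoint confidence.

    confidences has shape (num_views, num_joints). ASSUMPTION: this scoring rule
    is a hypothetical stand-in for the paper's view-filtering method.
    """
    scores = confidences.mean(axis=1)
    first, second = np.argsort(scores)[::-1][:2]
    return int(first), int(second)


def triangulate_joint(P1: np.ndarray, P2: np.ndarray, x1, x2) -> np.ndarray:
    """Linear (DLT) triangulation of one joint from two calibrated views.

    P1, P2 are 3x4 projection matrices; x1, x2 are (u, v) pixel coordinates.
    ASSUMPTION: classical stereo triangulation, not the paper's fusion step.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous 3D point spans the (approximate) null space of A:
    # take the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def triangulate_pose(P1, P2, kps1, kps2) -> np.ndarray:
    """Triangulate every joint; kps1 and kps2 have shape (num_joints, 2)."""
    return np.stack([triangulate_joint(P1, P2, a, b) for a, b in zip(kps1, kps2)])


def project(P: np.ndarray, joints3d: np.ndarray) -> np.ndarray:
    """Project (num_joints, 3) world points to (num_joints, 2) pixel coordinates."""
    homo = np.hstack([joints3d, np.ones((len(joints3d), 1))]) @ P.T
    return homo[:, :2] / homo[:, 2:3]


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error, in the units of the inputs."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


# Synthetic demo: four hypothetical cameras facing +z, a 17-joint pose,
# and noise-free 2D detections.
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
offsets = [(0.0, 0.0), (-1.0, 0.0), (0.0, -1.0), (1.0, 0.0)]  # camera x/y translations
cameras = [K @ np.hstack([np.eye(3), np.array([[dx], [dy], [5.0]])]) for dx, dy in offsets]
rng = np.random.default_rng(0)
gt_pose = rng.uniform(-0.5, 0.5, size=(17, 3))
# Per-view confidence, repeated for every joint (hypothetical values).
confidences = np.array([0.85, 0.35, 0.92, 0.55])[:, None] * np.ones((1, 17))

views = select_two_views(confidences)
Pa, Pb = cameras[views[0]], cameras[views[1]]
pred_pose = triangulate_pose(Pa, Pb, project(Pa, gt_pose), project(Pb, gt_pose))
err = mpjpe(pred_pose, gt_pose)
print(views, err)
```

With noise-free detections the recovered pose matches the synthetic ground truth up to floating-point error. With real detections, a linear two-view estimate degrades with keypoint noise, occlusion, and short baselines, which is why the choice of which two views to keep matters.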


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7



Acknowledgements

This study was partially supported by the National Natural Science Foundation of China (61872164), the Program of Science and Technology Development Plan of Jilin Province of China (20220201147GX), and the Fundamental Research Funds for the Central Universities (2022-JCXK-02).

Author information


Corresponding author

Correspondence to Minghui Sun.

Ethics declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, H., Sun, M. HQLC-Overlap: an adaptive low-cost binocular 3D human pose estimation model. Multimed Tools Appl 82, 17159–17173 (2023). https://doi.org/10.1007/s11042-022-14156-5



  • DOI: https://doi.org/10.1007/s11042-022-14156-5
