Abstract
Single-view 3D human pose modeling and analysis suffers from occlusion and blind spots that a single view cannot fully resolve. Multi-view models address these problems, but multi-view training and complex view fusion greatly increase their training and application cost. We therefore implemented HQLC-Overlap, a novel model based on dynamic binocular 3D pose overlap. A view filtering method selects the two best views for observing the pose, and the model then uses these two views to simulate how the human eyes jointly image an object in 3D with high precision. Compared with most current single-view or multi-view models, HQLC-Overlap keeps the advantages of a single-view model built on a high-quality view attention mechanism, while its binocular estimation mode addresses the inherent problems of single-view models. Using the filtered views in the data, we also measured and visualized the model's estimation error over a large number of pose images and corrected these errors. By design, HQLC-Overlap is fast, has a low computational cost, and adapts dynamically to multiple views. We ran ablation experiments and comparisons with other models on two large-scale human pose datasets. The experimental results show that HQLC-Overlap greatly improves 3D pose estimation accuracy.
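The two-stage idea described in the abstract (filter the views down to the two best, then combine the two resulting estimates) can be illustrated with a minimal sketch. The abstract does not state how views are scored or how the two poses are overlapped, so everything here is an assumption made for illustration: views are ranked by mean per-joint confidence, the two poses are assumed to be already expressed in a shared world coordinate frame, and fusion is a confidence-weighted average. The names `select_two_views` and `overlap_poses` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Joint = Tuple[float, float, float]


def select_two_views(confidences: Dict[str, Sequence[float]]) -> List[str]:
    """Return the two view names with the highest mean joint confidence.

    Assumption: mean confidence stands in for the paper's view filter,
    whose actual criterion is not given in the abstract.
    """
    if len(confidences) < 2:
        raise ValueError("need at least two candidate views")
    mean_conf = {view: sum(c) / len(c) for view, c in confidences.items()}
    # Sort by descending score; ties are broken by view name for determinism.
    ranked = sorted(mean_conf, key=lambda v: (-mean_conf[v], v))
    return ranked[:2]


def overlap_poses(
    pose_a: Sequence[Joint],
    conf_a: Sequence[float],
    pose_b: Sequence[Joint],
    conf_b: Sequence[float],
) -> List[Joint]:
    """Fuse two 3D poses joint by joint with a confidence-weighted average.

    Assumption: both poses are already in the same world coordinate frame.
    """
    fused: List[Joint] = []
    for ja, wa, jb, wb in zip(pose_a, conf_a, pose_b, conf_b):
        total = wa + wb
        if total == 0:
            # Neither view is confident about this joint: use a plain average.
            wa = wb = 0.5
            total = 1.0
        fused.append(tuple((wa * a + wb * b) / total for a, b in zip(ja, jb)))
    return fused


# Toy data: three candidate views, two joints each.
confidences = {"cam1": [0.9, 0.8], "cam2": [0.2, 0.1], "cam3": [0.7, 0.9]}
poses = {
    "cam1": [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    "cam2": [(5.0, 5.0, 5.0), (5.0, 5.0, 5.0)],
    "cam3": [(2.0, 0.0, 0.0), (1.0, 1.0, 3.0)],
}
best = select_two_views(confidences)
fused = overlap_poses(
    poses[best[0]], confidences[best[0]], poses[best[1]], confidences[best[1]]
)
```

In this toy input the low-confidence view `cam2` is discarded, and each fused joint is pulled toward whichever of the two remaining views is more confident about it. Only two single-view estimates are combined per frame, which is one way to read the low-cost claim, though the paper's actual mechanism may differ.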
Acknowledgements
This study was partially supported by the National Natural Science Foundation of China (61872164), the Program of Science and Technology Development Plan of Jilin Province of China (20220201147GX), and the Fundamental Research Funds for the Central Universities (2022-JCXK-02).
Ethics declarations
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, H., Sun, M. HQLC-Overlap: an adaptive low-cost binocular 3D human pose estimation model. Multimed Tools Appl 82, 17159–17173 (2023). https://doi.org/10.1007/s11042-022-14156-5
DOI: https://doi.org/10.1007/s11042-022-14156-5