Decision-level information fusion powered human pose estimation

Zhang, Yiqing; Chen, Weiting

doi:10.1007/s10489-022-03623-z

Published: 05 May 2022

Volume 53, pages 2161–2172 (2023)

Applied Intelligence

Abstract

Human pose estimation is regarded as a crucial step toward understanding human behaviour. Although the field has progressed significantly in recent years, most studies have focused on feature-level information fusion, and decision-level information fusion has rarely been explored. Compared with feature-level information, decision-level information is more semantic and interpretable, and it can help improve pose estimation in occluded and crowded scenes. In this paper, we focus on the fusion of decision-level information. We propose a View Fusion module that aggregates decision-level information from different stages to produce a more comprehensive estimate. An Auxiliary Task module is introduced to bridge the gap between the feature extractor and the View Fusion module and to provide prior information about the form of the decision-level information. Because the precision of predictions differs across stages, we use different strategies to guide the learning process. Experiments show that our models outperform previous methods and achieve competitive results on the CrowdPose test set. Further experiments indicate that our method is flexible and can improve the performance of various backbones.
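The abstract does not say how the View Fusion module combines the stage outputs. Purely to illustrate what "decision-level" fusion means in a heatmap-based pose estimator, the sketch below assumes that every stage outputs one heatmap per keypoint and fuses them with a fixed weighted average before decoding each keypoint by its argmax. The helpers `fuse_stage_heatmaps` and `decode_keypoints`, the array layout (stages, keypoints, height, width), and the fixed weights are hypothetical assumptions; this is not the authors' View Fusion module.

```python
import numpy as np


def fuse_stage_heatmaps(stage_heatmaps, weights=None):
    """Hypothetical decision-level fusion: weighted average of per-stage heatmaps.

    stage_heatmaps: array of shape (S, K, H, W) for S stages and K keypoints.
    weights: optional length-S sequence of non-negative stage weights
             (assumed fixed here; equal weights if omitted).
    Returns an array of shape (K, H, W).
    """
    stage_heatmaps = np.asarray(stage_heatmaps, dtype=np.float64)
    num_stages = stage_heatmaps.shape[0]
    if weights is None:
        weights = np.ones(num_stages)
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    # Contract the stage axis: fused = sum_s w_s * heatmap_s
    return np.tensordot(weights, stage_heatmaps, axes=1)


def decode_keypoints(heatmaps):
    """Return one (x, y, score) row per keypoint from the heatmap maximum."""
    num_keypoints, height, width = heatmaps.shape
    flat = heatmaps.reshape(num_keypoints, -1)
    idx = flat.argmax(axis=1)                      # flat index of each peak
    ys, xs = np.unravel_index(idx, (height, width))
    scores = flat[np.arange(num_keypoints), idx]   # peak value as confidence
    return np.stack([xs, ys, scores], axis=1)
```

Because each input is already a per-keypoint location estimate, the fused map stays interpretable: a stage that places a joint elsewhere simply contributes a weaker competing peak. For example, with stage weights `[3, 1]`, a peak at (x=1, y=2) from the first stage dominates a peak at (x=3, y=0) from the second, and the decoded score is 0.75.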

Funding

This work was supported in part by the National Key Research and Development Program of China (No. 2018YFB2101300), in part by the National Natural Science Foundation of China (Grant No. 61871186), and in part by the Dean’s Fund of the Engineering Research Center of Software/Hardware Codesign Technology and Application, Ministry of Education (East China Normal University).

Author information

Authors and Affiliations

MOE Research Center of Software/Hardware Co-Design Engineering, East China Normal University, Shanghai, China
Yiqing Zhang & Weiting Chen

Corresponding author

Correspondence to Weiting Chen.

Ethics declarations

Conflict of Interests

The authors have no relevant financial or nonfinancial interests to disclose.

Additional information

Availability of Data and Material

The data that support the findings of this study are openly available. The COCO dataset is available at https://cocodataset.org/. The CrowdPose dataset is available at https://github.com/Jeff-sjtu/CrowdPose.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Impact of σ

We report the results of Lite-HRNet on the CrowdPose test set for different values of σ.

As shown in Fig. 6, performance first rises and then falls as σ increases, so the choice of σ affects performance.
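This excerpt does not define σ. In many heatmap-based pose estimators, σ is the standard deviation (in heatmap pixels) of the 2D Gaussian rendered around each ground-truth keypoint as the training target; that reading is an assumption here, not something the appendix states. Under that assumption, the sketch below shows how σ controls the spread of the target, G(x, y) = exp(-((x - cx)^2 + (y - cy)^2) / (2σ^2)). A small σ gives a sharp but sparse target, whereas a large σ gives a smoother target that can blur neighbouring keypoints together, which is one plausible reason an intermediate value works best. `gaussian_heatmap` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def gaussian_heatmap(height, width, center, sigma):
    """Hypothetical target renderer: unnormalised 2D Gaussian around a keypoint.

    center: (x, y) keypoint location in heatmap pixel coordinates.
    sigma:  Gaussian standard deviation in pixels (assumed meaning of σ).
    Returns an array of shape (height, width) with value 1.0 at the centre.
    """
    xs = np.arange(width, dtype=np.float64)           # shape (W,)
    ys = np.arange(height, dtype=np.float64)[:, None]  # shape (H, 1)
    cx, cy = center
    squared_dist = (xs - cx) ** 2 + (ys - cy) ** 2    # broadcasts to (H, W)
    return np.exp(-squared_dist / (2.0 * sigma ** 2))
```

For example, on a 64×48 heatmap with the keypoint at (24, 32), raising σ from 1 to 3 greatly increases the number of pixels whose target value exceeds 0.5, while the peak value stays at 1.0.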

About this article

Cite this article

Zhang, Y., Chen, W. Decision-level information fusion powered human pose estimation. Appl Intell 53, 2161–2172 (2023). https://doi.org/10.1007/s10489-022-03623-z

Accepted: 10 April 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10489-022-03623-z
