Decision-level information fusion powered human pose estimation

Applied Intelligence

Abstract

Human pose estimation is viewed as a crucial step for understanding human behaviour. Although significant progress has been made in this area in recent years, most studies have focused on feature-level information fusion, while decision-level information fusion has rarely been explored. Compared with feature-level information, decision-level information contains more semantic and interpretable information and can help improve the performance of pose estimation in occluded and crowded scenes. In this paper, we focus on the fusion of decision-level information. We propose a View Fusion module for aggregating decision-level information from different stages to generate a more comprehensive estimation. An Auxiliary Task module is introduced to bridge the gap between the feature extractor and the View Fusion module and to provide prior information about the form of the decision-level information. Considering that the precision of predictions from different stages varies, we use different strategies to guide the learning process. Experiments show that our models outperform previous methods and achieve competitive results on the CrowdPose test set. Further experiments indicate that our method is flexible and can improve the performance of various backbones.
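To make the idea of decision-level fusion concrete, the following is a minimal sketch and not the View Fusion module itself, whose architecture the abstract does not specify. It assumes that each stage emits per-keypoint heatmaps and swaps in a plain weighted average as the fusion rule. The names `fuse_heatmaps`, `decode_keypoints` and `stage_weights` are hypothetical and introduced here only for illustration.

```python
import numpy as np


def fuse_heatmaps(stage_heatmaps, stage_weights=None):
    """Fuse per-stage heatmaps of shape (S, K, H, W) into one (K, H, W) map.

    Assumption: a fixed weighted average stands in for the learned
    View Fusion module described in the abstract.
    """
    heatmaps = np.asarray(stage_heatmaps, dtype=np.float64)
    if heatmaps.ndim != 4:
        raise ValueError("expected shape (stages, keypoints, height, width)")
    num_stages = heatmaps.shape[0]
    if stage_weights is None:
        # Treat every stage as equally reliable.
        weights = np.full(num_stages, 1.0 / num_stages)
    else:
        weights = np.asarray(stage_weights, dtype=np.float64)
        weights = weights / weights.sum()  # normalise so the weights sum to 1
    # Contract the stage axis: sum_s weights[s] * heatmaps[s].
    return np.tensordot(weights, heatmaps, axes=(0, 0))


def decode_keypoints(fused):
    """Return a (K, 2) array of (x, y) peak locations, one per keypoint."""
    num_keypoints, height, width = fused.shape
    flat_index = fused.reshape(num_keypoints, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_index, (height, width))
    return np.stack([xs, ys], axis=1)


# Two stages disagree about one keypoint on a 4x4 grid.
stages = np.zeros((2, 1, 4, 4))
stages[0, 0, 1, 2] = 1.0  # stage 0 votes for (x=2, y=1)
stages[1, 0, 3, 0] = 0.6  # stage 1 votes for (x=0, y=3)

equal = decode_keypoints(fuse_heatmaps(stages))
late_heavy = decode_keypoints(fuse_heatmaps(stages, stage_weights=[0.25, 0.75]))
```

With equal weights the stronger response from stage 0 decides the fused location, while weighting the later stage more heavily moves the estimate to that stage's peak. This is the sense in which the relative precision of each stage matters when predictions are combined.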

Funding

This work was supported in part by National Key Research and Development Program of China (No. 2018YFB2101300), in part by National Natural Science Foundation of China (Grant No. 61871186), and in part by the Dean’s Fund of Engineering Research Center of Software/Hardware Codesign Technology and Application, Ministry of Education (East China Normal University).

Author information

Corresponding author

Correspondence to Weiting Chen.

Ethics declarations

Conflict of Interests

The authors have no relevant financial or nonfinancial interests to disclose.

Additional information

Availability of Data and Material

The data that support the findings of this study are openly available. The COCO dataset is available at https://cocodataset.org/. The CrowdPose dataset is available at https://github.com/Jeff-sjtu/CrowdPose.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Impact of σ

We present the evaluation results of LiteHRNet on the CrowdPose test set with various values of σ.

As shown in Fig. 6, performance first improves and then degrades as σ increases, so the choice of σ affects the final result.

Fig. 6 Impact of σ

Cite this article

Zhang, Y., Chen, W. Decision-level information fusion powered human pose estimation. Appl Intell 53, 2161–2172 (2023). https://doi.org/10.1007/s10489-022-03623-z

  • DOI: https://doi.org/10.1007/s10489-022-03623-z
