Abstract
Human pose estimation is viewed as a crucial step for understanding human behaviour. Although significant progress has been made in this area in recent years, most studies have focused on feature-level information fusion, while decision-level information fusion has rarely been explored. Compared with feature-level information, decision-level information contains more semantic and interpretable information and can help improve the performance of pose estimation in occluded and crowded scenes. In this paper, we focus on the fusion of decision-level information. We propose a View Fusion module for aggregating decision-level information from different stages to generate a more comprehensive estimation. An Auxiliary Task module is introduced to bridge the gap between the feature extractor and the View Fusion module and to provide prior information about the form of the decision-level information. Considering that the precision of predictions from different stages varies, we use different strategies to guide the learning process. Experiments show that our models outperform previous methods and achieve competitive results on the CrowdPose test set. Further experiments indicate that our method is flexible and can improve the performance of various backbones.
Similar content being viewed by others
References
Chen Y, Tian Y, He M (2020) Monocular human pose estimation: A survey of deep learning-based methods. Comput Vis Image Underst 192. https://doi.org/10.1016/j.cviu.2019.102897
Luvizon D, Picard D, Tabia H (2020) Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition. IEEE Trans Pattern Anal Mach Intell:1–1. https://doi.org/10.1109/TPAMI.2020.2976014
Sun Y, Huang H, Yun X, Yang B, Dong K (2021) Triplet attention multiple spacetime-semantic graph convolutional network for skeleton-based action recognition. Appl Intell. https://doi.org/10.1007/s10489-021-02370-x
Yoon Y, Yu J, Jeon M (2021) Predictively encoded graph convolutional network for noise-robust skeleton-based action recognition. Appl Intell. https://doi.org/10.1007/s10489-021-02487-z
Gao C, Chen Y, Yu J-G, Sang N (2020) Pose-guided spatiotemporal alignment for video-based person Re-identification. Inf Sci 527:176–190. https://doi.org/10.1016/j.ins.2020.04.007
Zheng L, Huang Y, Lu H, Yang Y (2019) Pose-Invariant Embedding for Deep Person Re-Identification. IEEE Trans Image Process 28(9):4500–4509. https://doi.org/10.1109/TIP.2019.2910414
Liu H, Fang S, Zhang Z, Li D, Lin K, Wang J (2021) MFDNet: Collaborative poses perception and matrix fisher distribution for head pose estimation. IEEE Trans Multimed:1–1. https://doi.org/10.1109/TMM.2021.3081873
Li D, Liu H, Zhang Z, Lin K, Fang S, Li Z, Xiong N N (2021) CARM: Confidence-aware recommender model via review representation learning and historical rating behavior in the online platforms. Neurocomputing 455:283–296. https://doi.org/10.1016/j.neucom.2021.03.122
Shen X, Yi B, Liu H, Zhang W, Zhang Z, Liu S, Xiong N (2021) Deep Variational Matrix Factorization with Knowledge Embedding for Recommendation System. IEEE Trans Knowl Data Eng 33(5):1906–1918. https://doi.org/10.1109/TKDE.2019.2952849
Liu T, Liu H, Li Y, Zhang Z, Liu S (2019) Efficient Blind Signal Reconstruction With Wavelet Transforms Regularization for Educational Robot Infrared Vision Sensing. IEEE/ASME Trans Mechatron 24(1):384–394. https://doi.org/10.1109/TMECH.2018.2870056
Liu T, Liu H, Li Y-F, Chen Z, Zhang Z, Liu S (2020) Flexible FTIR Spectral Imaging Enhancement for Industrial Robot Infrared Vision Sensing. IEEE Trans Indust Inform 16(1):544–554. https://doi.org/10.1109/TII.2019.2934728
Liu H, Nie H, Zhang Z, Li Y-F (2021) Anisotropic angle distribution learning for head pose estimation and attention understanding in human-computer interaction. Neurocomputing 433:310–322. https://doi.org/10.1016/j.neucom.2020.09.068
Li Z, Liu H, Zhang Z, Liu T, Xiong N N (2021) Learning knowledge graph embedding with heterogeneous relation attention networks, IEEE Trans Neural Netw Learn Syst:1–13. https://doi.org/10.1109/TNNLS.2021.3055147
Zhang Z, Li Z, Liu H, Xiong N N (2020) Multi-scale dynamic convolutional network for knowledge graph embedding, IEEE Trans Knowl Data Eng:1–1. https://doi.org/10.1109/TKDE.2020.3005952
Wei S, Ramakrishna V, Kanade T, Sheikh Y (2016) Convolutional Pose Machines. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4724–4732
Li M, Zhou Z, Liu X (2019) Multi-Person Pose Estimation Using Bounding Box Constraint and LSTM. IEEE Trans Multimed 21(10):2653–2663. https://doi.org/10.1109/TMM.2019.2903455
Cheng B, Xiao B, Wang J, Shi H, Huang T S, Zhang L (2020) HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 5385– 5394
Samet N, Akbas E (2021) HPRNet: Hierarchical point regression for whole-body human pose estimation. Image Vis Comput 115:104285. https://doi.org/10.1016/j.imavis.2021.104285
Toshev A, Szegedy C (2014) DeepPose: Human Pose Estimation via Deep Neural Networks. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp 1653– 1660
Tompson J, Goroshin R, Jain A, LeCun Y, Bregler C (2015) Efficient object localization using Convolutional Networks. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 648–656
Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer Vision – ECCV 2016, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp 483–499
Xiao B, Wu H, Wei Y (2018) Simple baselines for human pose estimation and tracking. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer Vision – ECCV 2018, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp 472–487
Tian Y, Hu W, Jiang H, Wu J (2019) Densely connected attentional pyramid residual network for human pose estimation. Neurocomputing 347:13–23. https://doi.org/10.1016/j.neucom.2019.01.104
Huang J, Zhu Z, Guo F, Huang G (2020) The devil is in the details: delving into unbiased data processing for human pose estimation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 5699–5708
Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y, Liu D, Mu Y, Tan M, Wang X, Liu W, Xiao B (2021) Deep high-resolution representation learning for visual recognition. IEEE Trans Pattern Anal Mach Intell 43(10):3349–3364. https://doi.org/10.1109/TPAMI.2020.2983686
Chen Y, Wang Z, Peng Y, Zhang Z, Yu G, Sun J (2018) Cascaded pyramid network for multi-person pose estimation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7103–7112
Cai Y, Wang Z, Luo Z, Yin B, Du A, Wang H, Zhang X, Zhou X, Zhou E, Sun J (2020) Learning delicate local representations for multi-person pose estimation. In: Vedaldi A, Bischof H, Brox T, Frahm J-M (eds) Computer Vision – ECCV 2020. Springer International Publishing, Cham, pp 455–472
Yan M, Deng Z, He B, Zou C, Wu J, Zhu Z (2022) Emotion classification with multichannel physiological signals using hybrid feature and adaptive decision fusion. Biomed Signal Process Control 71:103235. https://doi.org/10.1016/j.bspc.2021.103235
Liu A-A, Lu Z, Xu N, Nie W, Li W (2021) Multi-type decision fusion network for visual Q&A. Image Vis Comput 115:104281. https://doi.org/10.1016/j.imavis.2021.104281
Geng X, Liang Y, Jiao L (2020) Multi-frame decision fusion based on evidential association rule mining for target identification. Appl Soft Comput 94:106460. https://doi.org/10.1016/j.asoc.2020.106460
Sun X, Xiao B, Wei F, Liang S, Wei Y (2018) Integral human pose regression. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer Vision – ECCV 2018, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp 536–553
Papandreou G, Zhu T, Kanazawa N, Toshev A, Tompson J, Bregler C, Murphy K (2017) Towards accurate multi-person pose estimation in the wild. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3711–3719
Zhang W, Fang J, Wang X, Liu W (2021) EfficientPose: Efficient human pose estimation with neural architecture search. Comput Vis Media 7(3):335–347. https://doi.org/10.1007/s41095-021-0214-z
Oh S-I, Kang H-B (2017) Object detection and classification by decision-level fusion for intelligent vehicle systems. Sens (Basel, Switzerland) 17(1):207. https://doi.org/10.3390/s17010207
Zhang J, Tian J, Cao Y, Yang Y, Xu X (2020) Deep time-frequency representation and progressive decision fusion for ECG classification. Knowl-Based Syst 190:105402. https://doi.org/10.1016/j.knosys.2019.105402
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick C L (2014) Microsoft COCO: Common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer Vision–ECCV 2014, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp 740–755
Li J, Wang C, Zhu H, Mao Y, Fang H-S, Lu C (2019) CrowdPose: efficient crowded scenes pose estimation and a new benchmark. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 10855–10864
Geng Z, Sun K, Xiao B, Zhang Z, Wang J (2021) Bottom-up human pose estimation via disentangled keypoint regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 14676–14686
Cao Z, Hidalgo G, Simon T, Wei S-E, Sheikh Y (January 2021) OpenPose: realtime multi-person 2d pose estimation using part affinity fields. IEEE Trans Pattern Anal Mach Intell 43(1):172–186. https://doi.org/10.1109/TPAMI.2019.2929257
Xiao J, Li H, Qu G, Fujita H, Cao Y, Zhu J, Huang C (2021) Hope: Heatmap and offset for pose estimation. J Ambient Intell Human Comput. https://doi.org/10.1007/s12652-021-03124-w
He K, Gkioxari G, Dollár P, Girshick R (2020) Mask R-CNN. IEEE Trans Pattern Anal Mach Intell 42(2):386–397. https://doi.org/10.1109/TPAMI.2018.2844175
Fang H-S, Xie S, Tai Y-W, Lu C (2017) RMPE: Regional Multi-person Pose Estimation. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 2353–2362
Xu X, Zou Q, Lin X (2021) CFENet: Content-aware feature enhancement network for multi-person pose estimation. Appl Intell. https://doi.org/10.1007/s10489-021-02383-6
Khirodkar R, Chari V, Agrawal A, Tyagi A (2021) Multi-Instance Pose Networks: Rethinking Top-Down Pose Estimation. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Qiu L, Zhang X, Li Y, Li G, Wu X, Xiong Z, Han X, Cui S (2020) Peeking into occluded joints: a novel framework for crowd pose estimation. In: Vedaldi A, Bischof H, Brox T, Frahm J-M (eds) Computer Vision – ECCV 2020, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp 488–504
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Kopf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) PyTorch: An imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems. Curran Associates, Inc., pp 8024–8035
Kingma D P, Ba J (2015) Adam: A method for stochastic optimization. In: Bengio Y, LeCun Y (eds) International Conference on Learning Representations, San Diego
Yu C, Xiao B, Gao C, Yuan L, Zhang L, Sang N, Wang J (2021) Lite-HRNet: a lightweight high-resolution network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10440–10450
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778
Funding
This work was supported in part by National Key Research and Development Program of China (No. 2018YFB2101300), in part by National Natural Science Foundation of China (Grant No. 61871186), and in part by the Dean’s Fund of Engineering Research Center of Software/Hardware Codesign Technology and Application, Ministry of Education (East China Normal University).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interests
The authors have no relevant financial or nonfinancial interests to disclose.
Additional information
Availability of Data and Material
The data that support the findings of this study are openly available. The COCO dataset is available at https://cocodataset.org/. The CrowdPose dataset is available at https://github.com/Jeff-sjtu/CrowdPose.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A:: Impact of σ
Appendix A:: Impact of σ
We present the evaluation results of LiteHRNet on the CrowdPose test set with various values of σ.
As shown in Fig. 6, with the increase of σ, the performance initially increases and subsequently drops. Thus, the choice of σ can affect the performance.
Rights and permissions
About this article
Cite this article
Zhang, Y., Chen, W. Decision-level information fusion powered human pose estimation. Appl Intell 53, 2161–2172 (2023). https://doi.org/10.1007/s10489-022-03623-z
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-022-03623-z