DOI: 10.1145/3581783.3611989
research-article

DecenterNet: Bottom-Up Human Pose Estimation Via Decentralized Pose Representation

Published: 27 October 2023

Abstract

Multi-person pose estimation in crowded scenes remains challenging. We find that most previous methods fail in crowded scenes because they cannot estimate or group visible keypoints, rather than because they cannot infer invisible ones. We therefore divide crowded scenes into two categories, entanglement and occlusion, according to the visibility of human parts, and observe that entanglement is a significant problem in crowded scenes. Based on this observation, we propose DecenterNet, an end-to-end deep architecture for robust and efficient pose estimation in crowded scenes. DecenterNet introduces a decentralized pose representation that uses every visible keypoint as a root point to represent a human pose, which is more robust in entangled areas. We also propose a decoupled pose assessment mechanism, which introduces a location map to adaptively select the optimal poses in the offset map. In addition, we construct a new dataset, SkatingPose, which contains more entangled scenes. DecenterNet surpasses the best existing method on SkatingPose by 1.8 AP and obtains 71.2 AP and 71.4 AP on the COCO and CrowdPose datasets, respectively, demonstrating its effectiveness. We release our source code, trained models, and dataset to facilitate further research in this direction; the code and dataset are available at https://github.com/InvertedForest/DecenterNet.
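
The two ideas in the abstract, using every visible keypoint as a root and selecting among the resulting pose candidates with a separate score, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the offset map or the location map is computed, so the function names (`encode_decentralized`, `decode_from_root`, `select_pose`), the per-candidate score, and the use of `None` for invisible keypoints are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Pose = List[Optional[Point]]  # None marks an invisible keypoint (assumption)


def encode_decentralized(pose: Pose) -> Dict[int, List[Optional[Point]]]:
    """For each visible keypoint (a root), store offsets to all keypoints."""
    offsets: Dict[int, List[Optional[Point]]] = {}
    for root_idx, root in enumerate(pose):
        if root is None:  # invisible keypoints cannot act as roots
            continue
        offsets[root_idx] = [
            None if kp is None else (kp[0] - root[0], kp[1] - root[1])
            for kp in pose
        ]
    return offsets


def decode_from_root(root: Point, root_offsets: List[Optional[Point]]) -> Pose:
    """Recover a full pose candidate from one root and its offsets."""
    return [
        None if off is None else (root[0] + off[0], root[1] + off[1])
        for off in root_offsets
    ]


def select_pose(candidates: List[Tuple[float, Pose]]) -> Pose:
    """Pick the candidate with the highest score.

    The score stands in for a value read from a location map; how that
    map is predicted is not described in the abstract.
    """
    return max(candidates, key=lambda c: c[0])[1]


# Toy person with three keypoints; the second one is invisible.
pose: Pose = [(10.0, 20.0), None, (14.0, 26.0)]
offsets = encode_decentralized(pose)

# Every visible keypoint yields its own candidate for the same person.
candidates = [
    (0.4, decode_from_root(pose[0], offsets[0])),
    (0.9, decode_from_root(pose[2], offsets[2])),
]
best = select_pose(candidates)
```

Because each visible keypoint produces its own candidate, a pose can still be recovered when some roots fall in an entangled region, provided at least one root receives a reliable score.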

Supplemental Material

MP4 File
Video for "DecenterNet: Bottom-Up Human Pose Estimation Via Decentralized Pose Representation".




        Published In

        MM '23: Proceedings of the 31st ACM International Conference on Multimedia
        October 2023
        9913 pages
        ISBN:9798400701085
        DOI:10.1145/3581783
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. datasets
        2. human pose estimation
        3. neural networks
        4. single-stage

        Qualifiers

        • Research-article

        Funding Sources

        • the Fundamental Research Funds for the Central Universities
        • Young Elite Scientist Sponsorship Program of China Association for Science and Technology
        • National Nature Fund
        • Natural Science Foundation of China
        • Young Elite Scientist Sponsorship Program of Beijing Association for Science and Technology

        Conference

        MM '23
        MM '23: The 31st ACM International Conference on Multimedia
        October 29 - November 3, 2023
        Ottawa ON, Canada

        Acceptance Rates

        Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

        Article Metrics

        • Downloads (last 12 months): 101
        • Downloads (last 6 weeks): 8
        Reflects downloads up to 02 Mar 2025

        Cited By

        • (2025) ZeroPose: CAD-Prompted Zero-Shot Object 6D Pose Estimation in Cluttered Scenes. IEEE Transactions on Circuits and Systems for Video Technology 35:2 (1251-1264). DOI: 10.1109/TCSVT.2024.3482439. Online publication date: Feb-2025
        • (2025) Class Incremental Learning With Less Forgetting Direction and Equilibrium Point. IEEE Transactions on Circuits and Systems for Video Technology 35:2 (1150-1164). DOI: 10.1109/TCSVT.2024.3477951. Online publication date: Feb-2025
        • (2024) Diffusion-Based Hypotheses Generation and Joint-Level Hypotheses Aggregation for 3D Human Pose Estimation. IEEE Transactions on Circuits and Systems for Video Technology 34:11 (10678-10691). DOI: 10.1109/TCSVT.2024.3415348. Online publication date: Nov-2024
        • (2024) Self-Calibration Flow Guided Denoising Diffusion Model for Human Pose Transfer. IEEE Transactions on Circuits and Systems for Video Technology 34:9 (7896-7911). DOI: 10.1109/TCSVT.2024.3382948. Online publication date: 1-Sep-2024
        • (2024) SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (1824-1833). DOI: 10.1109/CVPR52733.2024.00179. Online publication date: 16-Jun-2024
        • (2024) Multi-supervision transformer combining bounding box and mask for data-limited pose estimation. Neurocomputing 571:C. DOI: 10.1016/j.neucom.2023.127209. Online publication date: 12-Apr-2024
        • (2024) DHRNet: A Dual-path Hierarchical Relation Network for multi-person pose estimation. Knowledge-Based Systems 300 (112263). DOI: 10.1016/j.knosys.2024.112263. Online publication date: Sep-2024
