Bidirectional Optimization Coupled Lightweight Networks for Efficient and Robust Multi-Person 2D Pose Estimation

  • Regular Paper
Journal of Computer Science and Technology

Abstract

For multi-person 2D (two-dimensional) pose estimation, current deep-learning-based methods have exhibited impressive performance, but trade-offs among efficiency, robustness, and accuracy remain unavoidable in existing approaches. In principle, bottom-up methods are more efficient than top-down methods but less accurate. To make full use of their respective advantages, in this paper we design a novel bidirectional optimization coupled lightweight network (BOCLN) architecture for efficient, robust, and general-purpose multi-person 2D pose estimation from natural images. Within the BOCLN framework, the bottom-up network focuses on global features, while the top-down network emphasizes detailed features. The entire framework shares global features along the bottom-up data stream, while the top-down data stream aims to accelerate accurate pose estimation. In particular, to exploit priors on the relationships among human joints, we propose a probability limb heat map that represents the spatial context of the joints and guides the overall pose skeleton prediction, so that each person’s pose estimation in cluttered scenes (including crowds) can be as accurate and robust as possible. Benefiting from the novel BOCLN architecture, the time-consuming refinement procedure can be simplified to an efficient lightweight network. Extensive experiments and evaluations on public benchmarks confirm that our new method is more efficient and robust, yet still attains competitive accuracy compared with state-of-the-art methods. Our BOCLN shows even greater promise in online applications.
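
The abstract does not state how the probability limb heat map is computed. As a rough illustration of the general idea only, the sketch below assumes that a limb is the line segment between two joint positions and that each pixel's value decays as a Gaussian of its distance to that segment. The function name `limb_heat_map`, the Gaussian falloff, and the `sigma` parameter are all illustrative assumptions, not the authors' definition.

```python
import math


def limb_heat_map(joint_a, joint_b, width, height, sigma=1.5):
    """Hypothetical limb heat map: a height x width grid of values in (0, 1].

    Each value is a Gaussian of the distance from the pixel to the segment
    joint_a -> joint_b (an assumed form, not taken from the paper).
    """
    (ax, ay), (bx, by) = joint_a, joint_b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    heat = []
    for y in range(height):
        row = []
        for x in range(width):
            # Project the pixel onto the segment, clamped to its endpoints.
            if length_sq == 0:
                t = 0.0
            else:
                t = ((x - ax) * dx + (y - ay) * dy) / length_sq
            t = max(0.0, min(1.0, t))
            px, py = ax + t * dx, ay + t * dy
            dist_sq = (x - px) ** 2 + (y - py) ** 2
            row.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
        heat.append(row)
    return heat


# Example: a horizontal limb from (2, 2) to (7, 2) on a 10 x 5 grid.
heat = limb_heat_map((2, 2), (7, 2), width=10, height=5)
```

Under these assumptions, pixels on the segment score highest and the response fades with distance, which is one simple way a map could encode the spatial context between two connected joints.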



Author information

Corresponding author

Correspondence to Hong Qin.

Electronic supplementary material

ESM 1

(PDF 692 kb)


About this article


Cite this article

Li, S., Fang, Z., Song, WF. et al. Bidirectional Optimization Coupled Lightweight Networks for Efficient and Robust Multi-Person 2D Pose Estimation. J. Comput. Sci. Technol. 34, 522–536 (2019). https://doi.org/10.1007/s11390-019-1924-x

  • DOI: https://doi.org/10.1007/s11390-019-1924-x
