Tiny Person Pose Estimation via Image and Feature Super Resolution

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12890)

Abstract

Although human pose estimation has progressed greatly in recent years, performance drops dramatically when the target person is small in scale. In this paper, we begin with an analysis of tiny person pose estimation and find that the failure is mainly caused by blurriness and ambiguous edges in up-sampled images, both of which are harmful to pose estimation. Based on this analysis, we propose applying an additional super resolution (SR) network on top of an existing pose estimation method to better handle tiny persons. Specifically, we propose three SR networks, which operate at the image level, the feature level, and both levels, respectively. Furthermore, we propose a novel task-driven loss function for the SR networks that is tailored to pose estimation. Experimental results on the MPII and MSCOCO datasets show that the proposed pose super resolution methods bring significant improvements over the baseline for tiny persons.
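The abstract does not give the form of the task-driven loss. As a rough illustration only, one common way to make an SR objective task-aware is to add the downstream pose error to a pixel reconstruction term. The sketch below assumes that form; the names `mse`, `task_driven_loss`, `pose_net` and `task_weight` are hypothetical and are not taken from the paper.

```python
from typing import Callable, List

Grid = List[List[float]]  # 2-D grid of pixel or heatmap values


def mse(a: Grid, b: Grid) -> float:
    """Mean squared error between two equally sized 2-D grids."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def task_driven_loss(
    sr_image: Grid,
    hr_image: Grid,
    pose_net: Callable[[Grid], Grid],
    target_heatmap: Grid,
    task_weight: float = 1.0,
) -> float:
    """Assumed form: reconstruction error plus weighted pose error.

    This is an illustrative guess, not the loss defined in the paper.
    """
    # How closely the super-resolved image matches the high-resolution one.
    reconstruction = mse(sr_image, hr_image)
    # How well the pose estimator performs on the super-resolved image.
    task = mse(pose_net(sr_image), target_heatmap)
    return reconstruction + task_weight * task
```

Under this assumed form, `task_weight` trades off visual fidelity against pose accuracy: a larger value pushes the SR network toward outputs that help the pose estimator, even if they match the high-resolution image less closely.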

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grant Nos. U1713208, 61802189), Funds for International Cooperation and Exchange of the National Natural Science Foundation of China (Grant No. 61861136011), the Natural Science Foundation of Jiangsu Province, China (Grant No. BK20181299), the Fundamental Research Funds for the Central Universities (Grant No. 30920032201), the National Key Research and Development Program of China (Grant No. 2017YFC0820601), and the China Postdoctoral Science Foundation (Grant No. 2020M681609).

Author information

Corresponding author

Correspondence to Shanshan Zhang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, J., Liu, Y., Zhao, L., Zhang, S., Yang, J. (2021). Tiny Person Pose Estimation via Image and Feature Super Resolution. In: Peng, Y., Hu, SM., Gabbouj, M., Zhou, K., Elad, M., Xu, K. (eds) Image and Graphics. ICIG 2021. Lecture Notes in Computer Science, vol 12890. Springer, Cham. https://doi.org/10.1007/978-3-030-87361-5_26

  • DOI: https://doi.org/10.1007/978-3-030-87361-5_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87360-8

  • Online ISBN: 978-3-030-87361-5

  • eBook Packages: Computer Science, Computer Science (R0)
