
Pay Attention Selectively and Comprehensively: Pyramid Gating Network for Human Pose Estimation without Pre-training

Authors: Jimin Xiao

MM '20: Proceedings of the 28th ACM International Conference on Multimedia

Pages 2364 - 2371

https://doi.org/10.1145/3394171.3414041

Published: 12 October 2020


Abstract

Deep neural networks with multi-scale feature fusion have achieved great success in human pose estimation. However, these methods still have drawbacks: 1) they treat multi-scale features equally, which may over-emphasize redundant features; 2) because they favor deeper structures, they learn features with strong semantic representations but tend to lose natural discriminative information; 3) to attain good performance, they rely heavily on pre-training, which is time-consuming or even unavailable in practice. To mitigate these problems, we propose a novel comprehensive recalibration model called Pyramid GAting Network (PGA-Net), which is capable of distilling, selecting, and fusing discriminative and attention-aware features at different scales and at different levels (i.e., both semantic and natural levels). By fusing features both selectively and comprehensively, PGA-Net demonstrates remarkable stability and encouraging performance even without pre-training, so that the model can be trained truly from scratch. We demonstrate the effectiveness of PGA-Net on the COCO and MPII benchmarks, where it attains new state-of-the-art performance. https://github.com/ssr0512/PGA-Net
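The abstract says PGA-Net selects features instead of weighting every scale equally, but it does not give the gating formula. The sketch below shows only the general idea of soft-gated multi-scale fusion under loudly stated assumptions: each scale contributes a feature vector already resized to a common length, and each scale has a single scalar gate logit that is simply passed in (in a real network it would be learned). The function names `stable_sigmoid` and `gated_fusion` are hypothetical and are not taken from the PGA-Net code.

```python
import math
from typing import List


def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(features: List[List[float]], gate_logits: List[float]) -> List[float]:
    """Hypothetical soft-gated fusion: sum_i sigmoid(z_i) * f_i.

    Assumption: one scalar gate per scale, and all feature vectors
    have already been resized to the same length.
    """
    if len(features) != len(gate_logits):
        raise ValueError("need exactly one gate logit per scale")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("features must be resized to a common length first")
    # Soft gates in (0, 1): a scale with a very negative logit is nearly switched off,
    # so redundant scales are suppressed instead of being weighted equally.
    gates = [stable_sigmoid(z) for z in gate_logits]
    return [sum(g * f[i] for g, f in zip(gates, features)) for i in range(length)]
```

With equal logits of 0, both gates are 0.5 and the fusion reduces to an average, so `gated_fusion([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])` returns `[2.0, 3.0]`. Pushing one logit strongly negative makes the fused output approach the other scale alone, which is the "selective" behavior the abstract contrasts with equal treatment.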

Supplementary Material

MP4 File (3394171.3414041.mp4)

In this work, we develop a novel framework called Pyramid Gating Network. First, we design a multi-stage residual feature pyramid gating strategy that makes it possible to train a very deep network end-to-end. Moreover, we learn soft gates on multi-scale features in the top-down structure, enabling the network to distill and select significant features automatically and dynamically. Second, we propose an image pyramid attention that preserves more natural information so that it can be fused with semantic features. Third, we devise an effective incorporation framework that combines the two pyramid gating strategies (i.e., natural and semantic) at multiple scales. Importantly, with such reinforced and discriminative features, our model demonstrates markedly more stable performance and much faster convergence even without pre-training, so that it can truly be trained from scratch end-to-end. Our method can also be readily applied to other models.
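The description of the image pyramid attention does not state how the image pyramid itself is constructed. One common choice is repeated 2x downsampling of the input image; the sketch below assumes non-overlapping 2x2 average pooling on a single-channel image stored as nested lists. The helpers `downsample_2x` and `image_pyramid` are hypothetical illustrations, not functions from the PGA-Net repository, and the attention applied to each level is not shown.

```python
from typing import List

Image = List[List[float]]  # single-channel image, row-major


def downsample_2x(img: Image) -> Image:
    """Halve height and width by averaging non-overlapping 2x2 blocks (assumed scheme)."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def image_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [full-res, 1/2, 1/4, ...] images; level 0 is the input itself."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid
```

For a 4x4 image holding the values 0 to 15 row by row, `image_pyramid(img, 3)` yields levels of size 4x4, 2x2 and 1x1; the 2x2 level is `[[2.5, 4.5], [10.5, 12.5]]` and the 1x1 level is `[[7.5]]`. Each coarser level keeps low-level image content at a resolution that could be matched to a corresponding semantic feature map before fusion.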
