Learning Monocular Regression of 3D People in Crowds via Scene-Aware Blending and De-Occlusion


Abstract:

In this study, we address the challenge of estimating 3D body pose, shape, and depth relationships from single RGB images in crowded scenes. The difficulty lies in the limited availability of in-the-wild training samples that feature densely populated scenes. To mitigate this issue, we introduce a synthesis-based approach that fuses multiple human samples into a single composite scene. Our scene-aware blending technique maintains human-scene consistency by positioning individuals within plausible locations and adjusting their scales to conform to 3D settings. Furthermore, our method enables flexible per-subject occlusion management during the blending process, bolstering the robustness of 3D human body representations through a novel de-occlusion training scheme. We present a one-stage model, CBD, designed to learn monocular regression of 3D people in crowds by leveraging blending and de-occlusion techniques. Our quantitative and qualitative evaluations on four benchmark datasets show that CBD surpasses existing state-of-the-art approaches in 3D human pose and mesh regression accuracy, establishing it as a promising solution for monocular 3D human mesh recovery in densely populated scenes.
Published in: IEEE Transactions on Multimedia (Volume: 26)
Page(s): 2289 - 2302
Date of Publication: 12 July 2023

