Abstract:
We propose a novel and effective approach for 3D hand pose estimation from a single depth image. Instead of performing deterministic regression from depth images, our model learns a latent distribution that models the high-dimensional space of pose joints, which can also be interpreted as a kinematic model of the human hand. Specifically, the proposed network combines a conditional variational autoencoder, which learns an encoder and a decoder, with a standard convolutional network. The encoder models the latent variable as a prior, or regularizer, on the pose joints. The decoder then performs probabilistic inference to generate the output prediction conditioned on the input depth image. In addition, we introduce a pool-convolution module to improve the localization regression of the network. The architecture can be trained end-to-end. In experiments, we demonstrate the effectiveness of the proposed approach in comparison with various state-of-the-art holistic regression approaches.
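The abstract does not state the training objective, so the following is only a minimal sketch of the generic conditional-VAE loss that such a model would typically minimize, not the authors' implementation. It assumes a diagonal Gaussian posterior, a standard normal prior, and a squared-error reconstruction term on the joint coordinates; the function names and the `beta` weight are hypothetical.

```python
import math
import random


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ).

    Assumption: the latent prior is a standard normal; the abstract
    does not say which prior the paper uses.
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def cvae_loss(pred_joints, true_joints, mu, logvar, beta=1.0):
    """Squared-error reconstruction term plus beta-weighted KL regularizer.

    pred_joints / true_joints are flat lists of joint coordinates;
    beta is a hypothetical weighting, not taken from the paper.
    """
    recon = sum((p - t) ** 2 for p, t in zip(pred_joints, true_joints))
    return recon + beta * kl_to_standard_normal(mu, logvar)


# Usage: sample a latent code and evaluate the loss for a toy 2-joint pose.
rng = random.Random(0)
mu, logvar = [0.0, 0.0], [0.0, 0.0]
z = reparameterize(mu, logvar, rng)
loss = cvae_loss([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], mu, logvar)
```

In a full model the decoder would map `z` together with convolutional features of the depth image to `pred_joints`; that network is omitted here because the abstract gives no layer-level details.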
Published in: 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019)
Date of Conference: 14-18 May 2019
Date Added to IEEE Xplore: 11 July 2019