Abstract
We address the problem of 3D face modeling and tracking from RGB-D images. Existing methods usually fit a deformable model to an RGB-D image using the iterative closest point (ICP) algorithm. Because depth images are noisy and often partially occluded, these methods are not robust enough. To address this, we propose a robust method for 3D face modeling and tracking. For an input RGB-D face image, our method first estimates the initial head pose using random forests. Then, a generic bilinear face model is fitted to the RGB-D image using ICP. To improve the accuracy and robustness of face modeling, an optimal weight for each face vertex is integrated into the fitting procedure. The distances between facial landmarks are also used to better estimate facial expressions. Finally, the head pose and the identity and expression parameters of the bilinear face model are jointly optimized. Experiments show that our method can generate accurate 3D face models from an RGB-D image or an image sequence. The method can also robustly track the face even when the person exhibits large head rotations and varied facial expressions.
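To illustrate how per-vertex weights can enter an ICP-style fit, the following is a minimal sketch of the weighted rigid alignment step that such a procedure might solve once correspondences are fixed. It is not the authors' implementation: the abstract does not say how the optimal weights are computed, so the Gaussian residual weighting in `residual_weights`, the function names, and the parameter `sigma` are all assumptions made for illustration. The closed-form solution itself is the standard weighted SVD (Kabsch) alignment.

```python
import numpy as np


def residual_weights(residuals, sigma=1.0):
    """Hypothetical per-vertex weights: down-weight large residuals.

    This Gaussian form is an assumption for illustration only; the
    abstract does not specify how the per-vertex weights are chosen.
    """
    r = np.asarray(residuals, dtype=float)
    return np.exp(-(r ** 2) / (2.0 * sigma ** 2))


def weighted_rigid_fit(src, dst, w):
    """Return R, t minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding points.
    w:        (N,) non-negative weights, not all zero.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize so weights sum to 1

    # Weighted centroids of both point sets.
    src_c = (w[:, None] * src).sum(axis=0)
    dst_c = (w[:, None] * dst).sum(axis=0)

    # Weighted 3x3 cross-covariance: sum_i w_i * x_i * y_i^T.
    X = src - src_c
    Y = dst - dst_c
    H = (w[:, None] * X).T @ Y

    # Optimal rotation from the SVD, with a reflection guard.
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

In a full ICP loop, this step would alternate with a correspondence search (for example, nearest depth point per model vertex) and a weight update. Setting a vertex's weight to zero removes it from the fit entirely, which is one simple way to discount occluded or noisy regions.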
Acknowledgements
This work was supported by the capital health research and development of special (no. 2021-1G-2182), and the state key development program in 14th Five-Year (no. 2021YFF0602103).
Additional information
Communicated by A. Liu.
Cite this article
Luo, C., Zhang, J., Bao, C. et al. Robust 3D face modeling and tracking from RGB-D images. Multimedia Systems 28, 1657–1666 (2022). https://doi.org/10.1007/s00530-022-00925-7
DOI: https://doi.org/10.1007/s00530-022-00925-7