ABSTRACT
While existing face normal estimation methods have produced promising results on small datasets, they often suffer severe performance degradation on diverse in-the-wild face images, especially for high-fidelity face normal estimation. Training a high-fidelity face normal estimation model that generalizes well requires a large amount of training data with ground-truth face normals. Collecting such a high-fidelity database is difficult in practice, which prevents current methods from recovering face normals with fine-grained geometric details. To mitigate this issue, we propose a coarse-to-fine framework that estimates the face normal from an in-the-wild image with only a coarse exemplar reference. Specifically, we first train a model on limited training data to estimate the coarse normal of a real face image. Then, we use the estimated coarse normal as an exemplar and devise an exemplar-based normal estimation network that learns a robust mapping from the input face image to the fine-grained normal. In this manner, our method largely alleviates the negative impact of scarce training data and focuses on recovering the high-fidelity normal contained in natural images. Extensive experiments and ablation studies demonstrate the efficacy of our design and show its superiority over state-of-the-art methods in terms of both training data requirements and recovery quality of fine-grained face normals. Our code is available at https://github.com/AutoHDR/HFFNE.
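The two-stage data flow described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names `coarse_stage`, `fine_stage`, `unit_normals`, and `estimate_normals` are hypothetical, and both stage bodies are simple stand-ins (a constant camera-facing normal and an image-gradient perturbation) for the trained networks. It shows only that the coarse normal is computed first and then passed, together with the image, to the second stage.

```python
import numpy as np


def unit_normals(raw):
    """Rescale each pixel's 3-vector to unit length so it is a valid normal."""
    length = np.linalg.norm(raw, axis=-1, keepdims=True)
    return raw / np.maximum(length, 1e-8)


def coarse_stage(image):
    """Hypothetical stand-in for the model trained on limited data.

    Returns a smooth normal map in which every pixel faces the camera.
    """
    h, w, _ = image.shape
    normals = np.zeros((h, w, 3))
    normals[..., 2] = 1.0
    return normals


def fine_stage(image, exemplar):
    """Hypothetical stand-in for the exemplar-based network.

    Perturbs the coarse exemplar with detail taken from image gradients,
    then renormalizes. A real model would learn this mapping.
    """
    gray = image.mean(axis=-1)
    gy, gx = np.gradient(gray)  # derivatives along rows, then columns
    detail = np.stack([-gx, -gy, np.zeros_like(gray)], axis=-1)
    return unit_normals(exemplar + detail)


def estimate_normals(image):
    """Coarse-to-fine flow: coarse normal first, then exemplar-guided refinement."""
    exemplar = coarse_stage(image)
    return fine_stage(image, exemplar)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.random((8, 8, 3))  # placeholder for an RGB face crop
    normals = estimate_normals(face)
    print(normals.shape)
```

The point of the design is that the second stage never has to predict a normal map from scratch; it only refines the exemplar, which is how the abstract motivates the reduced need for training data.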
Index Terms
- Towards High-Fidelity Face Normal Estimation