Abstract
We introduce VERTEX, an effective solution for recovering the 3D shape and texture of vehicles from uncalibrated monocular inputs in real-world street environments. To fully exploit the semantic prior of vehicles, we propose a novel joint geometry and texture representation based on implicit semantic template mapping. Compared with existing representations that infer 3D texture fields, our method explicitly constrains the texture distribution to the 2D surface of the template and avoids the limitation of a fixed topology. Moreover, we propose a joint training strategy that leverages the texture distribution to learn a semantic-preserving mapping from vehicle instances to the canonical template. We also contribute a new synthetic dataset containing 830 elaborately textured car models, labeled with keypoints and rendered with the physically based renderer PBRT under measured HDRI sky maps to obtain highly realistic images. Experiments demonstrate the superior performance of our approach on both the test dataset and in-the-wild images. Furthermore, the presented technique enables additional applications such as 3D vehicle texture transfer and material identification, and generalizes to other shape categories.
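To make the joint representation concrete, the following is a minimal PyTorch sketch of the idea described above, not the authors' implementation: the module names (`VertexSketch`, `warp`, `template_sdf`, `texture`), the network sizes, and the SDF-gradient projection used to pin texture queries onto the template surface are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a joint geometry/texture
# representation via implicit semantic template mapping. An instance-space
# query point is warped into a shared canonical template, geometry is read
# from the template's signed distance field, and texture is evaluated only
# after projecting the warped point onto the template's 2D surface.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLP(nn.Module):
    """Small fully connected network with ReLU activations."""

    def __init__(self, dims):
        super().__init__()
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.ReLU()]
        self.net = nn.Sequential(*layers[:-1])  # no activation on the output

    def forward(self, x):
        return self.net(x)


class VertexSketch(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Semantic-preserving mapping from instance space to the canonical
        # template, conditioned on a global image feature.
        self.warp = MLP([3 + feat_dim, 256, 256, 3])
        # Signed distance field of the shared canonical template.
        self.template_sdf = MLP([3, 256, 256, 1])
        # Texture field constrained to the template surface: its input is a
        # projected surface point, i.e. a 2D manifold location in 3D.
        self.texture = MLP([3 + feat_dim, 256, 256, 3])

    def forward(self, points, feat):
        # points: (N, 3) query points; feat: (feat_dim,) image feature
        f = feat.expand(points.shape[0], -1)
        canon = self.warp(torch.cat([points, f], dim=-1))  # (N, 3)
        sdf = self.template_sdf(canon)                     # (N, 1) geometry
        # Project canonical points onto the template surface by stepping
        # along the SDF gradient, so color only lives on the 2D surface.
        grad = torch.autograd.grad(sdf.sum(), canon, create_graph=True)[0]
        surface = canon - sdf * F.normalize(grad, dim=-1)
        rgb = self.texture(torch.cat([surface, f], dim=-1))  # (N, 3) color
        return sdf, rgb


# Example usage: query 1,024 random points with a random image feature.
model = VertexSketch()
pts = torch.rand(1024, 3) * 2 - 1   # points in [-1, 1]^3
feat = torch.randn(256)
sdf, rgb = model(pts, feat)         # geometry and surface color per point
```

Because every instance is warped into the same template, texture predicted on the template surface is directly reusable across instances, which is what enables applications such as 3D vehicle texture transfer.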
Acknowledgements
This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFB2100500).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, X. et al. (2022). VERTEX: VEhicle Reconstruction and TEXture Estimation from a Single Image Using Deep Implicit Semantic Template Mapping. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds.) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol. 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_52
DOI: https://doi.org/10.1007/978-3-031-20497-5_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5
eBook Packages: Computer Science; Computer Science (R0)