An Optimization Framework to Enforce Multi-view Consistency for Texturing 3D Meshes

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15094)

Abstract

A fundamental problem in texturing 3D meshes with pre-trained text-to-image models is ensuring multi-view consistency. State-of-the-art approaches typically use diffusion models to aggregate multi-view inputs; common issues are blurriness, caused by the averaging operation in the aggregation step, and inconsistencies in local features. This paper introduces an optimization framework that proceeds in four stages to achieve multi-view consistency. The first stage generates an over-complete set of 2D textures from a predefined set of viewpoints using a multi-view (MV) consistent diffusion process. The second stage selects a subset of views that are mutually consistent while covering the underlying 3D model; we show how to achieve this goal by solving semi-definite programs. The third stage performs non-rigid alignment to register the selected views across overlapping regions. The fourth stage solves an MRF problem to associate each mesh face with one selected view. In particular, the third and fourth stages are iterated: the cuts obtained in the fourth stage encourage the non-rigid alignment in the third stage to focus on regions close to the cuts. Experimental results show that our approach significantly outperforms baseline approaches both qualitatively and quantitatively. Project page: https://aigc3d.github.io/ConsistenTex.
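To illustrate the flavor of the fourth stage, face-to-view assignment can be posed as a discrete MRF whose unary term scores how well a view textures a face and whose pairwise Potts term penalizes seams between adjacent faces assigned to different views. The sketch below is a toy stand-in, not the paper's actual formulation: it solves the labeling with simple iterated conditional modes rather than the inference method used in the paper, and the costs, weights, and names are all hypothetical.

```python
import numpy as np

def icm_face_labeling(unary, edges, smooth_weight=1.0, iters=20):
    """Assign each mesh face a view index by greedily minimizing a
    unary + Potts MRF energy with iterated conditional modes (ICM).

    unary[f, v]  -- cost of texturing face f from view v
    edges        -- list of (f, g) pairs of adjacent faces
    """
    n_faces, n_views = unary.shape
    labels = unary.argmin(axis=1)          # start from the best unary view
    neighbors = [[] for _ in range(n_faces)]
    for f, g in edges:
        neighbors[f].append(g)
        neighbors[g].append(f)
    for _ in range(iters):
        changed = False
        for f in range(n_faces):
            # Potts smoothness: pay smooth_weight for every neighbor whose
            # label differs, i.e. for every seam (cut) between views.
            costs = unary[f].copy()
            for v in range(n_views):
                costs[v] += smooth_weight * sum(
                    1 for g in neighbors[f] if labels[g] != v)
            best = int(costs.argmin())
            if best != labels[f]:
                labels[f] = best
                changed = True
        if not changed:                    # converged to a local minimum
            break
    return labels

# Toy example: 4 faces in a strip, 2 candidate views; faces 0-1 prefer
# view 0 and faces 2-3 prefer view 1, so one cut between faces 1 and 2.
unary = np.array([[0.1, 1.0], [0.2, 0.9], [0.9, 0.2], [1.0, 0.1]])
edges = [(0, 1), (1, 2), (2, 3)]
labels = icm_face_labeling(unary, edges, smooth_weight=0.3)
print(labels)  # -> [0 0 1 1]
```

In the paper's pipeline, the resulting cuts are then fed back to the third stage so that non-rigid alignment concentrates on the regions near the seams.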

Author information

Correspondence to Zilong Dong or Qixing Huang.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 28603 KB)

Supplementary material 2 (mp4 15916 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhao, Z. et al. (2025). An Optimization Framework to Enforce Multi-view Consistency for Texturing 3D Meshes. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15094. Springer, Cham. https://doi.org/10.1007/978-3-031-72764-1_9

  • DOI: https://doi.org/10.1007/978-3-031-72764-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72763-4

  • Online ISBN: 978-3-031-72764-1

  • eBook Packages: Computer Science; Computer Science (R0)
