Abstract
Interactive segmentation of 3D Gaussians opens up significant opportunities for real-time manipulation of 3D scenes, thanks to the real-time rendering capability of 3D Gaussian Splatting. However, current methods suffer from time-consuming post-processing to deal with noisy segmentation output. They also struggle to provide detailed segmentation, which is important for fine-grained manipulation of 3D scenes. In this study, we propose Click-Gaussian, which learns distinguishable feature fields of two-level granularity, facilitating segmentation without time-consuming post-processing. We delve into the challenges stemming from inconsistently learned feature fields, which arise when 2D segmentation results are obtained independently of the 3D scene: 3D segmentation accuracy deteriorates when the 2D segmentation results across views, the primary cues for 3D segmentation, conflict with one another. To overcome these issues, we propose Global Feature-guided Learning (GFL). GFL constructs clusters of global feature candidates from noisy 2D segments across views, which smooths out noise when training the features of 3D Gaussians. Our method runs in 10 ms per click, 15 to 130 times faster than previous methods, while also significantly improving segmentation accuracy.
S. Choi and H. Song—Equal contribution.
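Since the abstract describes GFL only at a high level, the minimal sketch below may help convey the idea: per-view 2D segment features are clustered into global candidates, and each 3D Gaussian's feature is pulled toward its nearest candidate, smoothing out cross-view inconsistencies. This is not the authors' implementation; the helper names (build_global_candidates, gfl_targets), the plain-array feature representation, and the use of the HDBSCAN library for clustering are all assumptions made for illustration.

import numpy as np
import hdbscan  # hierarchical density-based clustering (McInnes et al., 2017)

def build_global_candidates(segment_features, min_cluster_size=10):
    """Cluster noisy per-view 2D segment features (N x D) into global
    feature candidates, returning one centroid per cluster."""
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(segment_features)
    centroids = [segment_features[labels == k].mean(axis=0)
                 for k in sorted(set(labels)) if k != -1]  # label -1 marks noise
    return np.stack(centroids)

def gfl_targets(gaussian_features, candidates):
    """Assign each 3D Gaussian feature (M x D) its nearest global
    candidate (K x D) to serve as a training target."""
    dists = ((gaussian_features[:, None, :] - candidates[None, :, :]) ** 2).sum(-1)
    return candidates[dists.argmin(axis=1)]

# Toy usage: two well-separated synthetic blobs stand in for real 2D segment features.
rng = np.random.default_rng(0)
seg_feats = np.concatenate([rng.normal(0.0, 1.0, (100, 16)),
                            rng.normal(8.0, 1.0, (100, 16))])
gauss_feats = rng.normal(4.0, 4.0, (500, 16))

candidates = build_global_candidates(seg_feats)
targets = gfl_targets(gauss_feats, candidates)
loss = float(((gauss_feats - targets) ** 2).mean())  # would be minimized during training

In Click-Gaussian itself, the features live on 3D Gaussians and are rendered and supervised per view; the array-based version above only illustrates the clustering-then-assignment pattern that lets noisy per-view segments supervise a consistent 3D feature field.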
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Choi, S., Song, H., Kim, J., Kim, T., Do, H. (2025). Click-Gaussian: Interactive Segmentation to Any 3D Gaussians. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15061. Springer, Cham. https://doi.org/10.1007/978-3-031-72646-0_17
DOI: https://doi.org/10.1007/978-3-031-72646-0_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72645-3
Online ISBN: 978-3-031-72646-0
eBook Packages: Computer Science, Computer Science (R0)