Abstract
Interactive segmentation of 3D Gaussians opens up significant opportunities for real-time manipulation of 3D scenes, thanks to the real-time rendering capability of 3D Gaussian Splatting. However, current methods suffer from time-consuming post-processing to deal with noisy segmentation output. They also struggle to provide detailed segmentation, which is important for fine-grained manipulation of 3D scenes. In this study, we propose Click-Gaussian, which learns distinguishable feature fields of two-level granularity, facilitating segmentation without time-consuming post-processing. We delve into the challenges stemming from inconsistently learned feature fields, which arise when 2D segmentation results are obtained independently of the 3D scene: 3D segmentation accuracy deteriorates when the 2D segmentation results across views, the primary cues for 3D segmentation, conflict with one another. To overcome these issues, we propose Global Feature-guided Learning (GFL). GFL constructs clusters of global feature candidates from noisy 2D segments across views, which smooths out noise when training the features of 3D Gaussians. Our method runs in 10 ms per click, 15 to 130 times faster than previous methods, while also significantly improving segmentation accuracy.
S. Choi and H. Song—Equal contribution.
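Since the abstract describes GFL only at a high level, the minimal sketch below may help convey the idea: per-view 2D segment features are clustered into global candidates, and each 3D Gaussian's feature is pulled toward its nearest candidate, smoothing out cross-view inconsistencies. This is not the authors' implementation; the helper names (build_global_candidates, gfl_targets), the plain-array feature representation, and the use of the HDBSCAN library for clustering are all assumptions made for illustration.

import numpy as np
import hdbscan  # hierarchical density-based clustering (McInnes et al., 2017)

def build_global_candidates(segment_features, min_cluster_size=10):
    """Cluster noisy per-view 2D segment features (N x D) into global
    feature candidates, returning one centroid per cluster."""
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(segment_features)
    centroids = [segment_features[labels == k].mean(axis=0)
                 for k in sorted(set(labels)) if k != -1]  # label -1 marks noise
    return np.stack(centroids)

def gfl_targets(gaussian_features, candidates):
    """Assign each 3D Gaussian feature (M x D) its nearest global
    candidate (K x D) to serve as a training target."""
    dists = ((gaussian_features[:, None, :] - candidates[None, :, :]) ** 2).sum(-1)
    return candidates[dists.argmin(axis=1)]

# Toy usage: two well-separated synthetic blobs stand in for real 2D segment features.
rng = np.random.default_rng(0)
seg_feats = np.concatenate([rng.normal(0.0, 1.0, (100, 16)),
                            rng.normal(8.0, 1.0, (100, 16))])
gauss_feats = rng.normal(4.0, 4.0, (500, 16))

candidates = build_global_candidates(seg_feats)
targets = gfl_targets(gauss_feats, candidates)
loss = float(((gauss_feats - targets) ** 2).mean())  # would be minimized during training

In Click-Gaussian itself, the features live on 3D Gaussians and are rendered and supervised per view; the array-based version above only illustrates the clustering-then-assignment pattern that lets noisy per-view segments supervise a consistent 3D feature field.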
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Choi, S., Song, H., Kim, J., Kim, T., Do, H. (2025). Click-Gaussian: Interactive Segmentation to Any 3D Gaussians. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15061. Springer, Cham. https://doi.org/10.1007/978-3-031-72646-0_17
DOI: https://doi.org/10.1007/978-3-031-72646-0_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72645-3
Online ISBN: 978-3-031-72646-0
eBook Packages: Computer Science, Computer Science (R0)