Improving Dynamic 3D Gaussian Splatting from Monocular Videos with Object Motion Information

  • Conference paper
  • First Online:
Advanced Intelligent Computing Technology and Applications (ICIC 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14872)

Abstract

Despite the significant advancements achieved by recent 3D-Gaussian-based approaches to dynamic scene reconstruction, their efficacy is markedly diminished in monocular settings, particularly under rapid object motion. This issue arises from the inherent one-to-many mapping between a monocular video and the dynamic scene: precise object motion states are difficult to discern from a monocular video, while different motion states may correspond to distinct scenes. To alleviate this issue, we first explicitly extract object motion state information from the monocular video with a pretrained video tracking model, TAM, and then separate the 3D Gaussians into static and dynamic subsets based on this motion information. Second, we present a three-stage training strategy to optimize the 3D Gaussians across distinct motion states. Moreover, we introduce an innovative augmentation technique that provides augmented views for supervising the 3D Gaussians, thereby enriching the model with the multi-view information that is pivotal for accurately interpreting motion states. Our empirical evaluations on Nvidia and iPhone, two of the most challenging monocular datasets, demonstrate our method's superior reconstruction capability over other dynamic Gaussian models.
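
The static/dynamic separation mentioned in the abstract can be illustrated with a minimal sketch. The Python snippet below is a hypothetical illustration, not the authors' implementation: the function names `project` and `split_static_dynamic`, the visibility voting, and the `vote_ratio` threshold are assumptions. It shows one plausible way to label 3D Gaussian centers as static or dynamic by projecting them into each frame and checking whether they land inside the moving-object masks produced by a video tracking/segmentation model such as TAM.

```python
import numpy as np

def project(points_w, K, w2c):
    """Project (N, 3) world-space points with a 3x4 world-to-camera matrix and 3x3 intrinsics."""
    pts_h = np.concatenate([points_w, np.ones((points_w.shape[0], 1))], axis=1)  # (N, 4) homogeneous
    cam = pts_h @ w2c.T                      # (N, 3) camera-space points
    z = np.clip(cam[:, 2:3], 1e-6, None)     # guard against points at or behind the camera plane
    uv = (cam / z) @ K.T                     # (N, 3); pixel coordinates in the first two columns
    return uv[:, :2], cam[:, 2]

def split_static_dynamic(means_w, frames, vote_ratio=0.3):
    """
    means_w : (N, 3) array of Gaussian centers in world space.
    frames  : list of (mask, K, w2c) tuples; mask is an (H, W) boolean moving-object mask.
    Returns a boolean array that is True where a Gaussian is treated as dynamic.
    """
    n = means_w.shape[0]
    votes = np.zeros(n, dtype=np.int64)   # frames in which the center lands inside the motion mask
    seen = np.zeros(n, dtype=np.int64)    # frames in which the center is visible at all
    for mask, K, w2c in frames:
        uv, depth = project(means_w, K, w2c)
        h, w = mask.shape
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        visible = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        seen += visible
        hits = visible.copy()
        hits[visible] = mask[v[visible], u[visible]]
        votes += hits
    # Dynamic if the center falls inside the motion mask in enough of the frames where it is visible.
    return (seen > 0) & (votes >= vote_ratio * np.maximum(seen, 1))
```

In such a scheme, the resulting boolean split could then drive separate optimization of the static and dynamic Gaussian subsets; the voting threshold here is purely illustrative.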

References

  1. Agarwal, A., Arora, C.: Attention attention everywhere: monocular depth prediction with skip attention. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 5861–5870 (2023)

  2. Bhat, S.F., Alhashim, I., Wonka, P.: AdaBins: depth estimation using adaptive bins. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4009–4018 (2021)

  3. Bhat, S.F., Alhashim, I., Wonka, P.: LocalBins: improving depth estimation by learning local distributions. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) European Conference on Computer Vision, pp. 480–496. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19769-7_28

  4. Bhat, S.F., Birkl, R., Wofk, D., Wonka, P., Müller, M.: ZoeDepth: zero-shot transfer by combining relative and metric depth. arXiv preprint arXiv:2302.12288 (2023)

  5. Cao, A., Johnson, J.: HexPlane: a fast representation for dynamic scenes. In: CVPR (2023)

  6. Chen, A., Xu, Z., Geiger, A., Yu, J., Su, H.: TensoRF: tensorial radiance fields. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) European Conference on Computer Vision (ECCV), vol. 13692. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19824-3_20

  7. Fang, J., et al.: Fast dynamic radiance fields with time-aware neural voxels. In: SIGGRAPH Asia 2022 Conference Papers (2022)

  8. Gao, H., Li, R., Tulsiani, S., Russell, B., Kanazawa, A.: Monocular dynamic view synthesis: a reality check. In: Advances in Neural Information Processing Systems (2022)

  9. Katsumata, K., Vo, D.M., Nakayama, H.: An efficient 3D Gaussian representation for monocular/multi-view dynamic scenes. arXiv preprint arXiv:2311.12897 (2023)

  10. Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 1–14 (2023). https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

  11. Kirillov, A., et al.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4015–4026 (2023)

  12. Kopf, J., Rong, X., Huang, J.B.: Robust consistent video depth estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1611–1621 (2021)

  13. Li, Z., Niklaus, S., Snavely, N., Wang, O.: Neural scene flow fields for space-time view synthesis of dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6498–6508 (2021)

  14. Li, Z., Wang, Q., Cole, F., Tucker, R., Snavely, N.: DynIBaR: neural dynamic image-based rendering. arXiv preprint arXiv:2211.11082 (2022)

  15. Li, Z., Wang, X., Liu, X., Jiang, J.: BinsFormer: revisiting adaptive bins for monocular depth estimation. arXiv preprint arXiv:2204.00987 (2022)

  16. Luiten, J., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3D Gaussians: tracking by persistent dynamic view synthesis. Preprint (2023)

  17. Luo, X., Huang, J.B., Szeliski, R., Matzen, K., Kopf, J.: Consistent video depth estimation. ACM Trans. Graph. (ToG) 39(4), 71–81 (2020)

  18. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. Commun. ACM 65(1), 99–106 (2021)

  19. Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. (TOG) 41(4), 1–15 (2022)

  20. Park, K., et al.: Nerfies: deformable neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5865–5874 (2021)

  21. Park, K., et al.: HyperNeRF: a higher-dimensional representation for topologically varying neural radiance fields. ACM Trans. Graph. 40(6), 1–12 (2021)

  22. Pumarola, A., Corona, E., Pons-Moll, G., Moreno-Noguer, F.: D-NeRF: neural radiance fields for dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10318–10327 (2021)

  23. Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., Koltun, V.: Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 44(3), 1623–1637 (2020)

  24. Fridovich-Keil, S., Meanti, G., Warburg, F.R., Recht, B., Kanazawa, A.: K-planes: explicit radiance fields in space, time, and appearance. In: CVPR (2023)

  25. Shao, R., Zheng, Z., Tu, H., Liu, B., Zhang, H., Liu, Y.: Tensor4D: efficient neural 4D decomposition for high-fidelity dynamic reconstruction and rendering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2023)

  26. Sun, C., Sun, M., Chen, H.T.: Direct voxel grid optimization: super-fast convergence for radiance fields reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5459–5469 (2022)

  27. Wu, G., et al.: 4D Gaussian splatting for real-time dynamic scene rendering. arXiv preprint arXiv:2310.08528 (2023)

  28. Xie, T., et al.: PhysGaussian: physics-integrated 3D Gaussians for generative dynamics. arXiv preprint arXiv:2311.12198 (2023)

  29. Yang, J., Gao, M., Li, Z., Gao, S., Wang, F., Zheng, F.: Track anything: segment anything meets videos. arXiv preprint arXiv:2304.11968 (2023)

  30. Yang, Z., Gao, X., Zhou, W., Jiao, S., Zhang, Y., Jin, X.: Deformable 3D Gaussians for high-fidelity monocular dynamic scene reconstruction. arXiv preprint arXiv:2309.13101 (2023)

  31. Yin, W., et al.: Metric3D: towards zero-shot metric 3D prediction from a single image. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9043–9053 (2023)

  32. Zhang, Z., Cole, F., Tucker, R., Freeman, W.T., Dekel, T.: Consistent depth of moving objects in video. ACM Trans. Graph. (TOG) 40(4), 1–12 (2021)

  33. Zhou, K., et al.: DynPoint: dynamic neural point for view synthesis. In: Advances in Neural Information Processing Systems, vol. 36 (2024)

Acknowledgments

This work was supported in part by the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (No. 2023C01143), the Anhui Provincial Major Science and Technology Project (No. 202203a05020016), and the National Key R&D Program of China (No. 2022YFB3303400).

Author information

Corresponding author

Correspondence to Zhangjin Huang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Luo, Y., Huang, Z., Huang, X. (2024). Improving Dynamic 3D Gaussian Splatting from Monocular Videos with Object Motion Information. In: Huang, DS., Pan, Y., Zhang, Q. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14872. Springer, Singapore. https://doi.org/10.1007/978-981-97-5612-4_8

  • DOI: https://doi.org/10.1007/978-981-97-5612-4_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5611-7

  • Online ISBN: 978-981-97-5612-4

  • eBook Packages: Computer Science, Computer Science (R0)
