Abstract
Despite the significant advances achieved by recent 3D-Gaussian-based approaches to dynamic scene reconstruction, their efficacy is markedly diminished in monocular settings, particularly under rapid object motion. This issue arises from the inherent one-to-many mapping between a monocular video and the dynamic scene: precise object motion states are difficult to discern from a monocular video, yet different motion states may correspond to distinct scenes. To alleviate this issue, we first explicitly extract object motion-state information from the monocular video with a pretrained video tracking model, TAM, and then separate the 3D Gaussians into static and dynamic subsets based on this information. Second, we present a three-stage training strategy to optimize the 3D Gaussians across distinct motion states. Moreover, we introduce a novel augmentation technique that provides augmented views for supervising the 3D Gaussians, enriching the model with multi-view information that is pivotal for accurately interpreting motion states. Our empirical evaluations on Nvidia and iPhone, two of the most challenging monocular datasets, demonstrate our method's superior reconstruction capability over other dynamic Gaussian models.
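The static/dynamic separation summarized above lends itself to a simple projection-and-vote procedure. The sketch below is not the authors' released implementation; it only illustrates the idea under assumed inputs: per-frame binary motion masks from a tracker such as TAM, plus known camera intrinsics and world-to-camera extrinsics. The helper names (`project_points`, `split_static_dynamic`) and the 0.5 voting threshold are illustrative assumptions.

```python
# Minimal sketch: label each Gaussian center as static or dynamic by projecting
# it into every training view and voting against the per-frame motion masks.
import numpy as np

def project_points(xyz: np.ndarray, K: np.ndarray, w2c: np.ndarray):
    """Project N x 3 world points into pixel coordinates with a pinhole camera.

    K is a 3 x 3 intrinsics matrix; w2c is a 4 x 4 world-to-camera transform.
    Returns (N x 2 pixel coords, boolean mask of points in front of the camera).
    """
    homo = np.concatenate([xyz, np.ones((xyz.shape[0], 1))], axis=1)  # N x 4
    cam = (w2c @ homo.T).T[:, :3]                                     # N x 3
    in_front = cam[:, 2] > 1e-6
    uvw = (K @ cam.T).T
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)  # avoid divide-by-zero
    return uv, in_front

def split_static_dynamic(xyz, masks, Ks, w2cs, vote_thresh=0.5):
    """Mark a Gaussian dynamic if, over the frames where it is visible, its
    projection lands on the motion mask more often than `vote_thresh`."""
    hits = np.zeros(xyz.shape[0])
    seen = np.zeros(xyz.shape[0])
    for mask, K, w2c in zip(masks, Ks, w2cs):  # one binary (H x W) mask per frame
        uv, in_front = project_points(xyz, K, w2c)
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        h, w = mask.shape
        valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        seen[valid] += 1
        hits[valid] += mask[v[valid], u[valid]]
    return (hits / np.maximum(seen, 1)) > vote_thresh  # True = dynamic subset
```

Under this scheme, the boolean output directly indexes the Gaussian parameter tensors into the two subsets, after which each subset can be optimized on its own schedule (e.g., in the staged training the abstract describes).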
References
Agarwal, A., Arora, C.: Attention attention everywhere: monocular depth prediction with skip attention. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 5861–5870 (2023)
Bhat, S.F., Alhashim, I., Wonka, P.: AdaBins: depth estimation using adaptive bins. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4009–4018 (2021)
Bhat, S.F., Alhashim, I., Wonka, P.: LocalBins: improving depth estimation by learning local distributions. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) European Conference on Computer Vision, pp. 480–496. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19769-7_28
Bhat, S.F., Birkl, R., Wofk, D., Wonka, P., Müller, M.: ZoeDepth: zero-shot transfer by combining relative and metric depth. arXiv preprint arXiv:2302.12288 (2023)
Cao, A., Johnson, J.: HexPlane: a fast representation for dynamic scenes. In: CVPR (2023)
Chen, A., Xu, Z., Geiger, A., Yu, J., Su, H.: TensoRF: tensorial radiance fields. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) European Conference on Computer Vision (ECCV), vol. 13692. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19824-3_20
Fang, J., et al.: Fast dynamic radiance fields with time-aware neural voxels. In: SIGGRAPH Asia 2022 Conference Papers (2022)
Gao, H., Li, R., Tulsiani, S., Russell, B., Kanazawa, A.: Monocular dynamic view synthesis: a reality check. In: Advances in Neural Information Processing Systems (2022)
Katsumata, K., Vo, D.M., Nakayama, H.: An efficient 3D Gaussian representation for monocular/multi-view dynamic scenes. arXiv preprint arXiv:2311.12897 (2023)
Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 1–14 (2023). https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
Kirillov, A., et al.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4015–4026 (2023)
Kopf, J., Rong, X., Huang, J.B.: Robust consistent video depth estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1611–1621 (2021)
Li, Z., Niklaus, S., Snavely, N., Wang, O.: Neural scene flow fields for space-time view synthesis of dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6498–6508 (2021)
Li, Z., Wang, Q., Cole, F., Tucker, R., Snavely, N.: DynIBaR: neural dynamic image-based rendering. arXiv preprint arXiv:2211.11082 (2022)
Li, Z., Wang, X., Liu, X., Jiang, J.: BinsFormer: revisiting adaptive bins for monocular depth estimation. arXiv preprint arXiv:2204.00987 (2022)
Luiten, J., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3D Gaussians: tracking by persistent dynamic view synthesis. arXiv preprint arXiv:2308.09713 (2023)
Luo, X., Huang, J.B., Szeliski, R., Matzen, K., Kopf, J.: Consistent video depth estimation. ACM Trans. Graph. (ToG) 39(4), 71–81 (2020)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. Commun. ACM 65(1), 99–106 (2021)
Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. (TOG) 41(4), 1–15 (2022)
Park, K., et al.: Nerfies: deformable neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5865–5874 (2021)
Park, K., et al.: HyperNeRF: a higher-dimensional representation for topologically varying neural radiance fields. ACM Trans. Graph. 40(6), 1–12 (2021)
Pumarola, A., Corona, E., Pons-Moll, G., Moreno-Noguer, F.: D-NeRF: neural radiance fields for dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10318–10327 (2021)
Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., Koltun, V.: Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 44(3), 1623–1637 (2020)
Fridovich-Keil, S., Meanti, G., Warburg, F.R., Recht, B., Kanazawa, A.: K-Planes: explicit radiance fields in space, time, and appearance. In: CVPR (2023)
Shao, R., Zheng, Z., Tu, H., Liu, B., Zhang, H., Liu, Y.: Tensor4D: efficient neural 4D decomposition for high-fidelity dynamic reconstruction and rendering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2023)
Sun, C., Sun, M., Chen, H.T.: Direct voxel grid optimization: super-fast convergence for radiance fields reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5459–5469 (2022)
Wu, G., et al.: 4D Gaussian splatting for real-time dynamic scene rendering. arXiv preprint arXiv:2310.08528 (2023)
Xie, T., et al.: PhysGaussian: physics-integrated 3D Gaussians for generative dynamics. arXiv preprint arXiv:2311.12198 (2023)
Yang, J., Gao, M., Li, Z., Gao, S., Wang, F., Zheng, F.: Track anything: segment anything meets videos. arXiv preprint arXiv:2304.11968 (2023)
Yang, Z., Gao, X., Zhou, W., Jiao, S., Zhang, Y., Jin, X.: Deformable 3D Gaussians for high-fidelity monocular dynamic scene reconstruction. arXiv preprint arXiv:2309.13101 (2023)
Yin, W., et al.: Metric3D: towards zero-shot metric 3D prediction from a single image. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9043–9053 (2023)
Zhang, Z., Cole, F., Tucker, R., Freeman, W.T., Dekel, T.: Consistent depth of moving objects in video. ACM Trans. Graph. (TOG) 40(4), 1–12 (2021)
Zhou, K., et al.: DynPoint: dynamic neural point for view synthesis. In: Advances in Neural Information Processing Systems, vol. 36 (2024)
Acknowledgments
This work was supported in part by the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (No. 2023C01143), the Anhui Provincial Major Science and Technology Project (No. 202203a05020016), and the National Key R&D Program of China (No. 2022YFB3303400).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Luo, Y., Huang, Z., Huang, X. (2024). Improving Dynamic 3D Gaussian Splatting from Monocular Videos with Object Motion Information. In: Huang, D.S., Pan, Y., Zhang, Q. (eds.) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol. 14872. Springer, Singapore. https://doi.org/10.1007/978-981-97-5612-4_8
DOI: https://doi.org/10.1007/978-981-97-5612-4_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5611-7
Online ISBN: 978-981-97-5612-4
eBook Packages: Computer Science (R0)