Abstract
3D scene understanding and generation aim to reconstruct the layout of a scene and each object in it from an RGB image, estimate each object's semantic type in 3D space, and generate a 3D scene. Current deep-learning-based 3D scene generation algorithms mainly recover the 3D scene from a single image. Because real environments are complex, a single image provides limited information: single-view information is incomplete, and objects in the scene occlude one another. To address these problems, we propose SGMT, a 3D scene generation framework that fuses multi-view position information and reconstructs the 3D scene from multi-view video time-series data, compensating for the object positions that existing methods miss. We demonstrate the effectiveness of SGMT for multi-view scene generation on the UrbanScene3D and SUNRGBD datasets and study the influence of SGCN and joint fine-tuning. We further explore how well SGMT transfers between datasets and discuss future improvements.
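The idea of fusing object positions across video frames can be illustrated with a minimal sketch. This is not the SGMT method, whose details the abstract does not give. It assumes that each frame already yields per-object position estimates in a shared world coordinate system, that an object occluded or out of view in a frame is reported as `None`, and that a running mean is an acceptable fusion rule. The function name `fuse_positions` and the object ids are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float, float]


def fuse_positions(
    frames: List[Dict[str, Optional[Position]]],
) -> Dict[str, Position]:
    """Sequentially fuse per-frame object positions with a running mean.

    Each frame maps a (hypothetical) object id to its estimated position in
    a shared world coordinate system, or to None when the object is occluded
    or outside the view in that frame.
    """
    fused: Dict[str, Position] = {}
    counts: Dict[str, int] = {}
    for frame in frames:
        for obj_id, pos in frame.items():
            if pos is None:
                continue  # no observation in this view; keep the earlier estimate
            n = counts.get(obj_id, 0)
            if n == 0:
                fused[obj_id] = pos
            else:
                prev = fused[obj_id]
                # Incremental mean: m_new = m_old + (x - m_old) / (n + 1)
                fused[obj_id] = tuple(
                    p + (x - p) / (n + 1) for p, x in zip(prev, pos)
                )
            counts[obj_id] = n + 1
    return fused


# Three frames of a short sequence; each object is missing from one view.
frames = [
    {"obj_a": (1.0, 0.0, 2.0), "obj_b": None},
    {"obj_a": (1.5, 0.0, 2.0), "obj_b": (4.0, 0.0, 1.0)},
    {"obj_a": None, "obj_b": (4.0, 0.0, 3.0)},
]
result = fuse_positions(frames)
print(result)
```

In this toy sequence every object ends up with a fused position even though no single frame observes both objects reliably, which is the kind of gap a single-view method cannot fill.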
Acknowledgments
This work is supported in part by the Excellent Youth Scholars Program of Shandong Province (Grant no. 2022HWYQ-048) and the Oversea Innovation Team Project of the “20 Regulations for New Universities” funding program of Jinan (Grant no. 2021GXRC073).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sun, W. et al. (2022). Sequential Fusion of Multi-view Video Frames for 3D Scene Generation. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_49
DOI: https://doi.org/10.1007/978-3-031-20497-5_49
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5
eBook Packages: Computer Science, Computer Science (R0)