
Sequential Fusion of Multi-view Video Frames for 3D Scene Generation

  • Conference paper
Artificial Intelligence (CICAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13604)

Abstract

3D scene understanding and generation aim to reconstruct the scene layout and each object from an RGB image, estimate their semantic types in 3D space, and generate a 3D scene. Current deep-learning-based 3D scene generation algorithms mainly recover the scene from a single image. Because real environments are complex, the information a single image provides is limited: single-view information is incomplete, and objects in the scene may occlude one another. To address these problems, we propose SGMT, a 3D scene generation framework that fuses position information across multiple views and reconstructs the 3D scene from multi-view video time-series data, compensating for the object positions that existing methods miss. We demonstrate the effectiveness of SGMT's multi-view scene generation on the UrbanScene3D and SUN RGB-D datasets and study the influence of SGCN and joint fine-tuning. We further explore the transferability of SGMT between datasets and discuss future improvements.
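The abstract describes SGMT only at a high level, so the following is a minimal, hypothetical sketch of what sequential multi-view position fusion can look like in general: per-frame 3D object centres are transferred into a shared world frame using the camera pose, then merged into a running scene by nearest-centre matching with an incremental average. The function names, the matching rule, and the data layout are all assumptions made for illustration; the actual SGMT pipeline (including SGCN and the joint fine-tuning stage) is not specified in this excerpt.

```python
import numpy as np

def to_world(centers_cam, R, t):
    """Map an (N, 3) array of object centres from camera to world coordinates."""
    return centers_cam @ R.T + t

def fuse_frame(scene, centers_world, labels, match_thresh=0.5):
    """Merge one frame's detections into the running scene.

    `scene` is a list of dicts {"center", "label", "count"}. A detection is
    averaged into the nearest existing object with the same label if their
    centres are within `match_thresh` metres; otherwise it starts a new object.
    """
    for c, lbl in zip(centers_world, labels):
        c = np.asarray(c, dtype=float)
        best, best_d = None, match_thresh
        for obj in scene:
            d = np.linalg.norm(obj["center"] - c)
            if obj["label"] == lbl and d < best_d:
                best, best_d = obj, d
        if best is None:
            scene.append({"center": c.copy(), "label": lbl, "count": 1})
        else:
            # incremental (running) average of the matched object's position
            best["count"] += 1
            best["center"] += (c - best["center"]) / best["count"]
    return scene

# Usage: walk the frames of every view in temporal order, mapping each frame's
# detections into the shared world frame via its camera pose (R, t) before fusing.
# scene = []
# for frame in frames:
#     centers = to_world(frame["centers_cam"], frame["R"], frame["t"])
#     scene = fuse_frame(scene, centers, frame["labels"])
```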

Acknowledgments

This work is supported in part by the Excellent Youth Scholars Program of Shandong Province (Grant no. 2022HWYQ-048) and the Oversea Innovation Team Project of the “20 Regulations for New Universities” funding program of Jinan (Grant no. 2021GXRC073).

Author information

Corresponding author

Correspondence to Lei Meng.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sun, W. et al. (2022). Sequential Fusion of Multi-view Video Frames for 3D Scene Generation. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_49

  • DOI: https://doi.org/10.1007/978-3-031-20497-5_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20496-8

  • Online ISBN: 978-3-031-20497-5

  • eBook Packages: Computer Science, Computer Science (R0)
