
Sequential Fusion of Multi-view Video Frames for 3D Scene Generation

  • Conference paper
Artificial Intelligence (CICAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13604)

Abstract

3D scene understanding and generation aim to reconstruct the scene layout and each object from an RGB image, estimate their semantic types in 3D space, and generate a 3D scene. Current deep-learning-based 3D scene generation algorithms mainly recover the scene from a single image. Because real environments are complex, the information a single image provides is limited: single-view information is incomplete, and objects in the scene may occlude one another. To address these problems, we propose SGMT, a 3D scene generation framework that fuses position information across multiple views and reconstructs the 3D scene from multi-view video time-series data, compensating for the object positions that existing methods miss. We demonstrate the effectiveness of SGMT's multi-view scene generation on the UrbanScene3D and SUN RGB-D datasets and study the influence of SGCN and joint fine-tuning. We further explore the transferability of SGMT between datasets and discuss future improvements.
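The abstract describes SGMT only at a high level, so the following is a minimal, hypothetical sketch of what sequential multi-view position fusion can look like in general: per-frame 3D object centres are transferred into a shared world frame using the camera pose, then merged into a running scene by nearest-centre matching with an incremental average. The function names, the matching rule, and the data layout are all assumptions made for illustration; the actual SGMT pipeline (including SGCN and the joint fine-tuning stage) is not specified in this excerpt.

```python
import numpy as np

def to_world(centers_cam, R, t):
    """Map an (N, 3) array of object centres from camera to world coordinates."""
    return centers_cam @ R.T + t

def fuse_frame(scene, centers_world, labels, match_thresh=0.5):
    """Merge one frame's detections into the running scene.

    `scene` is a list of dicts {"center", "label", "count"}. A detection is
    averaged into the nearest existing object with the same label if their
    centres are within `match_thresh` metres; otherwise it starts a new object.
    """
    for c, lbl in zip(centers_world, labels):
        c = np.asarray(c, dtype=float)
        best, best_d = None, match_thresh
        for obj in scene:
            d = np.linalg.norm(obj["center"] - c)
            if obj["label"] == lbl and d < best_d:
                best, best_d = obj, d
        if best is None:
            scene.append({"center": c.copy(), "label": lbl, "count": 1})
        else:
            # incremental (running) average of the matched object's position
            best["count"] += 1
            best["center"] += (c - best["center"]) / best["count"]
    return scene

# Usage: walk the frames of every view in temporal order, mapping each frame's
# detections into the shared world frame via its camera pose (R, t) before fusing.
# scene = []
# for frame in frames:
#     centers = to_world(frame["centers_cam"], frame["R"], frame["t"])
#     scene = fuse_frame(scene, centers, frame["labels"])
```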

Acknowledgments

This work is supported in part by the Excellent Youth Scholars Program of Shandong Province (Grant no. 2022HWYQ-048) and the Oversea Innovation Team Project of the “20 Regulations for New Universities” funding program of Jinan (Grant no. 2021GXRC073).

Author information

Corresponding author

Correspondence to Lei Meng.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sun, W. et al. (2022). Sequential Fusion of Multi-view Video Frames for 3D Scene Generation. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_49

  • DOI: https://doi.org/10.1007/978-3-031-20497-5_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20496-8

  • Online ISBN: 978-3-031-20497-5

  • eBook Packages: Computer Science, Computer Science (R0)
