
GuideRender: large-scale scene navigation based on multi-modal view frustum movement prediction

  • Original article
  • Published in The Visual Computer

Abstract

Distributed parallel rendering provides a valuable way to navigate large-scale scenes. However, previous work has typically focused on outputting ultra-high-resolution images. In this paper, we aim to improve the interactivity of navigation and propose GuideRender, a large-scale scene navigation method based on multi-modal view frustum movement prediction. Given previous frames, user inputs, and object information, GuideRender first extracts frame, user-input, and object features spatially and temporally using a multi-modal extractor. To obtain effective fused features for prediction, we introduce an attentional guidance fusion module that fuses these features from different domains with attention. Finally, we predict the movement of the view frustum from the attentionally fused features and obtain its future state, so that scene data can be loaded in advance to reduce latency. In addition, to support GuideRender, we design an object hierarchy hybrid tree for scene management, based on object distribution and hierarchy, and an adaptive virtual sub-frustum decomposition method for task decomposition, based on the relationship between rendering cost and rendering node capacity. Experimental results show that GuideRender outperforms baselines in navigating large-scale scenes. A user study further shows that our method satisfies the navigation requirements of large-scale scenes.
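The abstract describes a pipeline of per-modality feature extraction, attention-based fusion, and frustum movement regression. The paper's implementation is not reproduced on this page; the PyTorch sketch below is only a rough illustration of how such an attentional guidance fusion and a prediction head might be wired together. All module names, feature dimensions, and the 6-DoF movement parameterization are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class AttentionalGuidanceFusion(nn.Module):
    """Illustrative sketch: fuse frame, user-input, and object features.

    Frame features act as queries; user-input and object features guide
    them via cross-attention, and the three streams are projected back
    to a single fused representation.
    """
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn_input = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_object = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, frame_feat, input_feat, object_feat):
        # frame_feat:  (B, T, D) temporal frame features
        # input_feat:  (B, T, D) user-input features
        # object_feat: (B, N, D) per-object features
        by_input, _ = self.attn_input(frame_feat, input_feat, input_feat)
        by_object, _ = self.attn_object(frame_feat, object_feat, object_feat)
        fused = torch.cat([frame_feat, by_input, by_object], dim=-1)
        return self.proj(fused)  # (B, T, D) fused features

class FrustumMovementHead(nn.Module):
    """Regress a hypothetical 6-DoF frustum movement (translation + rotation)."""
    def __init__(self, dim=256):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, 6)

    def forward(self, fused):
        out, _ = self.gru(fused)
        return self.head(out[:, -1])  # predict from the last time step

# Example with illustrative shapes: batch of 2, 8 past frames, 16 objects.
frames, inputs = torch.randn(2, 8, 256), torch.randn(2, 8, 256)
objects = torch.randn(2, 16, 256)
movement = FrustumMovementHead()(AttentionalGuidanceFusion()(frames, inputs, objects))
print(movement.shape)  # torch.Size([2, 6])
```

Under this reading, the predicted movement places the future view frustum so that the scene data intersecting it can be prefetched to the rendering nodes before the user arrives, which is what reduces perceived latency.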


Data availability

The datasets that support the current study are available from the corresponding author on reasonable request.



Funding

This work was supported by the National Key Research and Development Program of China under grant number 2022YFC2407000, the Interdisciplinary Program of Shanghai Jiao Tong University under grant numbers YG2023LC11 and YG2022ZD007, the National Natural Science Foundation of China under grant numbers 62272298 and 62077037, the College-level Project Fund of Shanghai Jiao Tong University Affiliated Sixth People's Hospital under grant number ynlc201909, and the Medical-industrial Cross-fund of Shanghai Jiao Tong University under grant number YG2022QN089.

Author information

Corresponding author

Correspondence to Bin Sheng.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (mp4 58376 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Qin, Y., Chi, X., Sheng, B. et al. GuideRender: large-scale scene navigation based on multi-modal view frustum movement prediction. Vis Comput 39, 3597–3607 (2023). https://doi.org/10.1007/s00371-023-02922-x


Keywords

Navigation