
GuideRender: large-scale scene navigation based on multi-modal view frustum movement prediction

  • Original article
  • Published in The Visual Computer

Abstract

Distributed parallel rendering provides a valuable way to navigate large-scale scenes. However, previous work has typically focused on outputting ultra-high-resolution images. In this paper, we aim to improve the interactivity of navigation and propose GuideRender, a large-scale scene navigation method based on multi-modal view frustum movement prediction. Given previous frames, user inputs, and object information, GuideRender first extracts frame, user-input, and object features spatially and temporally using a multi-modal extractor. To obtain effective fused features for prediction, we introduce an attentional guidance fusion module that fuses these features from different domains with attention. Finally, we predict the movement of the view frustum from the attentionally fused features and obtain its future state, so that scene data can be loaded in advance to reduce latency. In addition, to support GuideRender, we design an object hierarchy hybrid tree for scene management, based on object distribution and hierarchy, and an adaptive virtual sub-frustum decomposition method for task decomposition, based on the relationship between rendering cost and rendering node capacity. Experimental results show that GuideRender outperforms baselines in navigating large-scale scenes. A user study further shows that our method satisfies the navigation requirements of large-scale scenes.
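The abstract describes a pipeline of per-modality feature extraction, attention-based fusion, and frustum movement regression. The paper's implementation is not reproduced on this page; the PyTorch sketch below is only a rough illustration of how such an attentional guidance fusion and a prediction head might be wired together. All module names, feature dimensions, and the 6-DoF movement parameterization are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class AttentionalGuidanceFusion(nn.Module):
    """Illustrative sketch: fuse frame, user-input, and object features.

    Frame features act as queries; user-input and object features guide
    them via cross-attention, and the three streams are projected back
    to a single fused representation.
    """
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn_input = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_object = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, frame_feat, input_feat, object_feat):
        # frame_feat:  (B, T, D) temporal frame features
        # input_feat:  (B, T, D) user-input features
        # object_feat: (B, N, D) per-object features
        by_input, _ = self.attn_input(frame_feat, input_feat, input_feat)
        by_object, _ = self.attn_object(frame_feat, object_feat, object_feat)
        fused = torch.cat([frame_feat, by_input, by_object], dim=-1)
        return self.proj(fused)  # (B, T, D) fused features

class FrustumMovementHead(nn.Module):
    """Regress a hypothetical 6-DoF frustum movement (translation + rotation)."""
    def __init__(self, dim=256):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, 6)

    def forward(self, fused):
        out, _ = self.gru(fused)
        return self.head(out[:, -1])  # predict from the last time step

# Example with illustrative shapes: batch of 2, 8 past frames, 16 objects.
frames, inputs = torch.randn(2, 8, 256), torch.randn(2, 8, 256)
objects = torch.randn(2, 16, 256)
movement = FrustumMovementHead()(AttentionalGuidanceFusion()(frames, inputs, objects))
print(movement.shape)  # torch.Size([2, 6])
```

Under this reading, the predicted movement places the future view frustum so that the scene data intersecting it can be prefetched to the rendering nodes before the user arrives, which is what reduces perceived latency.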


Data availability

The datasets that support the current study are available from the corresponding author on reasonable request.



Funding

This work was supported by the National Key Research and Development Program of China under grant number 2022YFC2407000, the Interdisciplinary Program of Shanghai Jiao Tong University under grant numbers YG2023LC11 and YG2022ZD007, the National Natural Science Foundation of China under grant numbers 62272298 and 62077037, the College-level Project Fund of Shanghai Jiao Tong University Affiliated Sixth People's Hospital under grant number ynlc201909, and the Medical-industrial Cross-fund of Shanghai Jiao Tong University under grant number YG2022QN089.

Author information

Corresponding author

Correspondence to Bin Sheng.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (mp4 58376 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Qin, Y., Chi, X., Sheng, B. et al. GuideRender: large-scale scene navigation based on multi-modal view frustum movement prediction. Vis Comput 39, 3597–3607 (2023). https://doi.org/10.1007/s00371-023-02922-x


Keywords

Navigation