research-article

Towards Full-scene Volumetric Video Streaming via Spatially Layered Representation and NeRF Generation

Authors:

Jiangchuan Liu,

Jingdong Xu

NOSSDAV '24: Proceedings of the 34th edition of the Workshop on Network and Operating System Support for Digital Audio and Video

Pages 22 - 28

https://doi.org/10.1145/3651863.3651879

Published: 15 April 2024

Abstract

Immersive full-scene volumetric video (VV) showcases the richness and detail of the 3D world, yet its massive data volume poses significant streaming challenges. Existing 3D tile-based viewport approaches adapt poorly to full-scene VV because of their small video buffers, high tile segmentation overhead, and lack of full-scene consideration.

In response, we exploit the spatial independence of elements in VV and present V²NeRF, a novel full-scene VV streaming system built on a layered representation. It combines an implicit neural radiance field (NeRF) for the static background with explicit point clouds for the dynamic foreground, thereby avoiding large data transfers. Moreover, we propose a lightweight non-visible background removal method and a two-stage decoupled architecture to address intensive computation requirements and multiscale adaptation scheduling. We also develop an efficient buffer-aware simulated annealing algorithm and adopt a perceptually learned metric to enhance user experience. Extensive prototype evaluations demonstrate V²NeRF's superior streaming and viewing performance.
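
The abstract does not specify how the buffer-aware simulated annealing algorithm works. As a rough illustration of the general idea only, the sketch below applies textbook simulated annealing to a per-segment quality selection problem with a stall penalty derived from a simulated playback buffer. It is not V²NeRF's algorithm: the function name, the objective, and every parameter (`sizes_mbit`, `utilities`, `rebuffer_penalty`, the cooling schedule) are assumptions made for this example.

```python
import math
import random


def anneal_quality_levels(sizes_mbit, utilities, bandwidth_mbps, buffer_s,
                          segment_s=1.0, rebuffer_penalty=4.0,
                          t_start=1.0, t_end=1e-3, cooling=0.95,
                          steps_per_temp=20, seed=0):
    """Choose one quality level per segment with plain simulated annealing.

    sizes_mbit[i][q] : hypothetical size of segment i at level q (megabits)
    utilities[i][q]  : hypothetical perceptual utility of segment i at level q
    Returns (best_levels, best_score).
    """
    rng = random.Random(seed)
    n = len(sizes_mbit)

    def score(levels):
        # Simulate the playback buffer; stalls are charged against utility.
        buf, total = buffer_s, 0.0
        for i, q in enumerate(levels):
            download_s = sizes_mbit[i][q] / bandwidth_mbps
            stall = max(0.0, download_s - buf)
            buf = max(0.0, buf - download_s) + segment_s
            total += utilities[i][q] - rebuffer_penalty * stall
        return total

    current = [0] * n                      # start from the lowest quality
    current_score = score(current)
    best, best_score = current[:], current_score
    temp = t_start
    while temp > t_end:
        for _ in range(steps_per_temp):
            # Neighbour move: re-draw the level of one random segment.
            cand = current[:]
            i = rng.randrange(n)
            cand[i] = rng.randrange(len(sizes_mbit[i]))
            delta = score(cand) - current_score
            # Accept improvements always, worse moves with Boltzmann probability.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current, current_score = cand, current_score + delta
                if current_score > best_score:
                    best, best_score = current[:], current_score
        temp *= cooling                    # geometric cooling schedule
    return best, best_score
```

With ample bandwidth the search should settle on the highest level for every segment, while under a tight link the stall penalty keeps it at the lowest level. A real system would replace the toy utility with a perceptual quality score and the fixed bandwidth with a throughput estimate.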

Index Terms

Information systems → Information systems applications → Multimedia information systems → Multimedia streaming

Published In

NOSSDAV '24: Proceedings of the 34th edition of the Workshop on Network and Operating System Support for Digital Audio and Video

April 2024

77 pages

ISBN:9798400706134

DOI:10.1145/3651863

Program Chairs:
Amr Rizk, University of Duisburg-Essen, Germany
Maria Torres Vega, Katholieke Universiteit Leuven, Belgium

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

NOSSDAV '24

Sponsor:

SIGMM

NOSSDAV '24: 34th edition of the Workshop on Network and Operating System Support for Digital Audio and Video

April 15 - 18, 2024

Bari, Italy

Acceptance Rates

Overall Acceptance Rate 118 of 363 submissions, 33%
