DOI: 10.1145/3651863.3651879
research-article

Towards Full-scene Volumetric Video Streaming via Spatially Layered Representation and NeRF Generation

Published: 15 April 2024

Abstract

Immersive full-scene volumetric video (VV) showcases the richness and detail of the 3D world, yet its massive data volume poses significant streaming challenges. Existing 3D tile-based viewport approaches struggle to adapt to full-scene VV owing to their limited video buffers, high tile segmentation overhead, and lack of full-scene consideration.
In response, by exploiting the spatial independence of elements in VV, we present V2NeRF, a novel full-scene VV streaming system built on a layered representation. It combines an implicit neural radiance field (NeRF) with explicit point clouds to represent the static background and the dynamic foreground, respectively, thereby avoiding large data transfers. Moreover, we propose a lightweight non-visible background removal method and a two-stage decoupled architecture to address intensive computation requirements and multiscale adaptation scheduling. We develop an efficient buffer-aware simulated annealing algorithm and use a perceptually learned metric to enhance user experience. Extensive prototype evaluations demonstrate V2NeRF's superior streaming and viewing performance.
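
The abstract names a buffer-aware simulated annealing algorithm but does not give its formulation. To make the idea concrete, here is a minimal generic sketch in Python, not the authors' algorithm. It assumes a two-layer quality choice (foreground point cloud level, background NeRF level), a utility that rewards quality and penalizes stalls when the download time exceeds the buffered playback time, and a standard Metropolis acceptance rule. The quality ladders, the `stall_penalty` weight, and all function names are hypothetical.

```python
import math
import random

# Hypothetical per-layer quality ladders: (size in megabits, utility score).
# These numbers are illustrative only and do not come from the paper.
FOREGROUND = [(2.0, 1.0), (4.0, 1.8), (8.0, 2.4), (16.0, 2.8)]
BACKGROUND = [(1.0, 0.5), (2.0, 0.9), (4.0, 1.2)]


def score(choice, bandwidth_mbps, buffer_s, stall_penalty=4.0):
    """Utility of a (foreground, background) level pair minus a stall penalty."""
    fg_size, fg_utility = FOREGROUND[choice[0]]
    bg_size, bg_utility = BACKGROUND[choice[1]]
    download_s = (fg_size + bg_size) / bandwidth_mbps
    # Buffer awareness: a download longer than the buffered playback time stalls.
    stall_s = max(0.0, download_s - buffer_s)
    return fg_utility + bg_utility - stall_penalty * stall_s


def neighbor(choice, rng):
    """Move one randomly chosen layer up or down by one level (clamped)."""
    levels = [len(FOREGROUND), len(BACKGROUND)]
    layer = rng.randrange(2)
    step = rng.choice((-1, 1))
    new = list(choice)
    new[layer] = min(levels[layer] - 1, max(0, new[layer] + step))
    return tuple(new)


def anneal(bandwidth_mbps, buffer_s, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Search for a high-scoring level pair with simulated annealing."""
    rng = random.Random(seed)
    current = (0, 0)
    current_score = score(current, bandwidth_mbps, buffer_s)
    best, best_score = current, current_score
    temperature = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_score = score(candidate, bandwidth_mbps, buffer_s)
        delta = candidate_score - current_score
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temperature *= cooling  # geometric cooling schedule
    return best, best_score
```

Under these assumptions, `anneal(1.0, 3.0)` stays at the lowest levels because any upgrade would stall, while a generous link such as `anneal(1000.0, 10.0)` climbs to the top level of both layers. A real system would replace the toy utility with a measured quality metric and a bandwidth predictor.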

Cited By

  • (2024) FSVFG: Towards Immersive Full-Scene Volumetric Video Streaming with Adaptive Feature Grid. Proceedings of the 32nd ACM International Conference on Multimedia, 11089-11098. DOI: 10.1145/3664647.3680908. Online publication date: 28-Oct-2024

    Published In

    NOSSDAV '24: Proceedings of the 34th edition of the Workshop on Network and Operating System Support for Digital Audio and Video
    April 2024
    77 pages
    ISBN:9798400706134
    DOI:10.1145/3651863
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. full-scene volumetric video
    2. spatially layered representation
    3. NeRF generation
    4. explicit point cloud

    Qualifiers

    • Research-article

    Conference

    NOSSDAV '24

    Acceptance Rates

    Overall Acceptance Rate 118 of 363 submissions, 33%

    Article Metrics

    • Downloads (last 12 months): 282
    • Downloads (last 6 weeks): 19
    Reflects downloads up to 01 Mar 2025
