Abstract
Compared with traditional video, omnidirectional stereo video (ODSV) provides a larger field of view (FOV) with depth perception, but complicates capture, processing, and display. Although many attempts have been made to address these challenges, each leaves one or more of the following problems: a complicated camera rig, high latency, or visible distortions. This paper presents a practical end-to-end solution based on a novel hybrid representation that addresses these problems simultaneously. The proposed pipeline goes directly from capture to display, removing the intermediate processing step and thereby reducing both the total time consumption and visible stitching distortions. Under the assumption that the background is static, the hybrid representation decomposes a scene into static and moving regions and is piecewise linear in the horizontal viewing direction, whose domain is 0° to 360°. With this representation, an ODSV can be represented by omnidirectional stereo images (for the static regions) and an ordinary stereo pair of videos (for the moving regions). Moreover, a single panoramic camera can be used to capture the omnidirectional stereo images in a real environment, and an ordinary binocular camera can capture the stereo pair of videos. To display the ODSV, this paper presents a real-time tracking-based rendering algorithm for head-mounted displays (HMDs). Experiments show that the proposed method is effective and cost-efficient. In contrast to state-of-the-art methods, it significantly reduces the complexity of the camera rig and the amount of data while preserving competitive stereo quality without visible distortions.
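The abstract describes compositing a static omnidirectional stereo image with a normal stereo video covering the moving region, selected by the horizontal viewing direction. The following is only a minimal illustrative sketch of that idea, not the paper's actual algorithm: it assumes an equirectangular panorama for one eye, a video frame covering a known angular span (`video_span_deg` is a hypothetical parameter), and a crude column-wise view extraction without proper perspective reprojection.

```python
import numpy as np

def compose_view(static_pano, video_frame, video_span_deg, yaw_deg, fov_deg=90.0):
    """Sample a crop of a 360-degree panorama around the current head yaw,
    substituting columns from the moving-region video where its angular
    span overlaps the view.

    static_pano    : (H, W, 3) equirectangular omnistereo image (one eye)
    video_frame    : (H, Wv, 3) current frame of the moving-region video
    video_span_deg : (start, end) horizontal angles covered by the video
    yaw_deg        : current head yaw in degrees, [0, 360)
    """
    H, W, _ = static_pano.shape
    # Horizontal angles (one per output column) covered by the current FOV.
    angles = (yaw_deg - fov_deg / 2 + np.arange(int(fov_deg))) % 360.0
    cols = (angles / 360.0 * W).astype(int)
    view = static_pano[:, cols].copy()

    # Wherever the view overlaps the moving region, use video columns instead.
    v0, v1 = video_span_deg
    in_video = (angles >= v0) & (angles < v1)
    if in_video.any():
        Wv = video_frame.shape[1]
        vcols = ((angles[in_video] - v0) / (v1 - v0) * Wv).astype(int)
        view[:, in_video] = video_frame[:, vcols]
    return view
```

Running this per eye with the tracked yaw from the HMD would give the basic static/moving compositing behavior; the paper's real-time renderer would additionally handle vertical head orientation, perspective projection, and seam blending.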






Ethics declarations
Conflict of Interests
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Ai, X., Wang, Y., Chen, X. et al. Omnidirectional stereo video using a hybrid representation. Multimed Tools Appl 82, 3995–4010 (2023). https://doi.org/10.1007/s11042-022-13432-8