3D analysis and image-based rendering for immersive TV applications

https://doi.org/10.1016/S0923-5965(02)00079-6

Abstract

Depth perception in images and video has been a relevant research issue for years, with the main focus on the basic idea of “stereoscopic” viewing. However, it is well known from the literature that stereovision is only one of the relevant depth cues, that motion parallax as well as the color, brightness and geometric appearance of video objects are at least equally important, and that the individual influence of each cue depends mainly on the object distance. Thus, for depth perception it may sometimes be sufficient to watch pictures or movies on large screens with brilliant quality or to provide head-motion parallax viewing on conventional 2D displays. Based on this observation, we introduce an open, flexible and modular immersive TV system that is backward-compatible with today's 2D digital television and that is able to support a wide range of different 2D and 3D displays. The system is based on a three-stage concept and aims to add further depth cues at each additional layer.

Introduction

As early as the 1920s, John Logie Baird, one of the TV pioneers, dreamed of developing high-quality, three-dimensional (3D) color TV, as only such a TV would provide the most natural viewing experience [12], [2]. Eighty years later, the first black-and-white television prototypes have evolved into high-definition digital color TV, but the vision of a system that provides the viewer with the distinct feeling of “being there” still remains to be fulfilled. Today, this vision is described by the buzzword “immersion”, and immersive media in general are a relevant research issue in many ongoing international R&D activities [13], [16], [18].

Usually, the main focus of immersion is the visualization of virtual environments on special, large-scale projection systems, such as workbenches or CAVEs [5]. Typical fields of application are training and education, as well as computer-aided design (CAD) and collaborative team working. The state-of-the-art form of presentation of the visual information is the classical “stereoscopic” viewing, where a user wears special glasses in combination with a head-tracker. Individual views for each eye are then generated according to the viewer's current head position, a feature also known as head-motion parallax viewing. Up to now, most systems in use are designed to visualize computer-generated content from digital storage media, whereby some are even integrated in shared, networked environments to facilitate interaction between users.

The application of immersive techniques to broadcast services, however, is rather new and opens a wide and exciting field of innovation. The ultimate challenge of an immersive broadcast system is the natural 3D reproduction and rendering of large-scale, real-world video scenes and their interactive presentation on suitable displays.

In this context, we describe the notion of an open, flexible and modular immersive TV system that is backward-compatible with today's conventional 2D digital television and that is able to support a wide range of different 2D and 3D displays. The flexibility of our concept is based on the observation that stereovision is only one of the relevant depth cues and that motion parallax, as well as the color, brightness and geometric appearance of video objects, are at least equally important [6]. For scene objects that are sufficiently far away from the viewer, these additional cues can even become dominant (see Fig. 1). Thus, to provide the user with a first, limited depth impression, it may sometimes be sufficient to display high-resolution pictures or movies on large panoramic screens with brilliant quality. The best example of this is the success story of the IMAX Dome®.
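The distance dependence of the stereo cue can be made plausible with a standard small-angle argument. The following relation is a textbook geometric approximation and is not taken from this paper; the symbols b (interocular distance), Z (viewing distance) and ΔZ (depth separation between two scene points) are assumptions introduced here for illustration only.

```latex
% Assumed symbols (not from the source):
%   b        interocular distance
%   Z        distance from the viewer to the nearer point
%   \Delta Z depth separation between the two points
% \theta(Z) is the vergence angle for a point at distance Z,
% \delta is the binocular disparity between the two points.
\theta(Z) = 2\arctan\!\left(\frac{b}{2Z}\right) \approx \frac{b}{Z},
\qquad
\delta = \theta(Z) - \theta(Z + \Delta Z)
       = \frac{b\,\Delta Z}{Z\,(Z + \Delta Z)}
       \approx \frac{b\,\Delta Z}{Z^{2}}
```

Under the assumed values b = 0.065 m and ΔZ = 0.1 m, the disparity is roughly 1.6 × 10⁻³ rad (on the order of 5 arcminutes) at Z = 2 m, but only roughly 1.6 × 10⁻⁵ rad (a few arcseconds) at Z = 20 m. The stereo cue therefore weakens with the square of the distance, which is consistent with other cues becoming dominant for distant objects.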

A somewhat more advanced depth presentation could provide head-motion parallax viewing on conventional 2D displays. In this case, the user's head position is tracked and images with the correct perspective are rendered according to the current head position. This approach is of particular importance for the introduction of future 3D-TV systems, because it offers an intermediate solution with lower requirements on receiver and display complexity.
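The effect of head-motion parallax can be illustrated with a minimal sketch. It assumes an ideal pinhole camera and a purely lateral head movement; the function name `parallax_shift_px` and its parameters are hypothetical and are not part of the system described in this paper.

```python
def parallax_shift_px(head_offset_m: float, depth_m: float, focal_px: float) -> float:
    """Horizontal image shift (in pixels) of a scene point at distance
    depth_m when the viewpoint moves sideways by head_offset_m.

    Assumption (not from the paper): pinhole model with focal length
    focal_px given in pixels. A point at lateral position X and depth Z
    projects to x = f * (X - t) / Z for a camera translated by t, so the
    shift relative to t = 0 is -f * t / Z.
    """
    if depth_m <= 0.0:
        raise ValueError("depth_m must be positive")
    return -focal_px * head_offset_m / depth_m


if __name__ == "__main__":
    # A 10 cm head movement to the right, with an assumed focal length
    # of 1000 px: near points shift far more than distant ones.
    for depth in (1.0, 5.0, 50.0):
        shift = parallax_shift_px(0.10, depth, 1000.0)
        print(f"depth {depth:5.1f} m -> shift {shift:8.2f} px")
```

Because the shift is inversely proportional to depth, rendering the correct perspective for a tracked head position requires some form of per-pixel depth information in addition to the 2D image.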

Following this ordering of depth cues and their relevance to receiver complexity, our system is based on a three-stage concept that aims to add more and more depth cues at each additional layer:

  • A relatively simple first stage, called immersive TV (ImTV), allows for the interactive viewing of large panoramic images and videos, supporting high resolution with brilliant quality as a first important parameter for depth perception. In this system, the visual information is captured by an omnidirectional camera setup and transmitted via a number of digital video broadcast (DVB) channels.

  • Starting from this baseline system, an extension called interactive virtual view video (IVVV) supports head-motion parallax as a second, additional depth cue. In this system, the scene is captured by a multiple baseline camera setup, from which an image-based depth representation is extracted through a 3D analysis of the visual data. The additional information is transmitted together with the basic video data via at least two DVB broadcast channels. At the receiver side, the head-motion of the viewer is tracked automatically to control a “virtual” camera, providing the correct perspective on a large 2D-TV screen.

  • With relatively low additional effort from an algorithmic point of view, the IVVV approach can be updated in a third stage to a full three-dimensional television (3D-TV) system, adding stereovision as a third depth cue. The flexible, image-based depth representation of IVVV makes it possible to adjust the “virtual” view synthesis to a wide range of different displays and viewing conditions.
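The step from the second to the third stage in the list above can be sketched as running the same depth-based view synthesis twice, once per eye. The following is an illustrative sketch only, not the authors' algorithm: it assumes a pinhole camera, purely lateral camera offsets, nearest-pixel forward warping of a single scanline, and unfilled holes. All names (`warp_scanline`, `render_stereo_pair`) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def warp_scanline(colors: Sequence[str], depths_m: Sequence[float],
                  cam_offset_m: float, focal_px: float) -> List[Optional[str]]:
    """Forward-warp one scanline to a laterally shifted virtual camera.

    Each pixel moves by -focal_px * cam_offset_m / depth (rounded to the
    nearest pixel). A z-buffer keeps the nearest point when several
    pixels land on the same target. Disoccluded targets stay None.
    """
    width = len(colors)
    out: List[Optional[str]] = [None] * width
    zbuf = [float("inf")] * width
    for x, (color, z) in enumerate(zip(colors, depths_m)):
        target = x + int(round(-focal_px * cam_offset_m / z))
        if 0 <= target < width and z < zbuf[target]:
            out[target] = color
            zbuf[target] = z
    return out


def render_stereo_pair(colors: Sequence[str], depths_m: Sequence[float],
                       head_x_m: float, eye_sep_m: float, focal_px: float
                       ) -> Tuple[List[Optional[str]], List[Optional[str]]]:
    """Two 'rendering units': one virtual view per eye, placed
    symmetrically around the tracked head position."""
    left = warp_scanline(colors, depths_m, head_x_m - eye_sep_m / 2, focal_px)
    right = warp_scanline(colors, depths_m, head_x_m + eye_sep_m / 2, focal_px)
    return left, right


if __name__ == "__main__":
    colors = list("abcdefgh")
    # One near pixel ('d', 1 m) in front of a distant background (10 m).
    depths = [10.0, 10.0, 10.0, 1.0, 10.0, 10.0, 10.0, 10.0]
    left, right = render_stereo_pair(colors, depths, 0.0, 0.2, 10.0)
    print("left :", left)
    print("right:", right)
```

In this toy example the near pixel moves in opposite directions in the two views while the background stays in place, and each view shows a hole where background becomes visible that the original camera did not see. A practical system would need to fill such disocclusions, for instance from additional camera views.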

In the next sections, we discuss the basic concept of our three-stage immersive TV approach, with the main focus on the first two stages. We explain some algorithmic details of an IVVV system that is targeted at the live recording and transmission of real-world events, such as a soccer game or a theatre play, and we present some promising results from computer simulations. Finally, we conclude the paper and give a short outlook on our future work.

Section snippets

Immersive TV (ImTV)

A very attractive short-term concept of an ImTV approach has recently been proposed by the Independent Television Commission (ITC) [18]. In this concept it is envisaged to capture, encode and broadcast wide-angle, high-resolution views of live events combined with multi-channel audio. The visual information is then displayed in a way that accounts for an individual immersive viewing experience, using a head-mounted display in combination with a head-tracker as well as other multi-sensory

Perception of depth in 2D images

The ImTV approach described so far is based on the assumption that the visual changes that occur to a viewer who turns his head can correctly be modeled by a camera that rotates around its center of projection, with the resulting views being merely different clippings of a large, panoramic 2D image. Thus, the different HDTV frames can properly be captured by an omnidirectional setup, where the cameras are located at the same position but look in different directions. The composition of the

Interactive virtual view video (IVVV)

In this context, HHI has started a new research activity called IVVV. The objective of IVVV is to develop new computer vision and computer graphics techniques that make it possible to extend the above-described short-term scenario of ImTV through the support of head-motion parallax viewing [7], [8]. For this purpose, a real-world scene is captured by a multiple baseline array of cameras, arranged to look at the particular location of interest. The visual information recorded by this camera setup is

Three-dimensional television (3D-TV)

With relatively low additional effort from an algorithmic point of view, the IVVV approach described so far can be updated in a third stage to a full 3D-TV system, adding stereovision as a supplementary, third depth cue. This extension merely requires the addition of a second rendering unit to the receiver system, so that “virtual” views for both eyes can be generated simultaneously. That way, images and movies can be viewed on any kind of stereoscopic single-user display. By adding even more

Conclusion and outlook

Starting from the observation that depth perception is of special importance for the development of future, more immersive entertainment technologies, this paper introduced an open, flexible and modular immersive TV system that is backward-compatible with today's conventional digital 2D-TV and that is able to support a wide range of different 2D and 3D displays. The system is based on a three-stage concept with the aim of adding further depth cues at each additional layer:

  • A

Acknowledgements

This work has been supported by the German Research Society (DFG), Grant-No. SCHA 877/1-1, as well as by the Ministry of Science and Technology of the Federal Republic of Germany (BMBF), Grant-No. 01 AK 022. The authors wish to thank Serap Askar, Nicole Brandenburg, Ingo Feldmann, Ulrich Gölz, Ulrich Höfker, Marcus Müller, Aljoscha Smolić and Christian Weissig for their support in this work, as well as all members of the ATTEST project for the ongoing fruitful discussions.

References (30)

  • C. Baillard, A. Zisserman, Automatic reconstruction of piecewise planar models from multiple views, in: Proceedings of...
  • M.H.I. Baird, Eye of the World: John Logie Baird and Television, Part II, 1996,...
  • B. Caprile et al., Using vanishing points for camera calibration, Int. J. Comput. Vis. (1990)
  • R. Cipolla, E. Boyer, 3D model acquisition from uncalibrated images, in: Proceedings of IAPR Workshop on Machine Vision...
  • C. Cruz-Neira, Virtual reality based on multiple projection screens: the CAVE and its applications to computational...
  • J.E. Cutting, P.M. Vishton, Perceiving layout and knowing distances: the interaction, relative potency, and contextual...
  • C. Fehn, E. Cooke, O. Schreer, P. Kauff, 3D analysis and image-based rendering for immersive TV applications, in:...
  • C. Fehn, P. Kauff, O. Schreer, R. Schäfer, Interactive virtual view video for immersive TV applications, in:...
  • I. Feldmann, S. Askar, N. Brandenburg, P. Kauff, O. Schreer, Real-time segmentation for advanced disparity estimation...
  • M.A. Fischler et al., Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM (June 1981)
  • C. Harris, M. Stephens, A combined corner and edge detector, in: Proceedings of Fourth Alvey Conference, 1988, pp....
  • A.R. Hills, Eye of the World: John Logie Baird and Television, Part I, 1996,...
  • J.M. Hollerbach et al., The convergence of robotics, vision, and computer graphics for user interaction, Int. J. Robotics Res. (1999)
  • P. Kauff, N. Brandenburg, M. Karl, O. Schreer, Fast hybrid block- and pixel-recursive disparity analysis for real-time...
  • P. Kauff, U. Höfker, U. Gölz, Immersive TV—the TV experience of the future?, in: Proceedings of 19th Annual Conference...

    Expanded version of a talk presented at the International Conference on Augmented, Virtual Environments and Three-Dimensional Imaging, Mykonos, Greece, May–June 2001.
