Comparative study of frame-compatible stereo 3D services and a novel method for spatial interleaving using HEVC

https://doi.org/10.1016/j.jvcir.2014.11.012

Highlights

  • We compare spatial and temporal interleaving based frame-compatible methods.

  • We benchmark the methods with MVC and present the results for both H.264 and HEVC.

  • We propose to pack the views into a frame and encode the same in different tiles.

  • We also propose to disable in-loop filters across the tile boundary.

  • We benchmark the proposed method with the “no-tiles” case.

Abstract

There are a number of ways to realize frame-compatible stereo 3D services, including spatial and temporal interleaving based methods. We first present a detailed analysis of these methods with respect to compression efficiency, backward compatibility and the ability to re-use existing infrastructure. Simulations were set up to benchmark the Rate–Distortion (RD) performance of these options against state-of-the-art Multiview Video Coding (MVC). The results show that the temporal interleaving based method performs better than spatial interleaving in both the H.264/AVC and HEVC compression schemes, although MVC still outperforms both methods. For the spatial interleaving method using HEVC, we propose to encapsulate the two views in different tiles with in-loop filters turned off across the tile boundary. Simulation results show that the proposed method, when compared to the “no-tiles” case, improves decoder performance by approximately 50% with no reduction in RD performance.

Introduction

The success of movies like Avatar 3D, along with the increasing availability of cost-effective 3D-ready devices (televisions, gaming consoles, mobile phones), has renewed interest in 3D, and stereoscopic 3D video services are being introduced across various industries. In the mobile industry, the HTC Evo is one of the 3D mobile phones that allow the user to capture 3D video and stream the content wirelessly to a 3DTV [1]. In the consumer industry, the Blu-ray consortium announced support for 3D Blu-ray and almost all major TV manufacturers are producing 3D Blu-ray players. In the broadcast industry, the Digital Video Broadcast (DVB) specification, which is used in a significant number of countries, provides support for 3D. In the UK, the BBC introduced 3D broadcasting using the ‘spare’ available spectrum on BBC HD [2]. Hence, considerable momentum has built up for 3D, while the need remains to maintain backward compatibility with existing 2D receivers (including set-top boxes (STB) and integrated receivers/decoders (IRD)).

RF spectrum is precious, and the possibility of a second digital dividend (UHF 700 MHz band) will make it even more precious [3]. Hence, apart from maintaining backward compatibility, the additional video data (the right view of stereoscopic 3D) has to be compressed efficiently, and the existing production and distribution infrastructures should be reused as much as possible. H.262/MPEG-2 [4] was the earliest standard to support the compression of stereoscopic 3D video material. It was not popular at the time, as display and compression technology was not advanced enough to adopt 3D video in the existing 2D infrastructure and workflow. Two decades later, the technologies have improved to the extent that H.264/AVC requires approximately half the bandwidth of MPEG-2 to deliver the same video content at almost the same quality [5], and one of the goals of HEVC [6] is to halve this bandwidth requirement again. The state-of-the-art multiview video coding standard is the H.264 Multiview Video Coding (MVC) specification, which provides a compact representation of multiple views of a video scene; stereo 3D video is treated as a special case of MVC [7]. Though this specification has been adopted by the 3D Blu-ray consortium [8], it has not been adopted in other industries such as broadcast. For example, the DVB organization has adopted only the H.264/AVC and H.264 Scalable Video Coding (SVC) codecs [9], [10] from the H.264 family for broadcast services.

There are a number of ways to efficiently compress stereoscopic 3D video material apart from using H.264 MVC. One approach is to squeeze the left and right views into a single frame and treat the result as a 2D frame (spatial interleaving). Alternatively, the left and right view frames can be temporally interleaved to form a 2D video sequence, which is then compressed using two temporal layers. In this scenario, the enhancement layer contains the right view pictures and the base layer contains the left view pictures. In addition, there are simulcast methods that broadcast both the 2D and 3D videos, where the 3D video is formed by squeezing the left and right views into a single 2D frame. Other methods support the encoding of auxiliary information such as depth maps and camera parameters, alongside standards-based compression, for the efficient reconstruction of a scene. Though these methods are being studied in the context of H.264 AVC, H.264 MVC and HEVC [11], they are not yet standardized and hence are not considered in this paper. All these methods have their own advantages and disadvantages (in terms of compression efficiency, backward compatibility, reuse of existing infrastructures, etc.) and, to the best of the authors’ knowledge, no prior work has considered these aspects in comparing spatial and temporal interleaving based methods. This paper compares these methods in terms of the above-mentioned parameters and also proposes a novel method of encapsulating the two views in different tiles (for the case of spatial interleaving in HEVC) with filters turned off across the tile boundary, which enables parallel encoding/decoding of stereo video content.
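The two interleaving approaches described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: frames are modelled as small lists of rows, the "squeeze" is assumed to be plain decimation (a real system would low-pass filter first), and the function names and the `temporal_id` field are hypothetical names chosen for this example.

```python
# Illustrative sketch only. Frames are lists of rows of sample values.
# Assumption: "squeezing" a view is modelled as dropping every second
# column (Side-by-Side) or every second row (Top-and-Bottom).

def decimate_columns(frame):
    """Keep every second column (horizontal squeeze to half width)."""
    return [row[::2] for row in frame]


def decimate_rows(frame):
    """Keep every second row (vertical squeeze to half height)."""
    return frame[::2]


def pack_side_by_side(left, right):
    """Half-width left view placed next to half-width right view."""
    return [l + r for l, r in zip(decimate_columns(left), decimate_columns(right))]


def pack_top_and_bottom(left, right):
    """Half-height left view placed above half-height right view."""
    return decimate_rows(left) + decimate_rows(right)


def interleave_temporally(left_frames, right_frames):
    """Alternate left and right frames in one 2D sequence.

    Each entry carries a hypothetical temporal layer id: 0 (base layer)
    for the left view and 1 (enhancement layer) for the right view.
    """
    sequence = []
    for left, right in zip(left_frames, right_frames):
        sequence.append({"temporal_id": 0, "view": "left", "frame": left})
        sequence.append({"temporal_id": 1, "view": "right", "frame": right})
    return sequence


def extract_base_layer(sequence):
    """Frames a 2D-only receiver would keep: temporal layer 0 only."""
    return [item["frame"] for item in sequence if item["temporal_id"] == 0]
```

In this sketch the spatially packed frame has the same dimensions as a single input view, at the cost of half the resolution per view, whereas the temporally interleaved sequence keeps full resolution per view and lets a receiver recover the left view by discarding the enhancement layer.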

The rest of the paper is organized as follows: Section 2 explains the various frame-compatible stereo 3D services from the perspectives outlined in the previous paragraph; Section 3 illustrates the proposed method, in which the two views are encapsulated in different tiles with filters turned off across the tile boundary; Section 4 details the simulation setup and presents the simulation results; Section 5 provides the conclusions; and the references are listed at the end of the paper.

Section snippets

Spatial interleaving using Frame Packing Arrangement

The major advantage of spatially interleaving the left and right views to form a single 2D frame is that it allows the existing contribution and distribution infrastructures to be used effectively for the seamless introduction of stereoscopic 3D video services. Though a number of spatial interleaving frame-compatible formats exist, namely Side-by-Side (SbS), Top-And-Bottom (TaB), Row interleaved, Column interleaved and Checkerboard [8], only the SbS and TaB methods are used widely and

Proposed method

There is a growing interest in resolutions beyond HD, and real-time decoding of UHD (3840 × 2160 (4K), 7680 × 4320 (8K)) video content requires multi-core platforms [17]. However, enabling parallel decoding with minimal effect on coding efficiency requires the presence of independent units. Fig. 5 shows a sample frame formed by arranging the view_0 and view_1 frames of the Akko & Kayo sequence [18] in the SbS format. It can be seen that the two views could be encapsulated in independent units as they
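The idea of placing each view in its own independent unit can be sketched in Python as follows. This is a toy model under stated assumptions, not an HEVC decoder: the packed SbS frame is split at its middle column into two tiles, a 3-tap horizontal average stands in for an in-loop filter, and two worker threads stand in for decoder cores. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_view_tiles(packed_frame):
    """Split an SbS frame into two tiles at the view boundary (middle column)."""
    mid = len(packed_frame[0]) // 2
    left_tile = [row[:mid] for row in packed_frame]
    right_tile = [row[mid:] for row in packed_frame]
    return left_tile, right_tile


def smooth_rows(block):
    """Toy stand-in for an in-loop filter: 3-tap horizontal average.

    Edges are clamped, so only samples inside the given block are read.
    """
    out = []
    for row in block:
        last = len(row) - 1
        out.append([
            (row[max(i - 1, 0)] + row[i] + row[min(i + 1, last)]) / 3
            for i in range(len(row))
        ])
    return out


def filter_tiles_independently(packed_frame):
    """Filter each view tile on its own worker, then reassemble the frame.

    Because each worker sees only its own tile, no sample from one view
    contributes to the other view across the tile boundary.
    """
    tiles = split_into_view_tiles(packed_frame)
    with ThreadPoolExecutor(max_workers=2) as pool:
        left, right = list(pool.map(smooth_rows, tiles))
    return [l + r for l, r in zip(left, right)]
```

Calling `smooth_rows` on the whole packed frame instead models filtering across the boundary: samples next to the view boundary then mix content from both views, and the two halves can no longer be processed without reference to each other. Python threads are used here only to show that the two tiles have no data dependency; they do not demonstrate the decoder speed-up reported in the paper.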

Simulation setup and results

Two simulations were set up. The objective of the first simulation is to evaluate spatial interleaving (FPA SEI method) and temporal interleaving (SVC-TS) against the state-of-the-art MVC method. This benchmarking has been carried out for both the H.264/AVC and HEVC compression specifications. As noted in Section 2, since the FPA method is used in a simulcast environment, along with the base view encoding in 2D format, this method is compared with the H.264/MVC and H.264/SVC [27] specifications.

Conclusions

Various methods for the realization of frame-compatible stereo 3D services have been discussed in the paper, along with their merits and demerits from a number of perspectives such as compression efficiency, reuse of existing infrastructures and backward compatibility with the existing 2D receivers. Simulation results show that the temporal interleaving scheme realized using SVC-TS provides better RD performance than the spatial interleaving scheme (SbS and TaB based methods) in both H.264/AVC and

References (30)

  • Preparing 3D video with S3D metadata on HTC Evo <http://www.htcdev.com/devcenter/s3d> (last accessed on...
  • Andy Quested, Gearing up to Deliver Wimbledon 3D...
  • Ohji Nakagami, Teruhiko Suzuki, Giovanni Ballocca, Vittorio Giovara, Paola Sunna, Marco Arena, Jean Francois Travers,...
  • ITU-T and ISO/IEC JTC 1, Generic coding of moving pictures and associated audio information – Part 2: Video, ITU-T...
  • ITU-T and ISO/IEC JTC 1, Advanced video coding for generic audiovisual services, ITU-T Recommendation H.264 and ISO/IEC...
  • ITU-T and ISO/IEC JTC 1, High efficiency video coding, ITU-T Recommendation HEVC and ISO/IEC 23008-2,...
  • ITU-T and ISO/IEC Joint Technical Committee, Advanced video coding for generic audio-visual services, ITU-T...
  • A. Vetro et al., Overview of the stereo and multiview video coding extensions of the H.264/MPEG-4 AVC standard, Proc. IEEE (2011)
  • L. Kondrad, I. Bouazizi, V.K.M. Vadakital, M.M. Hannuksela, M. Gabbouj, Cross-layer optimized transmission of H.264/SVC...
  • Digital Video Broadcasting (DVB), Specification for the use of video and audio coding in DVB services delivered...
  • Ying Chen, Thomas Rusert, Heiko Schwarz, Ad hoc on 3D video coding high level syntax, JCT2-VC Document, JCT2-A0006,...
  • Gary J. Sullivan, On the frame packing arrangement SEI message, Study Group 16, VCEG Document, VCEG-AO12,...
  • Digital Video Broadcasting (DVB), Frame compatible plano-stereoscopic 3DTV (DVB-3DTV), DVB Document...
  • Ohji Nakagami, Teruhiko Suzuki, On stereo 3D coding using frame packing arrangement SEI, JCT-VC Document, JCTVC-I0058,...
  • Ye-Kui Wang, AHG9: indication of frame-packed or interlaced video, JCT-VC Document, JCTVC-K0119,...

This paper has been recommended for acceptance by M.T. Sun.