Advanced texture and depth coding in 3D-HEVC

https://doi.org/10.1016/j.jvcir.2017.11.003

Abstract

The 3D extension of High Efficiency Video Coding (3D-HEVC) is a new international video coding standard developed by the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) to support the coding of multiple views and their associated depth data. 3D-HEVC aims to improve the coding efficiency of 3D and multi-view video by introducing new coding tools that exploit the correlations between views and between the texture and depth components. In this paper, an inter-view motion prediction (inter-view merge candidate) and an inter-component motion prediction (texture merge candidate) are proposed to exploit the inter-view and inter-component redundancies for the texture and depth components, respectively. Moreover, a new coding mode termed single depth mode is introduced, which simply reconstructs a coding block with a single depth value, based on a block merging scheme under the HEVC quad-tree based block partitioning. All the proposed schemes have been adopted in 3D-HEVC. Experimental results under the common test conditions (CTC) used for developing 3D-HEVC show that the proposed inter-view merge candidate, texture merge candidate, and single depth mode achieve significant BD-rate reductions of 19.5% for dependent texture views and 8.3% for the synthesized texture views.

Introduction

Three-dimensional (3D) television has been a technology trend in recent years, intended to bring viewers a sensational viewing experience. Various technologies have been developed to enable 3D viewing, and multi-view video is a key technology for 3D TV applications. Traditional video is a two-dimensional (2D) medium that provides viewers only a single view of a scene from the perspective of the camera. Multi-view video, in contrast, can offer arbitrary viewpoints of dynamic scenes and gives viewers a sensation of realism. Because coding multiple views at larger picture resolutions and better quality creates a strong demand for improved coding efficiency of 3D and multi-view video, various techniques have been proposed [1], [2].

As an extension of HEVC and a next-generation 3D video coding standard, 3D-HEVC was formally launched as a standardization project by the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) in July 2012 and was finalized after the 11th JCT-3V meeting held in February 2015. To support auto-stereoscopic multi-view displays more practically, the multi-view video plus depth (MVD) format was introduced as a new 3D video format for 3D-HEVC [3]. The MVD format consists of a texture picture and its associated depth map. Unlike a texture picture, which represents the luminance and chrominance of objects, a depth map is an image that carries the distance of objects from the camera-captured plane; it is non-visual information generally used for virtual view rendering.

Since all cameras capture the same scene from different viewpoints, a multi-view video contains a large amount of inter-view redundancy. In 3D-HEVC, to share the previously encoded motion information of adjacent views, the motion information of a current block can be predicted from the motion information of one or more corresponding blocks in the inter-view pictures, which are located by a disparity vector (DV) [4], [5]. The disparity vector for locating inter-view corresponding blocks can be derived either from a coded disparity motion vector or from the depth information of a corresponding block [6]. To fully utilize the motion information of the inter-view pictures, a sub-PU inter-view motion prediction (SPIVMP) method is applied to obtain the motion predictor at a finer granularity [7]. Since the texture picture and its associated depth map are projections of the same scene from the same viewpoint at the same time instant, their motion characteristics should be similar. To enable efficient encoding of the depth map data, the motion information of the depth map can therefore also be predicted or inherited from the corresponding video signal [8]. To exploit the inter-view and inter-component redundancies for texture and depth coding, in this paper we introduce inter-view motion vector prediction and inter-component motion vector prediction, which inherit motion information from the neighboring views or from the associated texture picture for 3D video coding.
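As a rough illustration of how a disparity can be obtained from depth information, the sketch below converts a quantized depth sample to a horizontal disparity using the inverse-depth mapping commonly used for MVD content with parallel cameras. The function name and the camera parameters (focal_length, baseline, z_near, z_far) are assumptions introduced for this example and do not come from the paper; the normative 3D-HEVC derivation is not reproduced here.

```python
def depth_to_disparity(depth_sample, focal_length, baseline,
                       z_near, z_far, bit_depth=8):
    """Map a quantized depth sample to a disparity in samples.

    Illustrative only: all parameter names are hypothetical.
    """
    max_level = (1 << bit_depth) - 1
    # Depth maps usually store inverse depth: max_level is the nearest
    # plane (z_near) and 0 is the farthest plane (z_far).
    inv_z = (depth_sample / max_level) * (1.0 / z_near - 1.0 / z_far) \
        + 1.0 / z_far
    # For parallel cameras, disparity = focal length * baseline / distance.
    return focal_length * baseline * inv_z


if __name__ == "__main__":
    # Hypothetical camera setup, chosen only to show the trend.
    for v in (0, 128, 255):
        print(v, depth_to_disparity(v, 1000.0, 0.05, 1.0, 100.0))
```

Under this mapping, nearer objects (larger depth sample values) yield larger disparities, which is why a single representative depth value of a block is enough to locate a corresponding block in another view.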

Moreover, as the example in Fig. 1(a) and (b) shows, a depth map has different signal characteristics from natural video data. Most noticeably, a depth map contains large smooth areas with similar pixel values. In most cases, the pixels within a smooth area even share one identical value. The existing intra prediction modes in [9] (e.g., DC, planar, and the 33 angular prediction modes) cannot efficiently signal that all pixels within the current block share a single depth value that is also the value of one of the neighboring pixels.

In 3D-HEVC, pictures are usually decomposed into blocks such that each block is associated with a particular set of model or coding parameters. Each block is either spatially or temporally predicted, and the resulting prediction residual is represented using transform coding. For partitioning, quad-tree structured schemes are suitable for image coding, as they can be optimized in the rate-distortion (R-D) sense by simple algorithms. However, it has been pointed out that quad-tree structured partitioning may result in suboptimal R-D performance when dependencies between leaf nodes of different parents are not exploited [10].

In this paper, we further propose the single depth intra mode to efficiently code the smooth areas within a depth picture. The concept of the single depth mode is to simply reconstruct the current coding unit (CU) as a smooth area with a single depth sample value. With this new coding mode, the smooth areas within a depth map can be coded more efficiently by incorporating a leaf merging of the pixel values. The proposed single depth mode has been adopted into 3D-HEVC and has been evaluated under the common test conditions [11].
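The idea of reconstructing a CU from one depth value can be sketched as follows. This is a minimal sketch under stated assumptions, not the adopted design: the neighbor positions (middle of the left column and middle of the above row), the default value of 128, and all function names are hypothetical choices made for illustration.

```python
def single_depth_candidates(recon, x0, y0, size):
    """Collect candidate depth values from reconstructed neighbors.

    The two neighbor positions used here are an assumption of this sketch.
    """
    positions = [
        (x0 - 1, y0 + size // 2),  # middle of the left neighboring column
        (x0 + size // 2, y0 - 1),  # middle of the above neighboring row
    ]
    candidates = []
    for x, y in positions:
        if x >= 0 and y >= 0:
            value = recon[y][x]
            if value not in candidates:  # drop duplicate candidates
                candidates.append(value)
    if not candidates:
        candidates.append(128)  # assumed mid-range default for 8-bit depth
    return candidates


def reconstruct_single_depth(recon, x0, y0, size, cand_idx):
    """Fill the whole CU with the selected candidate; no residual is used."""
    value = single_depth_candidates(recon, x0, y0, size)[cand_idx]
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            recon[y][x] = value
    return value


if __name__ == "__main__":
    # 8x8 picture: top half reconstructed as 70, bottom-left quarter as 50.
    picture = [[70] * 8 for _ in range(4)] + \
              [[50] * 4 + [0] * 4 for _ in range(4)]
    print(single_depth_candidates(picture, 4, 4, 4))
    reconstruct_single_depth(picture, 4, 4, 4, cand_idx=1)
    print(picture[4][4:], picture[7][4:])
```

In such a scheme only the candidate index would need to be signaled for the CU, which is what makes the mode inexpensive for smooth areas.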

To introduce motion vector (MV) prediction, MV coding, and depth intra prediction in 3D-HEVC, a general overview of the basic coding structure, MV coding techniques, and depth intra prediction techniques is first presented in Section 2. Sections 3, 4, and 5 describe our proposed inter-view motion prediction scheme, inter-component motion prediction scheme, and single depth mode, respectively, all of which were adopted into the 3D-HEVC standard. Experimental results and conclusions are given in Sections 6 and 7, respectively.

Section snippets

Quad-tree partitioning structure

As shown in Figs. 2 and 3, 3D-HEVC is developed for encoding and decoding multi-view video sequences. One of the views, referred to as the base view or the independent view, is coded independently of the other views using a conventional HEVC video coder. The other views are usually termed dependent views, since they may be coded depending on the data of the other views.

The dependent texture views and all depth views apply 3D-HEVC coding which is also based on a hybrid

Inter-view motion vector prediction

The basic concept of the proposed inter-view prediction of motion parameters is illustrated in Fig. 3. To derive the motion parameters of the inter-view motion prediction (IVMP) candidate for a current prediction unit (PU) in a dependent view, a disparity vector (DV) is first derived for that PU. By adding the derived DV to the center position of the current PU, a reference sample location is obtained. The prediction block that covers the sample location in the already coded picture of
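The location step described above (PU center plus DV, then the block covering that sample) can be sketched as follows. This is an illustrative sketch only: the DV is assumed to be given in integer sample units, the motion field is modeled as a dictionary keyed by block coordinates, and all names are hypothetical rather than taken from the standard or the reference software.

```python
from typing import Dict, Optional, Tuple

# (motion vector, reference index); a hypothetical representation.
Motion = Tuple[Tuple[int, int], int]


def reference_location(pu_x, pu_y, pu_w, pu_h, dv):
    """Add the disparity vector to the center position of the current PU."""
    center_x = pu_x + pu_w // 2
    center_y = pu_y + pu_h // 2
    return center_x + dv[0], center_y + dv[1]


def ivmp_candidate(motion_field: Dict[Tuple[int, int], Motion],
                   block_size: int, pic_w: int, pic_h: int,
                   pu: Tuple[int, int, int, int],
                   dv: Tuple[int, int]) -> Optional[Motion]:
    """Return the motion of the inter-view block covering the location.

    Returns None when that block carries no motion (e.g. it is intra coded).
    """
    x, y = reference_location(*pu, dv)
    # Clip to the picture so the lookup always addresses a valid block.
    x = min(max(x, 0), pic_w - 1)
    y = min(max(y, 0), pic_h - 1)
    return motion_field.get((x // block_size, y // block_size))


if __name__ == "__main__":
    # Motion stored per 16x16 block of the already coded inter-view picture.
    field = {(2, 1): ((3, -1), 0)}
    print(ivmp_candidate(field, 16, 128, 64, (64, 16, 16, 16), (-32, 0)))
```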

Inter-component motion vector prediction

To exploit the inter-component motion correlation between the texture and the depth map, a texture merge candidate is proposed for depth coding to share the coded motion information of the associated texture picture. The proposed texture merge candidate is added to the merge candidate set for depth merge/skip mode coding.
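A minimal sketch of this idea is given below. It assumes that the texture motion is taken from the block covering the center of the co-located area, that the texture merge candidate is placed first in the list, and that the list holds at most six candidates; these choices, like all names in the code, are assumptions of the sketch rather than details stated here.

```python
def texture_merge_candidate(texture_motion, block_size, depth_pu):
    """Inherit motion from the co-located block of the texture picture.

    texture_motion maps block coordinates to (motion vector, reference
    index); None is returned if the texture block has no motion.
    """
    x, y, w, h = depth_pu
    center_x, center_y = x + w // 2, y + h // 2
    return texture_motion.get((center_x // block_size,
                               center_y // block_size))


def build_depth_merge_list(texture_cand, other_cands, max_cands=6):
    """Build the merge candidate list for a depth PU.

    Putting the texture candidate first is an assumption of this sketch.
    """
    merge_list = []
    if texture_cand is not None:
        merge_list.append(texture_cand)
    for cand in other_cands:
        if len(merge_list) >= max_cands:
            break
        if cand not in merge_list:  # simple pruning of identical motion
            merge_list.append(cand)
    return merge_list


if __name__ == "__main__":
    texture_motion = {(1, 1): ((4, -2), 0)}
    cand = texture_merge_candidate(texture_motion, 16, (16, 16, 16, 16))
    spatial = [((0, 0), 0), ((4, -2), 0), ((1, 1), 1)]
    print(build_depth_merge_list(cand, spatial))
```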

In texture coding, the motion vectors (MVs) and reference index of the corresponding block in the inter-view picture are reused as an inter-view merge candidate. Similar to the concept

Single depth intra mode

Although the existing 3D-HEVC intra prediction schemes use a localized prediction process to predict the pixel values of the current block from the neighboring reconstructed pixels, they do not provide an efficient way to indicate that the pixels within the current block share a single depth value with the pixels of the neighboring blocks. This may still result in redundant sets of prediction parameters being transmitted. For example, if a given CU is divided into four sub CUs, all sub CUs are

Experimental results

The coding efficiency of the proposed algorithms is evaluated with HTM-16.0 [21], and several experiments are conducted in comparison with the anchor generated by HTM-16.0 under the common test conditions used for 3D-HEVC standardization activities. The hierarchical B prediction structure is used in the common test conditions [22], and the test sequences are listed in Table 2.

The coding performance was measured by the Bjøntegaard delta rate (BD-rate) saving [23]
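For reference, a BD-rate figure can be computed along the following lines. This is a minimal sketch of the commonly used formulation (a cubic through four rate/PSNR points per curve in the log-rate domain, integrated over the overlapping PSNR range), not the exact script behind the reported numbers; the function names and the sample data are hypothetical.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _simpson(f, lo, hi):
    """Simpson's rule on one interval; exact for cubic polynomials."""
    mid = 0.5 * (lo + hi)
    return (hi - lo) / 6.0 * (f(lo) + 4.0 * f(mid) + f(hi))


def bd_rate(anchor, test):
    """Average bitrate difference in percent at equal PSNR.

    anchor and test are lists of four (bitrate, psnr) pairs with distinct
    PSNR values. Negative results mean the test codec saves bitrate.
    """
    if len(anchor) != 4 or len(test) != 4:
        raise ValueError("this sketch expects four rate points per curve")
    a_psnr = [p for _, p in anchor]
    t_psnr = [p for _, p in test]
    a_log = [math.log10(r) for r, _ in anchor]
    t_log = [math.log10(r) for r, _ in test]
    # Integrate only over the PSNR range covered by both curves.
    lo = max(min(a_psnr), min(t_psnr))
    hi = min(max(a_psnr), max(t_psnr))
    int_a = _simpson(lambda x: _lagrange(a_psnr, a_log, x), lo, hi)
    int_t = _simpson(lambda x: _lagrange(t_psnr, t_log, x), lo, hi)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0


if __name__ == "__main__":
    # Made-up rate points; the test curve uses 10% less bitrate everywhere.
    anchor = [(1000.0, 34.0), (2000.0, 36.5), (4000.0, 38.6), (8000.0, 40.2)]
    test = [(rate * 0.9, psnr) for rate, psnr in anchor]
    print(round(bd_rate(anchor, test), 3))
```

A negative BD-rate therefore corresponds to a bitrate saving at the same objective quality, which is the sense in which the reductions in this paper are reported.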

Conclusion

To exploit the inter-view and inter-component motion redundancies, we have introduced in this paper inter-view motion vector prediction and inter-component motion vector prediction, which inherit motion information from the neighboring views or from the associated texture picture for 3D video coding. To further improve the coding efficiency, we proposed a single depth intra mode that provides a pixel-domain merging scheme. A single depth coded depth CU is simply

References (23)

  • ISO/IEC JTC1/SC29/WG11, Text of ISO/IEC 14496-10:200X/FDAM 1 Multiview Video Coding, Doc. N9978,...
  • M.M. Hannuksela, Y. Chen, T. Suzuki, J.-R. Ohm, G. Sullivan, 3D-AVC Draft Text 8, Doc. JCT3V-F1002,...
  • G. Tech, K. Wegner, Y. Chen, S. Yea, 3D-HEVC Draft Text 6, JCT3V-J1001,...
  • L. Zhang, Y. Chen, V. Thirumalai, J.-L. Lin, Y.-W. Chen et al., Inter-view motion prediction in 3D-HEVC, in: Proc....
  • J. An, Y.-W. Chen, J.-L. Lin, Y.-W. Huang, S. Lei, 3D-CE5.h related: inter-view motion prediction for HEVC-based 3D...
  • Y.-L. Chang, Y.-W. Chen, J.-L. Lin, N. Zhang, J. An, Y.-W. Huang, S. Lei, 3D-CE2.h related: Simplified DV Derivation...
  • J. An, K. Zhang, J.-L. Lin, S. Lei, 3D-CE3: Sub-PU level inter-view motion prediction, Doc. JCT3V-F0110,...
  • Y.-W. Chen, J.-L. Lin, Y.-W. Huang, S. Lei, 3D-CE3.h results on removal of parsing dependency and picture buffers for...
  • J. Lainema et al., Intra coding of the HEVC standard, IEEE Trans. Circuits Syst. Video Technol. (Dec. 2012)
  • P. Helle et al., Block merging for quadtree-based partitioning in HEVC, IEEE Trans. Circuits Syst. Video Technol. (Dec. 2012)
  • Y.-W. Chen, J.-L. Lin, Y.-W. Huang, S. Lei, 3D-CE2: Single depth intra mode for 3D-HEVC, JCT3V-I0095,...

This paper has been recommended for acceptance by Zicheng Liu.