HEVC backward compatible scalability: A low encoding complexity distributed video coding based approach

https://doi.org/10.1016/j.image.2015.02.003

Highlights

  • Proposes a novel HEVC backward compatible scalable extension solution, named DSVC, which exploits the distributed video coding principle at the enhancement layers (ELs).

  • Proposes several advanced coding tools which are integrated in a quality scalable DSVC framework, notably a novel side information creation process, a correlation modeling solution and a novel EL coding mode selection method.

  • Demonstrates the advantages of the DSVC framework over the relevant alternative simulcast and scalable coding solutions, while offering a lower encoding complexity.

Abstract

The growing heterogeneity and dynamic nature of networks, terminals, and usage environments have boosted the need for powerful scalable video coding engines able to adapt efficiently to changing consumption conditions. Some emerging applications, such as video surveillance, visual sensor networks, and remote space transmission, require scalable coding solutions that are not only compression efficient but also provide low encoding complexity and high error resilience. Following the call for scalable extensions of the emerging High Efficiency Video Coding (HEVC) standard, targeting the so-called Scalable HEVC (SHVC) standard, this paper proposes a novel scalable video coding solution that offers quality scalability by combining the predictive and distributed video coding approaches, using an HEVC compliant base layer and distributed coding enhancement layers. The proposed Distributed Scalable Video Coding (DSVC) solution follows an Intra-encoding with Inter-decoding approach to provide low encoding complexity and error robustness while achieving high compression. Towards this objective, this paper adopts a novel coding architecture and proposes several novel enhancement layer coding tools, notably for side information creation, correlation modeling, and coding mode selection. The experimental results reveal that DSVC outperforms the relevant alternative coding solutions in rate-distortion (RD) performance, notably with BD-Rate gains of up to 33.22% and 11.28% over the relevant HEVC-Simulcast and SHVC-Intra benchmarks, respectively, while achieving a lower encoding complexity.

Introduction

Multimedia applications play a major role in today's society, with video coding technologies largely driving the development of new services and applications that provide an increasing quality of experience. These applications typically deploy a powerful video compression engine following the so-called predictive coding paradigm, which is largely adopted by the available video coding standards. The state-of-the-art in predictive video coding is nowadays represented by the recent High Efficiency Video Coding (HEVC) standard [1], which targets all resolutions, now up to ultra-high definition (UHD) video content. As usual, this standard provides about 50% bitrate reduction for the same perceptual quality compared with the previously available standards [2], notably the widely deployed H.264/AVC (Advanced Video Coding) standard [3]. While the H.264/AVC and HEVC standards are highly efficient, they are not able to provide the adaptation capabilities necessary for the large heterogeneity of networks, devices, consumption environments and user preferences, which has only increased over the years. Moreover, the connections may have different capabilities and characteristics over time, making a dynamic adaptation of the transmitted video streams necessary to provide the best user experience. To address these adaptation requirements, several scalable video coding solutions have been developed in the past, e.g. in the MPEG-2 Video and MPEG-4 Visual standards. In these scalable video coding solutions, the video content is coded in such a way that partial, still compliant, bitstreams can be easily extracted from the full bitstream to provide video services with lower temporal or spatial resolutions, or reduced fidelity, while retaining a reconstruction quality commensurate with the rate of the partial bitstreams [4].
This contrasts with the more commonly used non-scalable video coding approach, where the full bitstream has to be received, as otherwise a significant quality degradation may happen, e.g. associated with packet losses and dynamic bandwidth reductions. Scalable coding adopts a ‘code once for all’ approach, as the same single bitstream can be easily ‘manipulated’ to provide a variety of sub-bitstreams appropriate for different consumption conditions; ideally, these partial bitstreams should provide a rate-distortion (RD) performance similar to that of a non-scalable stream coded for the same specific conditions, notably for the same rate. The rate difference between these scalable and non-scalable streams (for the same final quality) expresses the rate penalty of the scalability functionalities, and its reduction has been a major research target in the last 25 years.

The most recent scalable video coding standard addressing these adaptation requirements is the H.264/AVC scalable video coding extension, named the Scalable Video Coding (SVC) standard [4], developed around 2007. The SVC standard adopts a hierarchical layered design where subsets of the video bitstream can be decoded with different qualities and/or spatial and temporal resolutions. Contrary to past scalable video coding standards, the SVC standard builds on the highly efficient H.264/AVC standard, providing a backward compatible base layer (BL) and additional enhancement layers (ELs). Moreover, it incurs only a small rate penalty relative to equivalent non-scalable coding, notably around 10% of the total bitrate with respect to the H.264/AVC single layer (non-scalable) benchmark with equivalent quality/resolution, while supporting several scalability features [5]. Due to the usage of multiple prediction coding modes within (intra-layer) and between (inter-layer) the scalable layers, the compression efficiency has been brought to a level that is considered competitive for many applications and services.
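The rate penalty mentioned above can be written compactly. The following is a sketch under stated assumptions: the symbols R_scal and R_single are introduced here for illustration only, and the penalty is assumed to be normalized by the single layer rate, which the text does not state explicitly.

```latex
% Rate penalty of scalability at equal reconstruction quality Q
% (R_scal, R_single and the normalization are illustrative assumptions)
\Delta R(Q) = \frac{R_{\mathrm{scal}}(Q) - R_{\mathrm{single}}(Q)}{R_{\mathrm{single}}(Q)}

% Illustrative numbers, not taken from the paper:
% R_single = 2000 kbit/s and \Delta R = 0.10 give
% R_scal = (1 + 0.10) \cdot 2000 = 2200 kbit/s
```

Under this reading, an ideal scalable codec would have a penalty of zero at every quality level.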

The subsequent development of the HEVC standard, with its 50% compression gains compared to H.264/AVC, and the increasing market relevance of heterogeneous and dynamic transmission environments have boosted the development of a new scalable video coding standard with a compression performance beyond that of the SVC standard. The emerging requirements include increased spatial, temporal, and bit-depth resolutions, random access and error resilience, low encoding complexity and, naturally, increased compression efficiency [6]. These requirements emerge from a large variety of applications and services [7], many giving special emphasis to low encoding complexity, which is the specific scenario addressed in this paper. In this context, the ISO/IEC MPEG and ITU-T VCEG groups decided to launch a joint Call for Proposals targeting a scalable extension of the HEVC standard [8], known as Scalable HEVC (SHVC), providing BL HEVC backward compatibility.

While the current predictive video coding paradigm mainly targets one-to-many applications where a few, rather complex encoders feed numerous, much simpler decoders, many emerging applications such as wireless video surveillance, visual sensor networks and remote space transmission have different complexity requirements and better match an alternative coding paradigm where simpler encoders feed decoders that may be simple or more complex. These emerging applications can also greatly benefit from scalable video coding, which should offer different levels of spatial, temporal and quality scalability to optimize the system resources, such as computational power and network bandwidth, and meet the dynamic adaptation needs in terms of network, storage and terminals. In addition, these applications typically operate over error-prone and congested channels (mostly wireless), thus asking for robustness to channel (bit) errors and packet losses. This request is typically addressed by mitigating the effects of error propagation and drift, which are rather common in predictive video coding.

Distributed video coding (DVC) has attracted much attention in the last decade due to its specific coding features, notably a flexible distribution of the codec complexity, inherent error robustness, and codec independent scalability [9] since, unlike predictive coding, no prediction loop is used. DVC is based on the well-known Slepian–Wolf and Wyner–Ziv theorems [10], [11], which allow exploiting the temporal correlation at the decoder rather than at the encoder without RD performance penalty, under certain specific conditions; this allows moving complexity to the decoder, thus achieving simpler encoders. The literature shows that there are two main practical approaches to DVC design: the DVC Stanford solution [12] and the DVC Berkeley solution, known as PRISM (Power-efficient, Robust, high compression Syndrome based Multimedia coding) [13].
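For reference, the two theorems can be stated as rate bounds. In the sketch below, X stands for the source being coded and Y for the correlated side information available at the decoder; mapping X to the frame data and Y to a decoder-side estimate of it is the usual DVC interpretation and is given here as an illustration, not as the notation of this paper.

```latex
% Slepian-Wolf: lossless coding of correlated sources X and Y
% with separate encoders and a joint decoder
R_X \ge H(X \mid Y), \qquad
R_Y \ge H(Y \mid X), \qquad
R_X + R_Y \ge H(X, Y)

% Wyner-Ziv: lossy coding of X at distortion D with side information Y
% available only at the decoder, compared with Y available at both ends
R_{\mathrm{WZ}}(D) \ge R_{X \mid Y}(D)
```

The Wyner–Ziv bound holds with equality for jointly Gaussian sources under a mean squared error distortion measure, which is one of the "certain specific conditions" under which moving the correlation exploitation to the decoder costs no rate.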

Inspired by the strengths of both the predictive and distributed video coding paradigms, this paper proposes a novel scalable video coding framework for applications asking for HEVC backward compatibility, high compression efficiency, low encoding complexity and error resilience, combining the strengths of the HEVC standard at the BL with the strengths of DVC in the ELs. Following its features, this scalable coding framework is called Distributed Scalable Video Coding (DSVC). While most current scalable video coding research follows a predictive approach also for the ELs, e.g. SHVC, this paper intends to demonstrate that there are alternative, less conventional coding architectures that are still competitive, which is very healthy from a research point of view. The importance of low complexity application scenarios is also largely recognized by the JCT-VC standardization group, not only in the SHVC requirements but also in the context of the SHVC development [7], which defines a specific low-complexity Intra scenario for performance assessment where no temporal predictions are allowed in the coding process. By mixing spatial (inter-layer) encoder predictions with temporal decoder estimations, the proposed solution can greatly improve the compression efficiency without compromising the error resilience when compared to traditional Intra-coding solutions. In addition, backward compatibility is assured since the BL can be compliant with any non-scalable video coding standard, such as HEVC and H.264/AVC, in any codec/profile configuration.

In summary, the major goal of this paper is to propose a novel HEVC backward compatible scalable extension solution based on a more challenging combination of coding technologies to address the requirements of the aforementioned emerging applications in terms of high compression, low encoding complexity and error resilience. The technical novelty of this paper regards not only the novel scalable video coding architecture but also: i) a novel side information (SI) creation process to be used at both the encoder and decoder to drive the correlation modeling, and at the decoder to drive the EL reconstruction; ii) a novel correlation modeling solution to determine at both the encoder and decoder the number of least significant bitplanes which are worthwhile to encode; and finally, iii) a novel EL coding mode selection process to decide the best coding mode for the EL residue. These novel coding tools have been integrated in a quality scalable DSVC framework (which is also temporally scalable) with a compression performance outperforming the most relevant alternative simulcast and scalable coding solutions, notably by up to 33.22% and 11.28% (BD-Rate) with respect to the HEVC-Simulcast and SHVC-Intra benchmarks, respectively, while offering a lower encoding complexity.
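The idea of encoding only a few least significant bitplanes can be illustrated with a generic coset coding sketch. This is not the correlation model proposed in the paper: it assumes a known hard bound `max_abs_error` on the difference between the source value and the decoder side information, and all function names are hypothetical. If the bound holds, sending the n lowest bits with 2^(n-1) > bound is enough for the decoder to recover the value exactly.

```python
def lsb_bitplanes_needed(max_abs_error):
    """Smallest n with 2**(n-1) > max_abs_error (assumed hard error bound)."""
    n = 1
    while (1 << (n - 1)) <= max_abs_error:
        n += 1
    return n


def encode_lsb(value, n):
    """Encoder side: keep only the n least significant bits of the value."""
    return value & ((1 << n) - 1)


def decode_lsb(lsb, side_info, n):
    """Decoder side: pick the integer with the received LSBs closest to side_info."""
    step = 1 << n
    # Largest integer <= side_info that has the received n LSBs.
    below = side_info - ((side_info - lsb) % step)
    above = below + step
    return min((below, above), key=lambda c: abs(c - side_info))


# Illustrative numbers only: the value is 137, the decoder estimate is 133,
# and the estimate is assumed to be off by at most 5.
n = lsb_bitplanes_needed(5)
recovered = decode_lsb(encode_lsb(137, n), 133, n)
```

The better the side information, the smaller the bound and the fewer bitplanes need to be sent, which is the intuition behind letting a correlation model choose the number of bitplanes. A practical codec would rely on a statistical model rather than a hard bound.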

To achieve its objectives, this paper is organized as follows: Section 2 reviews the relevant background work on scalable video coding. Section 3 then presents the selected DSVC requirements and the derived DSVC encoder and decoder architectures. Section 4 describes the novel coding tools in detail, while Section 5 presents and discusses the DSVC performance in comparison with relevant benchmarks. Finally, Section 6 presents the main conclusions and ideas for future work.

Relevant background works

To achieve a scalable video coding engine, a set of hierarchical layers is usually adopted with one BL and one or more ELs. If backward compatibility is a requirement, the BL frames are coded with a predictive coding standard such as H.264/AVC or HEVC, while the EL frames can be coded following either predictive or distributed coding principles. While scalable coding standards have been adopting the predictive coding paradigm for both the BL and ELs, there are several examples in the literature

Distributed scalable video coding solution

Before going into the details on the novel DSVC codec tools, this section describes the considered requirements, as well as the encoder and decoder architectures designed to address them.

Novel DSVC coding tools

In the proposed DSVC framework, some HEVC coding techniques such as the DCT and SQ are combined with novel coding tools such as the SI residue creation, correlation modeling, EL coding mode selection, residue nested scalar quantization, and modified CABAC coding. While some of the tools are applied only at the encoder such as the EL coding mode selection, most tools are applied at both the encoder and decoder, sometimes in a similar way such as the SI residue creation and correlation modeling

Performance assessment

This section presents and discusses the performance evaluation of the proposed DSVC codec; in particular, its RD performance and encoding complexity are compared with the most relevant benchmarks.
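Since the RD comparison is reported as BD-Rate, a small sketch of how an average bitrate difference between two RD curves can be computed may help. This is a simplified variant, not the procedure used in the paper: the usual Bjøntegaard calculation fits a cubic polynomial to log-rate versus PSNR, whereas this sketch uses piecewise-linear interpolation to stay dependency free. It assumes each curve has distinct PSNR values; the function names and the sample points are hypothetical.

```python
import math


def _interp(x, xs, ys):
    """Linearly interpolate ys over strictly increasing xs at point x."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside interpolation range")


def average_rate_difference(anchor, test, samples=200):
    """Average bitrate difference (%) of test vs. anchor at equal PSNR.

    anchor, test: lists of (rate, psnr_db) points. Negative means test saves rate.
    """
    a = sorted(anchor, key=lambda p: p[1])
    t = sorted(test, key=lambda p: p[1])
    a_q, a_lr = [p[1] for p in a], [math.log(p[0]) for p in a]
    t_q, t_lr = [p[1] for p in t], [math.log(p[0]) for p in t]
    # Only the PSNR interval covered by both curves is compared.
    lo, hi = max(a_q[0], t_q[0]), min(a_q[-1], t_q[-1])
    if hi <= lo:
        raise ValueError("RD curves do not overlap in PSNR")
    total = 0.0
    for k in range(samples):
        q = lo + (hi - lo) * (k + 0.5) / samples  # midpoint sampling
        total += _interp(q, t_q, t_lr) - _interp(q, a_q, a_lr)
    # Mean log-rate difference converted back to a percentage.
    return (math.exp(total / samples) - 1.0) * 100.0


# Made-up RD points: the test codec needs 10% less rate at every PSNR.
anchor_points = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
test_points = [(0.9 * r, q) for r, q in anchor_points]
saving = average_rate_difference(anchor_points, test_points)
```

With these made-up points the result is a rate difference of about -10%, i.e. a 10% bitrate saving for the test codec at equal quality.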

Conclusions

This paper has presented a novel low encoding complexity scalable video coding solution, backward compatible with the emerging HEVC standard in the BL and adopting a DVC approach for the ELs. This coding design guarantees that a low encoding complexity can be achieved while the temporal error propagation and drift are avoided, thus satisfying the requirements of many emerging applications such as video surveillance, visual sensor networks, and remote space transmissions. To improve the

References (42)

  • ISO/IEC JTC 1/SC 29/WG 11 and ITU-T SG16 WP3, Joint Call for Proposals on Scalable Video Coding Extensions of High...
  • D. Slepian et al., Noiseless coding of correlated information sources, IEEE Trans. Inf. Theory (1973)
  • A.D. Wyner et al., The rate-distortion function for source coding with side information at the decoder, IEEE Trans. Inf. Theory (1976)
  • B. Girod, Distributed video coding, Proc. IEEE (2005)
  • R. Puri and K. Ramchandran, PRISM: a new robust video coding architecture based on distributed compression principles,...
  • W. Li, Overview of fine granularity scalability in MPEG-4 video standard, IEEE Trans. Circuits Syst. Video Technol. (2001)
  • P. Helle, et al., A scalable video coding extension of HEVC, in: Proceedings of the Data Compression Conference,...
  • G.J. Sullivan et al., Standardized extensions of High Efficiency Video Coding (HEVC), IEEE J. Sel. Top. Signal Process. (2013)
  • X. Li, J. Chen, K. Rapaka, and M. Karczewicz, Generalized inter-layer residual prediction for scalable extension of...
  • Z. Zhao, J. Si, J. Ostermann, and W. Li, Inter-layer intra mode coding for the scalable extension of HEVC, in:...
  • M. Guo, S. Liu, S. Lei, J. Min, and T. Lee, Inter-layer intra mode prediction for scalable extension of HEVC, in:...