Dynamic computational complexity and bit allocation for optimizing H.264/AVC video compression

https://doi.org/10.1016/j.jvcir.2007.05.002

Abstract

In this work, we present a novel approach for optimizing H.264/AVC video compression by dynamically allocating computational complexity (such as a number of CPU clocks) and bits for encoding each coding element (basic unit) within a video sequence, according to its predicted MAD (mean absolute difference). Our approach is based on a computational complexity–rate–distortion (C–R–D) analysis, which adds a complexity dimension to the conventional rate–distortion (R–D) analysis. Both theoretically and experimentally, we show that the proposed dynamic allocation achieves better results, and that optimal computational complexity allocation combined with optimal bit allocation outperforms constant computational complexity allocation combined with optimal bit allocation. In addition, we present a method and system for implementing the proposed approach and for controlling computational complexity and bit allocation in real-time and off-line video coding. We divide each frame into one or more basic units, where each basic unit consists of at least one macroblock (MB) that can be encoded using a number of coding modes. We determine how much computational complexity and how many bits should be allocated for encoding each basic unit, and then allocate a corresponding group of coding modes and a quantization step-size, according to the estimated distortion of each basic unit (calculated by a linear regression model) and according to the computational complexity and bits remaining for the basic units still to be encoded. For allocating the corresponding group of coding modes and the quantization step-size, we develop computational complexity–complexity step–rate (C–I–R) and rate–quantization step-size–computational complexity (R–Q–C) models.
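The MAD-driven allocation summarized above relies on a linear regression model for estimating the activity of each basic unit. As a rough illustration only (not the authors' implementation), the Python sketch below assumes the widely used first-order predictor MAD_current ≈ a1·MAD_previous + a2, with the coefficients refitted by least squares over previously encoded basic units; the class and method names are hypothetical.

```python
# Illustrative sketch of a linear-regression MAD predictor; the predictor
# form MAD_cur ~= a1 * MAD_prev + a2 is an assumption (cf. [14]), and all
# names here are hypothetical rather than taken from the paper.
import numpy as np

class MADPredictor:
    def __init__(self, a1: float = 1.0, a2: float = 0.0):
        self.a1, self.a2 = a1, a2   # regression coefficients
        self.history = []           # (previous MAD, actual MAD) observations

    def predict(self, prev_mad: float) -> float:
        """Predict the MAD of the current basic unit from the co-located
        basic unit of the previous frame."""
        return self.a1 * prev_mad + self.a2

    def update(self, prev_mad: float, actual_mad: float) -> None:
        """Refit a1 and a2 by least squares after the basic unit is encoded."""
        self.history.append((prev_mad, actual_mad))
        if len(self.history) >= 2:
            x, y = zip(*self.history)
            self.a1, self.a2 = np.polyfit(x, y, 1)
```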

Introduction

The ITU-T H.264/AVC (ISO/IEC MPEG-4 Part 10) video coding standard [1], [2] has become a challenge for real-time and off-line video applications. Compared to other standards, it saves about 50% in bit rate while providing the same visual quality. In addition to having all the advantages of MPEG-2 [2], H.263 [3] and MPEG-4 [4], the H.264 video coding standard possesses a number of improvements, such as context-adaptive binary arithmetic coding (CABAC) [5], enhanced transform and quantization, prediction of “Intra” macroblocks (spatial prediction), and others. H.264 is designed for both constant bit rate (CBR) and variable bit rate (VBR) video coding, which is useful for transmitting video sequences over statistically multiplexed networks (e.g., asynchronous transfer mode (ATM), Ethernet, or other Internet-based networks). This video coding standard can also be used at any bit rate range for various applications, ranging from wireless video phones to HDTV and digital video broadcasting (DVB) [6]. In addition, H.264 provides significantly improved coding efficiency and greater functionality, such as rate scalability, “Intra” prediction, and error resilience, in comparison with its predecessors, MPEG-2 and H.263. However, H.264 is much more complex than other coding standards, and achieving maximum encoding quality requires high computational resources.

In the last decade, several rate control and bit allocation methods have been proposed for minimizing distortion in the video compression standards preceding H.264/AVC. Conventional optimal encoding methods [7], [8] decrease video sequence distortion only by optimizing the bit allocation. A theoretical study presented in [8] achieves optimal bit allocation and minimizes distortion by considering the relationship between rate and distortion and finding an optimal set of quantizers for a given information source. In [7], another solution for achieving optimal bit allocation and minimizing distortion is proposed, based on the Viterbi algorithm. However, these solutions are overcomplicated, since they implement dynamic programming for updating quantizer settings. More recent papers, such as [9], propose a feedback rate control scheme for minimizing distortion in MPEG-2 and MPEG-4 by calculating the target bit rate for each frame based on a quadratic rate–distortion function. However, [9] does not consider encoding mode selection and computational complexity for minimizing distortion and therefore does not provide an optimal solution. In addition, [10] proposes a method for determining a set of optimal coding modes for encoding each macroblock in the H.264/AVC standard. According to this method, rate–distortion optimization (RDO) is performed for each macroblock to select an optimal coding mode by minimizing a Lagrangian cost function. However, the coding mode selection of [10] is also overcomplicated, since all coding modes are evaluated in order to select the optimal one. Further, [11] implements the quadratic rate control scheme for the H.264/AVC standard. However, [11] does not consider computational complexity; it considers only the quantization settings and optimal mode selection. Thus, similarly to [10], the method of [11] provides an optimal solution only if all coding modes are evaluated. Further, [12] proposes a complexity control algorithm for the H.264 encoder, in which computational savings are achieved by early prediction of skipped macroblocks prior to motion estimation, through estimating a Lagrangian rate–distortion–complexity cost function. A feedback control algorithm ensures that the encoder maintains a predefined target computational complexity. However, in [12] each macroblock is either skipped (not coded) or coded by considering all coding modes, which leads to excessive complexity because of the large number of possible coding modes and (especially at low bit rates) to significant distortion fluctuations between skipped and coded macroblocks. In the most recent paper [13], a power–rate–distortion (P–R–D) analysis framework is proposed, extending the traditional R–D analysis by including power consumption as an additional dimension. However, in [13] power consumption is used to determine an overall constant complexity level according to the average P–R–D model. Thus, the solution of [13] is not optimal, since the activity of each individual coding element is not considered.

In this paper, we overcome all the drawbacks mentioned above by developing novel techniques that provide real-time and off-line high-quality video coding for H.264/AVC applications. We suggest a novel approach for optimizing H.264/AVC video compression by dynamically allocating computational complexity (such as a number of CPU clocks) and bits for encoding each basic unit within a video sequence, according to its predicted MAD (mean absolute difference). We define a basic unit as a group of adjacent MBs; a basic unit can be an MB, slice, field, or frame [14]. Our approach is based on a computational complexity–rate–distortion (C–R–D) analysis, which adds a complexity dimension to the conventional rate–distortion (R–D) analysis. Both theoretically and experimentally, we show that the proposed approach achieves better results. We also show that the optimal computational complexity allocation along with optimal bit allocation is better than the constant computational complexity allocation along with optimal bit allocation. The computational complexity issue is critical for present and future real-time video applications based on the H.264/AVC standard, which has a large number of coding modes. In conventional advanced video coding applications, not all of these coding modes are evaluated when a video sequence is encoded, since evaluating all possible coding modes significantly increases the overall computational complexity. The greater the computational complexity, the larger the processing (power) resources required; for mobile/wireless devices, the power issue becomes critical. On the other hand, not evaluating all possible coding modes increases the distortion of the encoded video sequence and, in turn, decreases the overall video quality. Therefore, by dynamically allocating the computational complexity and bits for encoding each basic unit within a video sequence, we minimize the video sequence distortion and achieve better video quality. In addition, we present a method and system for implementing the dynamic allocation approach. According to this method, the overall encoding process can be performed at different levels of video quality, where higher quality levels require more computational complexity (in terms of CPU clocks). When setting the quality levels, we take into account the computational constraints of our system, the characteristics of the input, such as video sequence statistics, and the characteristics of the output, such as the distortion and the number of CPU clocks required for encoding each basic unit.
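To make the dynamic allocation concrete, the following Python sketch distributes the remaining complexity and bit budgets of a frame over its basic units in proportion to their predicted MAD. This is a simplified, assumption-based illustration of the idea, not the authors' actual control algorithm; the function name and budget units are hypothetical.

```python
# Illustrative sketch (not the paper's exact control law): split the
# remaining complexity and bit budgets over the basic units that are still
# to be encoded, in proportion to their predicted MAD.
def allocate_budgets(pred_mads, total_clocks, total_bits):
    """Return a (clocks, bits) target per basic unit of one frame.

    pred_mads    -- predicted MAD of each basic unit, in encoding order
    total_clocks -- computational budget for the frame (e.g., CPU clocks)
    total_bits   -- bit budget for the frame
    """
    remaining_clocks, remaining_bits = float(total_clocks), float(total_bits)
    targets = []
    for i, mad in enumerate(pred_mads):
        # weight of the current unit relative to all not-yet-encoded units
        weight = mad / max(sum(pred_mads[i:]), 1e-9)
        clocks, bits = weight * remaining_clocks, weight * remaining_bits
        targets.append((clocks, bits))
        remaining_clocks -= clocks
        remaining_bits -= bits
    return targets
```

Under this heuristic, basic units with higher predicted activity receive a larger share of the clocks and bits that are still available, while the remaining budgets shrink as the frame is encoded.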

The proposed method needs to be robust to changes in the overall computational constraints. The difference in overall video sequence quality obtained with optimal versus constant computational complexity and bit allocations defines the level of robustness. Our method relies on prior video quality estimates for a given implementation. Furthermore, to trade off computational complexity and bit allocation against video quality, while appropriately allocating groups of coding modes to different basic units within a given video sequence, the method relies on past observations of the encoding process inputs and outputs. Moreover, our method and system can be used in real-time and off-line implementations to maximize the overall processing performance. In real time, we achieve maximal usage of processing resources, such as CPU usage over a predetermined period of time (see Section 2). As a result, the method is computationally efficient.

The structure of this paper is as follows. At the beginning of Section 2, we review two main problems related to the computational complexity and bit allocation of conventional off-line and real-time video encoding and decoding systems. Then, in Section 2.1, we describe optimal coding mode selection, and in Section 2.2 we present a complete theoretical study of the computational complexity and bit allocation problems based on the C–R–D analysis. In Section 3, we propose a method and system for implementing the dynamic allocation approach by providing frame-level and basic-unit-level computational complexity and bit rate control. Experimental results and conclusions are given in Sections 4 and 5, respectively. Future research directions are presented in Section 6.

Section snippets

Computational complexity and bit allocation problems

There are two main problems related to computational complexity and bit allocation in conventional off-line and real-time video encoding and decoding systems (see Fig. 1). One problem is that, according to the conventional R–D analysis, a user or a system is unable to define (manually or automatically) the computational complexity (such as the number of CPU clocks, memory bandwidth usage, and power consumption) for the overall encoding process. This issue becomes critical for video applications

Method and system for dynamic allocation of encoding computational complexity and bits

As described above, when selecting the minimal encoding computational complexity, we cannot obtain an optimal set of coding modes for encoding each basic unit, and as a result we obtain maximal distortion. In other words, the minimal encoding computational complexity corresponds to a single coding mode and, as a result, to maximal distortion.
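As an illustration of how a complexity budget can be mapped to a group of coding modes for one basic unit, the sketch below simply picks the richest group that fits the allocated clocks. The listed mode groups and their relative complexity costs are hypothetical placeholders, not the paper's C–I–R model.

```python
# Hypothetical sketch of matching a per-basic-unit complexity target to a
# group of coding modes; mode groups and costs below are illustrative only.
MODE_GROUPS = [
    (("SKIP",), 1.0),                                  # minimal complexity
    (("SKIP", "P16x16"), 2.5),
    (("SKIP", "P16x16", "P8x8", "I4x4"), 6.0),
    (("SKIP", "P16x16", "P16x8", "P8x16", "P8x8",
      "I16x16", "I4x4"), 12.0),                        # near-exhaustive RDO
]

def select_mode_group(clock_target: float):
    """Return the richest mode group whose estimated cost fits the clocks
    allocated to this basic unit; fall back to the single-mode group."""
    feasible = [group for group in MODE_GROUPS if group[1] <= clock_target]
    return feasible[-1] if feasible else MODE_GROUPS[0]
```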

The H.264/AVC standard has a conventional method [10], [14] for determining an optimal coding mode for encoding each macroblock. According to [10] and

Experimental results

For simulating the proposed dynamic allocation approach, we statistically provide, for simplicity, a set of best groups of coding modes for each video sequence. Also, for simplicity, we define a basic unit as a frame. Each basic unit receives its constant set of best groups of coding modes. We have 12 variable coding modes, so for each tested video sequence we perform 2^12 = 4096 experiments, creating convex-hull graphs based on the obtained results. From each convex-hull graph we
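As an illustration of how such an experiment can be organized, the Python sketch below enumerates all 2^12 mode groups and reduces the measured (complexity, distortion) points to their lower convex hull, which gives the best achievable distortion at each complexity; the mode names and the measurement step are placeholders for actual encoder runs, not the authors' tool chain.

```python
# Illustrative sketch: enumerate all 2^12 = 4096 groups of the 12 variable
# coding modes and keep only the measurements on the lower convex hull of
# the (complexity, distortion) plane.
from itertools import combinations

VARIABLE_MODES = [f"mode_{k}" for k in range(12)]   # placeholder mode names

def enumerate_mode_groups():
    """Yield all 4096 subsets of the variable coding modes."""
    for r in range(len(VARIABLE_MODES) + 1):
        yield from combinations(VARIABLE_MODES, r)

def _cross(o, a, b):
    """2-D cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_convex_hull(points):
    """Keep only the (complexity, distortion) points on the lower convex
    hull, i.e., the best achievable distortion at each complexity."""
    hull = []
    for p in sorted(points):
        # drop the last hull point while it lies above the chord to p
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

# Typical use: measure (complexity, distortion) for every mode group with
# the encoder, then build the convex-hull curve from those 4096 points:
# points = [measure(group) for group in enumerate_mode_groups()]
# curve  = lower_convex_hull(points)
```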

Conclusions

In this work, we have introduced a novel approach for optimizing H.264/AVC video compression by dynamically allocating computational complexity and bits for each basic unit within a video sequence, according to its predicted MAD. We have based the proposed approach on the C–R–D analysis. Both theoretically and experimentally, we have shown that the proposed approach achieves better results. We have also shown that the optimal computational complexity allocation along with optimal bit allocation is better than

Future research

In future research we plan to improve our dynamic allocation by:

  • 1. Developing an algorithm to take the place of the statistical approach presented in this paper.

  • 2. Considering encoder and decoder buffers and end-to-end delay, which were not taken into account in this work.

  • 3. Developing a computational complexity and rate control algorithm based on human visual perception, where each frame can be divided into a number of regions. Each region can be perceived differently by the human visual system.

Acknowledgment

We are grateful to L. Izrin for help in developing the program code.


References (21)

  • T. Wiegand, Working draft number 2, Revision 2 (WD-2), in: Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, March...
  • T. Wiegand, G. Sullivan, Final draft ITU-T recommendation and final draft international standard of joint video...
  • ITU Telecom Standardization Sector of ITU, H.263, Video coding for low bit rate communications, ITU-T Recommendation...
  • T. Sikora, The MPEG-4 video standard verification model, IEEE Trans. Circuits Syst. Video Technol. (1997)
  • VCODEX: H.264 tutorial white papers. [Online]. Available from:...
  • U. Reimers, Digital video broadcasting, IEEE Commun. Mag. (1998)
  • A. Ortega et al., Optimal trellis-based buffered compression and fast approximations, IEEE Trans. Image Processing (1994)
  • J. Choi et al., A stable feedback control of the buffer state using the controlled Lagrange multiplier method, IEEE Trans. Image Processing (1994)
  • T. Chiang et al., A new rate control scheme using quadratic rate distortion model, IEEE Trans. Circuits Syst. Video Technol. (1997)
  • T. Wiegand et al., Rate-constrained coder control and comparison of video coding standards, IEEE Trans. Circuits Syst. Video Technol. (2003)


Evgeny Kaminsky received the B.Sc. and M.Sc. degrees in Electrical and Computer Engineering from Ben-Gurion University in 1995 and 1999, respectively. He worked as a VLSI Engineer with VisionTech Ltd. (now Broadcom Israel) in the area of MPEG-2, and with Intel Israel in the area of Pentium-4 manufacturing testing. He is presently a Ph.D. student at the Department of Electrical and Computer Engineering at Ben-Gurion University. Mr. Kaminsky is interested in video information processing, algorithms for image and video compression, transmission of video over modern communications networks, and VLSI design.

Dan Grois was born in Kharkov, Ukraine, in 1976. He received the B.Sc. degree in Electrical and Computer Engineering from Ben-Gurion University (BGU), Beer-Sheva, Israel, in 2002, and the M.Sc. degree in Electro-Optics Engineering from BGU in 2006. He is currently pursuing the Ph.D. degree in Electro-Optics Engineering at BGU. Dan has wide work experience in the field of electronics. He worked as a quality assurance engineer at Eltek Ltd. (an Israeli printed circuit board manufacturer) between 1997 and 1999. During 2000, he worked at Motorola Inc. in the CAD (Computer-Aided Design) department, defining electronic components. Between 2001 and 2003 he worked at Israel Aircraft Industries Ltd. as a system engineer, performing various engineering tasks, including integrated circuit design and writing test programs for verification and validation. Since 2004, he has been employed at the Luzzatto & Luzzatto patent attorneys' office.

His research interests include image and video processing, imaging systems, image and video compression standards, buffer and bit rate control, and analog and digital design.

Ofer Hadar received the B.Sc., M.Sc. (cum laude), and Ph.D. degrees from the Ben-Gurion University of the Negev, Israel, in 1990, 1992 and 1997, respectively, all in Electrical and Computer Engineering. From August 1996 to February 1997, he was with CREOL at the University of Central Florida, Orlando, FL, as a Visiting Research Scientist, working on the angular dependence of sampling MTF and over-sampling MTF. From October 1997 to March 1999, he was a Post-Doctoral Fellow in the Department of Computer Science at the Technion-Israel Institute of Technology, Haifa. He is currently a faculty member of the Communication Systems Engineering Department at Ben-Gurion University of the Negev.

His research interests include image compression, video compression, routing and flow control in ATM networks, packet video, transmission of video over IP networks, and video rate smoothing and multiplexing. Hadar also works as a consultant for several hi-tech companies, such as EnQuad Technologies Ltd. in the area of MPEG-4 and Scopus in the area of video compression and transmission over satellite networks. Hadar is a member of the IEEE and SPIE.
