Dynamic computational complexity and bit allocation for optimizing H.264/AVC video compression

https://doi.org/10.1016/j.jvcir.2007.05.002

Abstract

In this work, we present a novel approach for optimizing H.264/AVC video compression by dynamically allocating computational complexity (such as a number of CPU clocks) and bits for encoding each coding element (basic unit) within a video sequence, according to its predicted MAD (mean absolute difference). Our approach is based on a computational complexity–rate–distortion (C–R–D) analysis, which adds a complexity dimension to the conventional rate–distortion (R–D) analysis. Both theoretically and experimentally, we show that the proposed dynamic allocation achieves better results, and that optimal computational complexity allocation combined with optimal bit allocation outperforms constant computational complexity allocation combined with optimal bit allocation. In addition, we present a method and system for implementing the proposed approach and for controlling computational complexity and bit allocation in real-time and off-line video coding. We divide each frame into one or more basic units, where each basic unit consists of at least one macroblock (MB) that can be encoded using a number of coding modes. We determine how much computational complexity and how many bits should be allocated for encoding each basic unit, and then allocate a corresponding group of coding modes and a quantization step-size, according to the estimated distortion of each basic unit (calculated by a linear regression model) and according to the computational complexity and bits remaining for the basic units still to be encoded. For allocating the corresponding group of coding modes and the quantization step-size, we develop computational complexity–complexity step–rate (C–I–R) and rate–quantization step-size–computational complexity (R–Q–C) models.
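The MAD-driven allocation summarized above relies on a linear regression model for estimating the activity of each basic unit. As a rough illustration only (not the authors' implementation), the Python sketch below assumes the widely used first-order predictor MAD_current ≈ a1·MAD_previous + a2, with the coefficients refitted by least squares over previously encoded basic units; the class and method names are hypothetical.

```python
# Illustrative sketch of a linear-regression MAD predictor; the predictor
# form MAD_cur ~= a1 * MAD_prev + a2 is an assumption (cf. [14]), and all
# names here are hypothetical rather than taken from the paper.
import numpy as np

class MADPredictor:
    def __init__(self, a1: float = 1.0, a2: float = 0.0):
        self.a1, self.a2 = a1, a2   # regression coefficients
        self.history = []           # (previous MAD, actual MAD) observations

    def predict(self, prev_mad: float) -> float:
        """Predict the MAD of the current basic unit from the co-located
        basic unit of the previous frame."""
        return self.a1 * prev_mad + self.a2

    def update(self, prev_mad: float, actual_mad: float) -> None:
        """Refit a1 and a2 by least squares after the basic unit is encoded."""
        self.history.append((prev_mad, actual_mad))
        if len(self.history) >= 2:
            x, y = zip(*self.history)
            self.a1, self.a2 = np.polyfit(x, y, 1)
```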

Introduction

The ITU-T H.264/AVC (ISO/IEC MPEG-4 Part 10) video coding standard [1], [2] has become a challenge for real-time and off-line video applications. Compared to other standards, it saves about 50% in bit rate while providing the same visual quality. In addition to having all the advantages of MPEG-2 [2], H.263 [3] and MPEG-4 [4], the H.264 video coding standard possesses a number of improvements, such as context-adaptive binary arithmetic coding (CABAC) [5], enhanced transform and quantization, prediction of “Intra” macroblocks (spatial prediction), and others. H.264 is designed for both constant bit rate (CBR) and variable bit rate (VBR) video coding, which is useful for transmitting video sequences over statistically multiplexed networks (e.g., asynchronous transfer mode (ATM), Ethernet, or other Internet-based networks). This video coding standard can also be used at any bit rate range for various applications, ranging from wireless video phones to HDTV and digital video broadcasting (DVB) [6]. In addition, H.264 provides significantly improved coding efficiency and greater functionality, such as rate scalability, “Intra” prediction, and error resilience, in comparison with its predecessors, MPEG-2 and H.263. However, H.264 is much more complex than other coding standards, and achieving maximum encoding quality requires high computational resources.

In the last decade, several rate control and bit allocation methods have been proposed for minimizing distortion in the video compression standards preceding H.264/AVC. Conventional optimal encoding methods [7], [8] decrease video sequence distortion only by optimizing the bit allocation. A theoretical study presented in [8] achieves optimal bit allocation and minimizes distortion by considering the relationship between rate and distortion and finding an optimal set of quantizers for a given information source. In [7], another solution for achieving optimal bit allocation and minimizing distortion is proposed, based on the Viterbi algorithm. However, these solutions are overcomplicated, since they implement dynamic programming for updating quantizer settings. More recent papers, such as [9], propose a feedback rate control scheme for minimizing distortion in MPEG-2 and MPEG-4 by calculating the target bit rate for each frame based on a quadratic rate–distortion function. However, [9] does not consider encoding mode selection and computational complexity for minimizing distortion and therefore does not provide an optimal solution. In addition, [10] proposes a method for determining a set of optimal coding modes for encoding each macroblock in the H.264/AVC standard. According to this method, rate–distortion optimization (RDO) is performed for each macroblock to select an optimal coding mode by minimizing a Lagrangian cost function. However, the coding mode selection of [10] is also overcomplicated, since all coding modes are evaluated in order to select the optimal one. Further, [11] implements the quadratic rate control scheme for the H.264/AVC standard. However, [11] does not consider computational complexity; it considers only the quantization settings and optimal mode selection. Thus, similarly to [10], the method of [11] provides an optimal solution only if all coding modes are evaluated. Further, [12] proposes a complexity control algorithm for the H.264 encoder, in which computational savings are achieved by early prediction of skipped macroblocks prior to motion estimation, through estimating a Lagrangian rate–distortion–complexity cost function. A feedback control algorithm ensures that the encoder maintains a predefined target computational complexity. However, in [12] each macroblock is either skipped (not coded) or coded by considering all coding modes, which leads to excessive complexity because of the large number of possible coding modes and (especially at low bit rates) to significant distortion fluctuations between skipped and coded macroblocks. In the most recent paper [13], a power–rate–distortion (P–R–D) analysis framework is proposed, extending the traditional R–D analysis by including power consumption as an additional dimension. However, in [13] power consumption is used to determine an overall constant complexity level according to the average P–R–D model. Thus, the solution of [13] is not optimal, since the activity of each individual coding element is not considered.

In this paper, we overcome all the drawbacks mentioned above by developing novel techniques that provide real-time and off-line high-quality video coding for H.264/AVC applications. We suggest a novel approach for optimizing H.264/AVC video compression by dynamically allocating computational complexity (such as a number of CPU clocks) and bits for encoding each basic unit within a video sequence, according to its predicted MAD (mean absolute difference). We define a basic unit as a group of adjacent MBs; a basic unit can be an MB, slice, field, or frame [14]. Our approach is based on a computational complexity–rate–distortion (C–R–D) analysis, which adds a complexity dimension to the conventional rate–distortion (R–D) analysis. Both theoretically and experimentally, we show that the proposed approach achieves better results. We also show that the optimal computational complexity allocation along with optimal bit allocation is better than the constant computational complexity allocation along with optimal bit allocation. The computational complexity issue is critical for present and future real-time video applications based on the H.264/AVC standard, which has a large number of coding modes. In conventional advanced video coding applications, not all of these coding modes are evaluated when a video sequence is encoded, since evaluating all possible coding modes significantly increases the overall computational complexity. The greater the computational complexity, the larger the processing (power) resources required; for mobile/wireless devices, the power issue becomes critical. On the other hand, not evaluating all possible coding modes increases the distortion of the encoded video sequence and, in turn, decreases the overall video quality. Therefore, by dynamically allocating the computational complexity and bits for encoding each basic unit within a video sequence, we minimize the video sequence distortion and achieve better video quality. In addition, we present a method and system for implementing the dynamic allocation approach. According to this method, the overall encoding process can be performed at different levels of video quality, where higher quality levels require more computational complexity (in terms of CPU clocks). When setting the quality levels, we take into account the computational constraints of our system, the characteristics of the input, such as video sequence statistics, and the characteristics of the output, such as the distortion and the number of CPU clocks required for encoding each basic unit.
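To make the dynamic allocation concrete, the following Python sketch distributes the remaining complexity and bit budgets of a frame over its basic units in proportion to their predicted MAD. This is a simplified, assumption-based illustration of the idea, not the authors' actual control algorithm; the function name and budget units are hypothetical.

```python
# Illustrative sketch (not the paper's exact control law): split the
# remaining complexity and bit budgets over the basic units that are still
# to be encoded, in proportion to their predicted MAD.
def allocate_budgets(pred_mads, total_clocks, total_bits):
    """Return a (clocks, bits) target per basic unit of one frame.

    pred_mads    -- predicted MAD of each basic unit, in encoding order
    total_clocks -- computational budget for the frame (e.g., CPU clocks)
    total_bits   -- bit budget for the frame
    """
    remaining_clocks, remaining_bits = float(total_clocks), float(total_bits)
    targets = []
    for i, mad in enumerate(pred_mads):
        # weight of the current unit relative to all not-yet-encoded units
        weight = mad / max(sum(pred_mads[i:]), 1e-9)
        clocks, bits = weight * remaining_clocks, weight * remaining_bits
        targets.append((clocks, bits))
        remaining_clocks -= clocks
        remaining_bits -= bits
    return targets
```

Under this heuristic, basic units with higher predicted activity receive a larger share of the clocks and bits that are still available, while the remaining budgets shrink as the frame is encoded.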

The proposed method needs to be robust to changes in the overall computational constraints. The difference in overall video sequence quality obtained with optimal versus constant computational complexity and bit allocations defines the level of robustness. Our method relies on prior video quality estimates for a given implementation. Furthermore, to trade off computational complexity and bit allocation against video quality, while appropriately allocating groups of coding modes to different basic units within a given video sequence, the method relies on past observations of the encoding process inputs and outputs. Moreover, our method and system can be used in real-time and off-line implementations to maximize the overall processing performance. In real time, we achieve maximal usage of processing resources, such as CPU usage over a predetermined period of time (see Section 2). As a result, the method is computationally efficient.

The structure of this paper is as follows. At the beginning of Section 2, we review two main problems related to the computational complexity and bit allocation of conventional off-line and real-time video encoding and decoding systems. Then, in Section 2.1, we describe optimal coding mode selection, and in Section 2.2 we present a complete theoretical study of the computational complexity and bit allocation problems based on the C–R–D analysis. In Section 3, we propose a method and system for implementing the dynamic allocation approach by providing frame-level and basic-unit-level computational complexity and bit rate control. Experimental results and conclusions are given in Sections 4 and 5, respectively. Future research directions are presented in Section 6.

Section snippets

Computational complexity and bit allocation problems

There are two main problems related to computational complexity and bit allocation in conventional off-line and real-time video encoding and decoding systems (see Fig. 1). One problem is that, according to the conventional R–D analysis, a user or a system is unable to define (manually or automatically) the computational complexity (such as the number of CPU clocks, memory bandwidth usage, and power consumption) for the overall encoding process. This issue becomes critical for video applications

Method and system for dynamic allocation of encoding computational complexity and bits

As described above, when selecting the minimal encoding computational complexity, we cannot obtain an optimal set of coding modes for encoding each basic unit, and as a result we obtain maximal distortion. In other words, the minimal encoding computational complexity corresponds to a single coding mode and, as a result, to maximal distortion.
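As an illustration of how a complexity budget can be mapped to a group of coding modes for one basic unit, the sketch below simply picks the richest group that fits the allocated clocks. The listed mode groups and their relative complexity costs are hypothetical placeholders, not the paper's C–I–R model.

```python
# Hypothetical sketch of matching a per-basic-unit complexity target to a
# group of coding modes; mode groups and costs below are illustrative only.
MODE_GROUPS = [
    (("SKIP",), 1.0),                                  # minimal complexity
    (("SKIP", "P16x16"), 2.5),
    (("SKIP", "P16x16", "P8x8", "I4x4"), 6.0),
    (("SKIP", "P16x16", "P16x8", "P8x16", "P8x8",
      "I16x16", "I4x4"), 12.0),                        # near-exhaustive RDO
]

def select_mode_group(clock_target: float):
    """Return the richest mode group whose estimated cost fits the clocks
    allocated to this basic unit; fall back to the single-mode group."""
    feasible = [group for group in MODE_GROUPS if group[1] <= clock_target]
    return feasible[-1] if feasible else MODE_GROUPS[0]
```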

The H.264/AVC standard has a conventional method [10], [14] for determining an optimal coding mode for encoding each macroblock. According to [10] and

Experimental results

For simulating the proposed dynamic allocation approach, we statistically provide, for simplicity, a set of best groups of coding modes for each video sequence. Also, for simplicity, we define a basic unit as a frame. Each basic unit receives its constant set of best groups of coding modes. We have 12 variable coding modes, so for each tested video sequence we perform 2^12 = 4096 experiments, creating convex-hull graphs based on the obtained results. From each convex-hull graph we
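As an illustration of how such an experiment can be organized, the Python sketch below enumerates all 2^12 mode groups and reduces the measured (complexity, distortion) points to their lower convex hull, which gives the best achievable distortion at each complexity; the mode names and the measurement step are placeholders for actual encoder runs, not the authors' tool chain.

```python
# Illustrative sketch: enumerate all 2^12 = 4096 groups of the 12 variable
# coding modes and keep only the measurements on the lower convex hull of
# the (complexity, distortion) plane.
from itertools import combinations

VARIABLE_MODES = [f"mode_{k}" for k in range(12)]   # placeholder mode names

def enumerate_mode_groups():
    """Yield all 4096 subsets of the variable coding modes."""
    for r in range(len(VARIABLE_MODES) + 1):
        yield from combinations(VARIABLE_MODES, r)

def _cross(o, a, b):
    """2-D cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_convex_hull(points):
    """Keep only the (complexity, distortion) points on the lower convex
    hull, i.e., the best achievable distortion at each complexity."""
    hull = []
    for p in sorted(points):
        # drop the last hull point while it lies above the chord to p
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

# Typical use: measure (complexity, distortion) for every mode group with
# the encoder, then build the convex-hull curve from those 4096 points:
# points = [measure(group) for group in enumerate_mode_groups()]
# curve  = lower_convex_hull(points)
```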

Conclusions

In this work, we have introduced a novel approach for optimizing H.264/AVC video compression by dynamically allocating computational complexity and bits for each basic unit within a video sequence, according to its predicted MAD. We have based the proposed approach on the C–R–D analysis. Both theoretically and experimentally, we have shown that the proposed approach achieves better results. We have also shown that the optimal computational complexity allocation along with optimal bit allocation is better than

Future research

In future research we plan to improve our dynamic allocation by:

  • 1. Developing an algorithm to take the place of the statistical approach presented in this paper.

  • 2. Considering encoder and decoder buffers and end-to-end delay, which were not taken into account in this work.

  • 3. Developing a computational complexity and rate control algorithm based on human visual perception, where each frame can be divided into a number of regions. Each region can be perceived differently by the human visual system.

Acknowledgment

We are grateful to L. Izrin for help in developing the program code.


References (21)

  • T. Wiegand, Working draft number 2, Revision 2 (WD-2), in: Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, March...
  • T. Wiegand, G. Sullivan, Final draft ITU-T recommendation and final draft international standard of joint video...
  • ITU Telecom Standardization Sector of ITU, H.263, Video coding for low bit rate communications, ITU-T Recommendation...
  • T. Sikora, The MPEG-4 video standard verification model, IEEE Trans. Circuits Syst. Video Technol. (1997)
  • VCODEX: H.264 tutorial white papers. [Online]. Available from:...
  • U. Reimers, Digital video broadcasting, IEEE Commun. Mag. (1998)
  • A. Ortega et al., Optimal trellis-based buffered compression and fast approximations, IEEE Trans. Image Processing (1994)
  • J. Choi et al., A stable feedback control of the buffer state using the controlled Lagrange multiplier method, IEEE Trans. Image Processing (1994)
  • T. Chiang et al., A new rate control scheme using quadratic rate distortion model, IEEE Trans. Circuits Syst. Video Technol. (1997)
  • T. Wiegand et al., Rate-constrained coder control and comparison of video coding standards, IEEE Trans. Circuits Syst. Video Technol. (2003)


Evgeny Kaminsky received the B.Sc. and M.Sc. degrees in Electrical and Computer Engineering from Ben-Gurion University in 1995 and 1999, respectively. He worked as a VLSI Engineer with VisionTech Ltd. (now Broadcom Israel) in the area of MPEG-2, and with Intel Israel in the area of Pentium-4 manufacturing testing. He is presently a Ph.D. student at the Department of Electrical and Computer Engineering at Ben-Gurion University. Mr. Kaminsky is interested in video information processing, algorithms for image and video compression, transmission of video over modern communications networks, and VLSI design.

Dan Grois was born in Kharkov, Ukraine, in 1976. He received the B.Sc. degree in Electrical and Computer Engineering from Ben-Gurion University (BGU), Beer-Sheva, Israel, in 2002, and the M.Sc. degree in Electro-Optics Engineering from BGU in 2006. He is currently pursuing the Ph.D. degree in Electro-Optics Engineering at BGU. Dan has wide work experience in the field of electronics. He worked as a quality assurance engineer at Eltek Ltd. (an Israeli printed circuit board manufacturer) between 1997 and 1999. During 2000, he worked at Motorola Inc. in the CAD (Computer-Aided Design) department, defining electronic components. Between 2001 and 2003 he worked at Israel Aircraft Industries Ltd. as a system engineer, performing various engineering tasks, including integrated circuit design and writing test programs for verification and validation. Since 2004, he has been employed at the Luzzatto & Luzzatto patent attorneys' office.

His research interests include image and video processing, imaging systems, image and video compression standards, buffer and bit rate control, and analog and digital design.

Ofer Hadar received the B.Sc., M.Sc. (cum laude), and Ph.D. degrees from the Ben-Gurion University of the Negev, Israel, in 1990, 1992 and 1997, respectively, all in Electrical and Computer Engineering. From August 1996 to February 1997, he was with CREOL at the University of Central Florida, Orlando, FL, as a Visiting Research Scientist, working on the angular dependence of sampling MTF and over-sampling MTF. From October 1997 to March 1999, he was a Post-Doctoral Fellow in the Department of Computer Science at the Technion-Israel Institute of Technology, Haifa. He is currently a faculty member of the Communication Systems Engineering Department at Ben-Gurion University of the Negev.

His research interests include image compression, video compression, routing and flow control in ATM networks, packet video, transmission of video over IP networks, and video rate smoothing and multiplexing. Hadar also works as a consultant for several hi-tech companies, such as EnQuad Technologies Ltd. in the area of MPEG-4 and Scopus in the area of video compression and transmission over satellite networks. Hadar is a member of the IEEE and SPIE.
