Journal of Visual Communication and Image Representation
Dynamic computational complexity and bit allocation for optimizing H.264/AVC video compression
Introduction
The ITU-T H.264/AVC (ISO/IEC MPEG-4 Part 10) video coding standard [1], [2] has been widely adopted for real-time and off-line video applications. Compared to other standards, it saves about 50% in bit rate while providing the same visual quality. In addition to having all the advantages of MPEG-2 [2], H.263 [3] and MPEG-4 [4], the H.264 video coding standard introduces a number of improvements, such as context-adaptive binary arithmetic coding (CABAC) [5], enhanced transform and quantization, prediction of “Intra” macroblocks (spatial prediction), and others. H.264 is designed for both constant bit rate (CBR) and variable bit rate (VBR) video coding, and is useful for transmitting video sequences over statistically multiplexed networks (e.g., asynchronous transfer mode (ATM), Ethernet, or other Internet networks). The standard can also be used over a wide range of bit rates for various applications, from wireless video phones to HDTV and digital video broadcasting (DVB) [6]. In addition, H.264 provides significantly improved coding efficiency and greater functionality, such as rate scalability, “Intra” prediction and error resilience, in comparison with its predecessors, MPEG-2 and H.263. However, H.264 is much more complex than earlier coding standards, and achieving maximum-quality encoding requires substantial computational resources.
In the last decade, several rate control and bit allocation methods have been proposed for minimizing distortion in the video compression standards preceding H.264/AVC. Conventional optimal encoding methods [7], [8] decrease video sequence distortion only by optimizing the bit allocation. A theoretical study presented in [8] achieves optimal bit allocation and minimal distortion by considering the relationship between rate and distortion and finding an optimal set of quantizers for a given information source. Another solution, proposed in [7], achieves optimal bit allocation and minimal distortion by implementing the Viterbi algorithm. However, these solutions are overcomplicated, since they implement dynamic programming for updating quantizer settings. More recent papers, such as [9], propose a feedback rate control scheme for minimizing distortion in MPEG-2 and MPEG-4 by calculating the target bit rate for each frame based on a quadratic rate–distortion equation. However, [9] does not consider encoding mode selection and computational complexity for minimizing distortion and therefore does not provide an optimal solution. In addition, [10] proposes a method for determining a set of optimal coding modes for encoding each macroblock in the H.264/AVC standard. According to this method, rate–distortion optimization (RDO) is performed for each macroblock, selecting an optimal coding mode by minimizing a Lagrange cost function. However, the coding mode selection of [10] is also overcomplicated, since all coding modes are evaluated in order to select the optimal one. Further, [11] implements the quadratic rate control scheme for the H.264/AVC standard. However, [11] does not consider computational complexity; it considers only the quantization settings and optimal mode selection.
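The Lagrangian mode decision underlying [10] can be sketched as follows. The mode names and the (distortion, rate) figures below are illustrative assumptions for the sake of the example, not values taken from the standard or the cited papers.

```python
# Sketch of Lagrangian rate-distortion-optimized (RDO) mode selection,
# in the spirit of [10]: for each macroblock, choose the coding mode
# minimizing the Lagrange cost J = D + lambda * R.
# Mode names and (distortion, rate) figures are illustrative only.

def select_mode(candidate_modes, lagrange_multiplier):
    """candidate_modes: iterable of (name, distortion, rate) tuples."""
    best_mode, best_cost = None, float("inf")
    for name, distortion, rate in candidate_modes:
        cost = distortion + lagrange_multiplier * rate  # J = D + lambda * R
        if cost < best_cost:
            best_mode, best_cost = name, cost
    return best_mode, best_cost

modes = [
    ("SKIP",       90.0,  1),   # cheap in bits, high distortion
    ("Inter16x16", 40.0, 24),
    ("Inter8x8",   30.0, 58),
    ("Intra4x4",   28.0, 72),   # expensive in bits, low distortion
]
print(select_mode(modes, 0.5))   # low lambda favors low distortion
print(select_mode(modes, 10.0))  # high lambda favors low rate
```

The multiplier lambda couples the rate constraint to the distortion objective: the drawback noted above is that this minimization is run over every candidate mode for every macroblock.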
Thus, similarly to [10], the method of [11] provides an optimal solution only if all coding modes are evaluated. Further, [12] proposes a complexity control algorithm for the H.264 encoder. Computational savings are achieved by early prediction of skipped macroblocks prior to motion estimation, through estimating a Lagrangian rate–distortion–complexity cost function. A feedback control algorithm ensures that the encoder maintains a predefined target computational complexity. However, according to [12], each macroblock is either skipped (not coded) or coded by considering all coding modes, which leads to over-complexity because of the large number of possible coding modes and (especially at low bit rates) to significant fluctuations of distortion between skipped and coded macroblocks. In the most recent paper [13], a power–rate–distortion (P–R–D) analysis framework is proposed, extending the traditional R–D analysis by including power consumption as an additional dimension. However, in [13], power consumption is used to determine an overall constant complexity level according to the average P–R–D model. Thus, the solution of [13] is not optimal, since the activity of each coding element is not considered.
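For concreteness, the quadratic rate model used by the schemes of [9], [11] relates a frame's target bits R to its MAD and quantization step Q as R = a·MAD/Q + b·MAD/Q². The sketch below solves this for Q; the coefficients a and b are illustrative assumptions (in practice a rate controller re-estimates them by regression over previously encoded frames).

```python
import math

# Sketch of the quadratic rate model used in [9], [11]:
#   R = a * MAD / Q + b * MAD / Q**2
# solved for the quantization step Q that meets a target bit budget.
# Coefficients a, b are illustrative; rate controllers typically
# re-estimate them from previously encoded frames.

def quant_step_for_budget(target_bits, mad, a=1.0, b=0.1):
    # target_bits * Q**2 - a*MAD*Q - b*MAD = 0; take the positive root.
    disc = (a * mad) ** 2 + 4.0 * target_bits * b * mad
    return (a * mad + math.sqrt(disc)) / (2.0 * target_bits)

q = quant_step_for_budget(target_bits=2000.0, mad=8.0)
# plugging q back into the model reproduces the 2000-bit budget
```

The smaller the bit budget, the larger the returned Q, i.e. the coarser the quantization; this is the feedback knob the cited schemes turn, without touching mode selection or complexity.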
In this paper, we overcome all the drawbacks mentioned above by developing novel techniques for providing real-time and off-line high-quality video coding for H.264/AVC applications. We suggest a novel approach for optimizing H.264/AVC video compression by dynamically allocating computational complexity (such as a number of CPU clocks) and bits to each basic unit within a video sequence, according to its predicted MAD (mean absolute difference). We define a basic unit as a group of adjacent MBs; a basic unit can be a MB, slice, field, or frame [14]. Our approach is based on a computational complexity–rate–distortion (C–R–D) analysis, which adds a complexity dimension to the conventional rate–distortion (R–D) analysis. Both theoretically and experimentally, we prove that the proposed approach achieves better results. We also prove that optimal computational complexity allocation combined with optimal bit allocation outperforms constant computational complexity allocation combined with optimal bit allocation. The computational complexity issue is critical for present and future real-time video applications implemented with the H.264/AVC standard, which has a large number of coding modes. In conventional advanced video coding applications, these coding modes are not all evaluated when encoding a video sequence, since evaluating all possible coding modes leads to a significant increase in the overall computational complexity. The greater the computational complexity, the larger the processing (power) resources required; for mobile/wireless devices, the power issue becomes critical. On the other hand, not evaluating all possible coding modes increases the distortion of the encoded video sequence and, in turn, decreases the overall video quality.
Therefore, by dynamically allocating the computational complexity and bits for encoding each basic unit within a video sequence, we minimize the video sequence distortion and achieve better video quality. In addition, we present a method and system for implementing this dynamic allocation approach. According to this method, the overall encoding process can be performed at different levels of video quality; higher quality levels require more computational complexity (in terms of CPU clocks). When setting the quality levels, we take into account the computational constraints of our system, the characteristics of the input, such as video sequence statistics, and the characteristics of the output, such as the distortion and the number of CPU clocks required for encoding each basic unit.
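As a toy illustration of the allocation idea, the sketch below distributes fixed bit and CPU-clock budgets across basic units in proportion to their predicted MAD. Proportional weighting is our simplifying assumption for illustration only; the paper's actual control scheme is developed in Sections 2 and 3.

```python
# Sketch: distribute total bit and CPU-clock budgets over the basic
# units of a frame in proportion to each unit's predicted MAD.
# Proportional weighting is a simplifying assumption for illustration,
# not the control law derived in the paper.

def allocate_budgets(predicted_mads, total_bits, total_clocks):
    total_mad = sum(predicted_mads)
    return [
        (total_bits * mad / total_mad, total_clocks * mad / total_mad)
        for mad in predicted_mads
    ]

# Three basic units; the most active one (highest predicted MAD)
# receives the largest share of both bits and CPU clocks.
budgets = allocate_budgets([2.0, 6.0, 4.0],
                           total_bits=12000, total_clocks=3_000_000)
```

Note that the two budgets are allocated jointly per basic unit, which is the essence of adding the complexity dimension to classical bit allocation.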
The proposed method must be robust to changes in the overall computational constraints. The difference in overall video sequence quality between optimal and constant computational complexity and bit allocations defines the level of robustness. Our method relies on prior video quality estimates for a given implementation. Furthermore, to trade off computational complexity and bit allocation against video quality, while appropriately allocating groups of coding modes to different basic units within a given video sequence, the method relies on past observations of the encoding process inputs and outputs. Moreover, our method and system can be used in both real-time and off-line implementations to maximize the overall processing performance. In real time, we achieve maximal usage of processing resources, such as CPU usage over a predetermined period of time (see Section 2). As a result, the method is computationally efficient.
The structure of this paper is as follows. At the beginning of Section 2, we review two main problems related to the computational complexity and bit allocation of conventional off-line and real-time video encoding and decoding systems. Then, in Section 2.1, we describe optimal coding mode selection, and in Section 2.2 we present a complete theoretical study of computational complexity and bit allocation problems based on the C–R–D analysis. In Section 3, we propose a method and system for implementing the dynamic allocation approach by providing frame-level and basic-unit-level computational complexity and bit rate control. Experimental results and conclusions are given in Sections 4 and 5, respectively. In Section 6, future research work is presented.
Computational complexity and bit allocation problems
There are two main problems related to the computational complexity and bit allocation of conventional off-line and real-time video encoding and decoding systems (see Fig. 1). One problem is that, according to the conventional R–D analysis, a user or a system is unable to define (manually or automatically) the computational complexity (such as the number of CPU clocks, memory bandwidth usage, and power consumption) for the overall encoding process. This issue becomes critical for video applications
Method and system for dynamic allocation of encoding computational complexity and bits
As described above, while selecting the minimal encoding computational complexity, we cannot obtain an optimal set of coding modes for encoding each basic unit, and as a result we achieve a maximal distortion. In other words, the minimal encoding computational complexity relates to a single coding mode, and as a result, to a maximal distortion.
The H.264/AVC standard has a conventional method [10], [14] for determining an optimal coding mode for encoding each macroblock. According to [10] and
Experimental results
To simulate the proposed dynamic allocation approach, we statistically provide, for simplicity, a set of best groups of coding modes for each video sequence. Also for simplicity, we define a basic unit as a frame; each basic unit receives its constant set of best groups of coding modes. We have 12 variable coding modes, so for each tested video sequence we perform 2¹² = 4096 experiments, creating convex-hull graphs based on the obtained results. From each convex-hull graph we
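The enumeration behind these experiments can be sketched as follows. The per-mode complexity costs and distortion gains below are synthetic stand-ins for measured values, and for brevity the sketch keeps the Pareto-optimal staircase (a superset of the convex hull) rather than extracting the exact lower hull.

```python
# Sketch of the experiment space: 12 on/off coding modes give
# 2**12 = 4096 mode groups.  For each group we would measure a
# (complexity, distortion) point and keep only the best tradeoffs.
# Per-mode costs and gains here are synthetic, purely illustrative.

def pareto_front(points):
    """Keep (complexity, distortion) points not dominated by another."""
    front = []
    for c, d in sorted(points):
        if not front or d < front[-1][1]:
            front.append((c, d))
    return front

N_MODES = 12
# synthetic (complexity cost, distortion reduction) per coding mode
mode_costs = [(i + 1, 1.0 / (i + 1)) for i in range(N_MODES)]

points = []
for mask in range(2 ** N_MODES):  # all 4096 mode groups
    complexity = sum(c for i, (c, _) in enumerate(mode_costs) if mask >> i & 1)
    distortion = 100.0 - sum(g for i, (_, g) in enumerate(mode_costs) if mask >> i & 1)
    points.append((complexity, distortion))

front = pareto_front(points)  # monotone: more complexity, less distortion
```

The resulting front is the monotone complexity-distortion tradeoff curve from which an operating point can be picked for any complexity budget.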
Conclusions
In this work, we have introduced a novel approach for optimizing H.264/AVC video compression by dynamically allocating computational complexity and bits to each basic unit within a video sequence, according to its predicted MAD. The proposed approach is based on the C–R–D analysis. Both theoretically and experimentally, we have proved that it achieves better results. We have also proved that the optimal computational complexity allocation along with optimal bit allocation is better than
Future research
In future research we plan to improve our dynamic allocation by:
- 1.
Developing an algorithm to take the place of the statistical approach presented in this paper.
- 2.
Considering encoder and decoder buffers and end-to-end delay, which were not taken into account in this work.
- 3.
Developing a computational complexity and rate control algorithm based on human visual perception, where each frame can be divided into a number of regions. Each region can be perceived differently by the human visual
Acknowledgment
We are grateful to L. Izrin for help in developing the program code.
References (21)
- T. Wiegand, Working draft number 2, Revision 2 (WD-2), in: Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, March...
- T. Wiegand, G. Sullivan, Final draft ITU-T recommendation and final draft international standard of joint video...
- ITU Telecom Standardization Sector of ITU, H.263, Video coding for low bit rate communications, ITU-T Recommendation...
- The MPEG-4 video standard verification model, IEEE Trans. Circuits Syst. Video Technol. (1997)
- VCODEX: H.264 tutorial white papers. [Online]. Available from:...
- Digital video broadcasting, IEEE Commun. Mag. (1998)
- et al., Optimal trellis-based buffered compression and fast approximations, IEEE Trans. Image Processing (1994)
- et al., A stable feedback control of the buffer state using the controlled Lagrange multiplier method, IEEE Trans. Image Processing (1994)
- et al., A new rate control scheme using quadratic rate distortion model, IEEE Trans. Circuits Syst. Video Technol. (1997)
- et al., Rate-constrained coder control and comparison of video coding standards, IEEE Trans. Circuits Syst. Video Technol. (2003)
Evgeny Kaminsky received the B.Sc. and M.Sc. degrees in Electrical and Computer Engineering from Ben-Gurion University in 1995 and 1999, respectively. He worked as a VLSI Engineer with VisionTech Ltd. (now BroadCom Israel) in the area of MPEG-2, and with Intel Israel in the area of Pentium-4 manufacturing testing. He is presently a Ph.D. student at the Department of Electrical and Computer Engineering at Ben-Gurion University. Mr. Kaminsky is interested in video information processing, algorithms for image and video compression, transmission of video over modern communications networks, and VLSI design.
Dan Grois was born in Kharkov, Ukraine, in 1976. He received the B.Sc. degree in Electrical and Computer Engineering from the Ben-Gurion University (BGU), Beer-Sheva, Israel, in 2002, and the M.Sc. degree in Electro-Optics Engineering from BGU in 2006. He is currently pursuing the Ph.D. degree in Electro-Optics Engineering at BGU. Dan has wide work experience in the field of electronics. He worked as a quality assurance engineer at Eltek Ltd. (an Israeli printed circuit board manufacturer) between 1997 and 1999. During 2000, he worked at Motorola Inc. in the CAD (Computer Aided Design) department, defining electronic components. Between 2001 and 2003 he worked at Israel Aircraft Industries Ltd. as a system engineer, performing various engineering tasks, including integrated circuit design and writing test programs for verification and validation. Since 2004, he has been employed at the Luzzatto & Luzzatto patent attorneys' office.
His research interests include image and video processing, imaging systems, image and video compression standards, buffer and bit rate control, and analog and digital design.
Ofer Hadar received the B.Sc., M.Sc. (cum laude), and Ph.D. degrees from the Ben-Gurion University of the Negev, Israel, in 1990, 1992, and 1997, respectively, all in Electrical and Computer Engineering. From August 1996 to February 1997, he was with CREOL at the University of Central Florida, Orlando, FL, as a Research Visiting Scientist, working on the angular dependence of sampling MTF and over-sampling MTF. From October 1997 to March 1999, he was a Post-Doctoral Fellow in the Department of Computer Science at the Technion-Israel Institute of Technology, Haifa. Currently he is a faculty member of the Communication Systems Engineering Department at Ben-Gurion University of the Negev.
His research interests include image compression, video compression, routing in ATM networks, flow control in ATM networks, packet video, transmission of video over IP networks, and video rate smoothing and multiplexing. Hadar also works as a consultant for several high-tech companies, such as EnQuad Technologies Ltd. in the area of MPEG-4, and Scopus in the area of video compression and transmission over satellite networks. Hadar is a member of the IEEE and SPIE.