Error resilient GOP structures on video streaming

https://doi.org/10.1016/j.jvcir.2006.12.001

Abstract

In this paper, we describe novel coding dependencies among video frames that support efficient error resilience in a video streaming system. Our approach is based on reorganizing the regular linear GOP encoding structure to prevent error propagation and to recover lost frames by interpolation. For errors confined to a single frame, we guarantee that they do not propagate to the neighboring frames, so both past and future temporal information can be used to reconstruct the damaged frame. Furthermore, the proposed coding structures can also recover successive lost frames. In low-power and low-bandwidth circumstances, such as on mobile devices, the proposed structures can also dynamically adjust the video quality level to adapt to a fluctuating network environment.

Introduction

Due to the popularity of streaming video applications such as video-on-demand (VOD), video conferencing and 3G phones [1], error resilience of compressed video streams has become an important concern. Recently, the growing number of mobile users has spurred research on video transmission over wireless networks. Wireless streaming environments are particularly challenging for multimedia delivery due to their error-prone, bandwidth-limited and time-varying characteristics.

A video streaming system typically includes five stages, as shown in Fig. 1 [2]. First, a video is compressed by a video encoder. Next, the encoded bit-stream is segmented into packets. To make the bit-stream resilient to transmission errors, some redundancy, such as forward error correction (FEC), is then added, and the resulting packets are transmitted over the network.
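As a concrete illustration of the redundancy stage, the sketch below applies a simple XOR parity code across a group of packets, so that any single lost packet in the group can be rebuilt at the receiver. This is only a minimal example of FEC in general, not the scheme used by any particular streaming system; the function names and the equal-length-packet assumption are ours.

    # Minimal XOR-parity FEC sketch: one parity packet protects a group of
    # equally sized packets against the loss of any single packet.
    def make_parity(packets):
        parity = bytearray(len(packets[0]))
        for pkt in packets:
            for i, b in enumerate(pkt):
                parity[i] ^= b
        return bytes(parity)

    def recover(received, parity):
        # 'received' maps packet index -> payload; exactly one packet of the
        # group is missing, and XOR-ing everything else with the parity
        # packet reproduces it.
        missing = bytearray(parity)
        for pkt in received.values():
            for i, b in enumerate(pkt):
                missing[i] ^= b
        return bytes(missing)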

Transmission of streaming videos is very sensitive to delay and loss of information. In contrast with data communications, which can usually tolerate delay and rely on retransmission to ensure error-free delivery, real-time video is delay sensitive and cannot easily make use of retransmission.

Due to the use of temporal predictive coding, encoded video streams are extremely vulnerable to transmission errors. A transmission error in a macro-block may lead to error propagation in the current and successive frames, as illustrated in Fig. 2. Furthermore, in low-bit-rate circumstances a frame may be packetized into only a few packets [3], so a single lost packet is likely to damage a large portion of a frame, or even an entire frame, and the error then propagates to the following frames.
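This propagation effect can be reproduced with a toy simulation. In the sketch below (illustrative only, with made-up residual data), each frame is reconstructed as its reference plus a residual, so corrupting one frame perturbs every later frame in the prediction chain:

    import numpy as np

    np.random.seed(0)
    residuals = [np.random.randn(8, 8) for _ in range(10)]   # toy prediction residuals

    def decode(residuals, corrupt_at=None):
        frames, ref = [], np.zeros((8, 8))
        for k, res in enumerate(residuals):
            frame = ref + res            # predictive decoding: reference + residual
            if k == corrupt_at:
                frame = frame + 5.0      # simulate a transmission error in frame k
            frames.append(frame)
            ref = frame                  # each decoded frame becomes the next reference
        return frames

    clean = decode(residuals)
    hit = decode(residuals, corrupt_at=3)
    # The error injected into frame 3 remains visible in every later frame.
    print([round(float(np.abs(c - h).mean()), 2) for c, h in zip(clean, hit)])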

It is possible to retransmit lost information without stalling the playback. When a frame is damaged, instead of waiting for the retransmitted data to arrive, the decoder can keep decoding with the help of an error concealment scheme, and the damaged part is corrected once the retransmitted data arrive. However, packet loss usually results from insufficient network bandwidth, so retransmitting packets makes the network even more congested.

Several related error resilient approaches [4], [5], [6], [7], [8], [9], [10] have been proposed to minimize transmission loss at the cost of a certain degree of redundancy. These methods can be categorized according to whether a feedback channel is used. The H.263 standard [11] provides a Reference Picture Selection (RPS) mode, which allows the encoder to select one of several previously decoded frames as the reference picture for prediction and relies on a feedback channel to stop error propagation efficiently. When no feedback information is available, H.263 suggests a prefixed interleaved RPS scheme with intra-frame insertion and independent segment prediction. However, this technique is much less efficient than the feedback-based mechanism [12].
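The feedback-based behaviour can be sketched roughly as follows (hypothetical encoder-side logic, not taken from the standard text): the encoder records which frames the decoder has acknowledged and, once a loss is reported, predicts only from the newest frame known to be intact, which breaks the error chain at the decoder.

    class RPSEncoder:
        # Toy reference picture selection driven by decoder feedback (ACK/NACK).
        def __init__(self):
            self.acked = {0}           # frame 0 (the I-frame) is assumed received
            self.loss_reported = False

        def on_feedback(self, frame_id, ok):
            if ok:
                self.acked.add(frame_id)
            else:
                self.loss_reported = True

        def pick_reference(self, current_id):
            if self.loss_reported:
                # Predict from the newest frame known to be intact, so errors
                # in unacknowledged frames cannot propagate any further.
                return max(i for i in self.acked if i < current_id)
            return current_id - 1      # normal case: the previous frame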

The latest video coding standard, H.264/AVC, also provides several error resilience schemes. For example, Flexible Macroblock Ordering (FMO) partitions a frame into different slice groups to prevent error propagation; the slice group pattern can be user-defined, e.g., checkerboard or interleaved. In addition, H.264/AVC can allocate the data of a slice into three partitions, A, B, and C, according to their importance. A type A partition can be used to recover the type B and type C partitions by error concealment schemes, but if a type A partition is lost, it cannot be recovered from the type B or type C partitions. All error resilience schemes supported in H.264/AVC protect data from damage and provide more information for the error concealment stage; however, the redundant bits cause a 1–2 dB quality degradation compared with videos coded without any error resilience tools. Further surveys of the error resilience schemes in H.264/AVC can be found in [13] and [14].
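As an example of how a slice group pattern can isolate losses, the sketch below builds a checkerboard slice group map as a simple function of macroblock coordinates, so every macroblock of one group is surrounded by macroblocks of the other group and can be concealed from its spatial neighbours. This is an illustration of the idea only; in H.264/AVC such maps are signalled through the picture parameter set rather than computed ad hoc at the decoder.

    def checkerboard_slice_groups(mb_width, mb_height):
        # Assign each macroblock to slice group 0 or 1 in a checkerboard pattern.
        return [[(x + y) % 2 for x in range(mb_width)]
                for y in range(mb_height)]

    # A QCIF frame is 11 x 9 macroblocks; if the slice carrying group 1 is lost,
    # all four spatial neighbours of every lost macroblock are still available.
    group_map = checkerboard_slice_groups(11, 9)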

In addition to the error resilient coding schemes in H.263 and H.264/AVC, Multiple-Description Coding (MDC) [15] generates several bit-streams of the same source signal and transmits them over separate channels. The signal reconstructed from any single channel is acceptable, and incremental improvement is achieved as more channels are received. Differing from the above schemes, the method in [16] assumes that most motion is smooth and recovers damaged blocks by interpolating combinations of motion vectors, such as the inverse or double of a motion vector, the addition or subtraction of two motion vectors, and so on.
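The candidate motion vectors mentioned above can be enumerated in a few lines; the sketch below is a simplified illustration (the actual candidate set and selection criterion of [16] may differ), and the best candidate would subsequently be chosen by, e.g., a boundary matching measure.

    def candidate_mvs(mv_a, mv_b):
        # Build candidate motion vectors for a damaged block from two
        # neighbouring blocks' motion vectors mv_a and mv_b.
        ax, ay = mv_a
        bx, by = mv_b
        return [
            mv_a, mv_b,               # copy a neighbouring motion vector
            (-ax, -ay),               # inverse of a motion vector
            (2 * ax, 2 * ay),         # double of a motion vector
            (ax + bx, ay + by),       # addition of two motion vectors
            (ax - bx, ay - by),       # subtraction of two motion vectors
        ]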

Most of the above error resilience schemes rely on some kind of temporal or spatial interpolation. However, once all packets of a frame are lost during transmission, the only temporal information available to recover the damaged frame is its previous frame, because all frames after the damaged frame will also be corrupted. In this paper, our objective is stated as follows: once an inter-frame is damaged, both its previous and next neighboring frames can still be predicted (compensated) from other frames using the pre-stored prediction information. By achieving this goal, damage to an inter-frame does not propagate to the neighboring frames, and the visual quality of the played clip remains smoother. Moreover, the damaged frame can be reconstructed from both its previous and next neighboring frames by any existing interpolation method, which most current error concealment mechanisms can support with only minor changes.
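Once both temporal neighbours of a damaged frame are intact, even the simplest form of such interpolation, averaging the co-located pixels of the previous and next frames, becomes applicable. The sketch below assumes frames are 8-bit NumPy arrays and deliberately ignores motion; a motion-compensated interpolation would first estimate motion between the two neighbours and warp halfway.

    import numpy as np

    def conceal_bidirectional(prev_frame, next_frame):
        # Recover a lost frame by averaging its intact temporal neighbours.
        # (No motion compensation; purely a pixel-wise temporal interpolation.)
        s = prev_frame.astype(np.uint16) + next_frame.astype(np.uint16)
        return (s // 2).astype(np.uint8)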

The remainder of this paper is organized as follows. In Section 2, the defect of the conventional temporal dependencies among frames for error concealment is addressed. A concept for reorganizing temporal dependencies is then introduced in Section 3. Next, we illustrate the proposed temporal dependencies in Sections 4 and 5, which present the double-binary tree structure and the extended GOP, respectively. In Section 6, the simulation results are discussed. Finally, several conclusions are drawn in Section 7.

Section snippets

Temporal dependencies among frames

In conventional coding systems, a video clip consists of many groups of pictures (GOPs), and three types of frames are defined in each GOP: I-, P-, and B-frames. I-frames are encoded and decoded independently. P-frames are encoded and decoded by referring to the previous I- or P-frame. B-frames are similar to P-frames, but their motion compensation refers to the previous I- or P-frame, the next I- or P-frame, or an interpolation between the two.
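These conventional dependencies can be written down as a reference table. The toy sketch below (display order, with a GOP pattern such as IBBPBBP assumed only for illustration) maps each frame to the frames it is predicted from:

    def conventional_references(gop):
        # Map each frame of a GOP string in display order (e.g. 'IBBPBBP')
        # to the indices of its reference frames.
        refs = {}
        anchors = [i for i, t in enumerate(gop) if t in 'IP']    # I- and P-frames
        for i, t in enumerate(gop):
            if t == 'I':
                refs[i] = []                                     # intra-coded, no reference
            elif t == 'P':
                refs[i] = [max(a for a in anchors if a < i)]     # previous I/P-frame
            else:  # 'B'
                prev_anchor = max(a for a in anchors if a < i)
                next_anchors = [a for a in anchors if a > i]
                refs[i] = [prev_anchor] + next_anchors[:1]       # previous and next I/P-frame
        return refs

    print(conventional_references('IBBPBBP'))
    # {0: [], 1: [0, 3], 2: [0, 3], 3: [0], 4: [3, 6], 5: [3, 6], 6: [3]}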

An example of

A simple example: binary tree structure

The structure of a GOP can be viewed as a tree with the I-frame as the root. The conventional GOP structure is thus a completely skewed tree, in which each P-frame is the child of its previous frame. In this section, a simple structure, the binary tree, is used to illustrate the concept of our mechanism for restructuring the GOP. The objective is to prevent errors from propagating to other frames and to make errors easier to recover. To achieve this goal, we replace the conventional GOP
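Conceptually, organizing the GOP as a binary tree amounts to re-assigning each frame's reference from its immediate predecessor to its parent node in the tree. The sketch below shows one possible index mapping (frame k within the GOP referencing frame (k - 1) // 2); this mapping is assumed purely for illustration and is not necessarily the exact assignment used in the paper.

    def binary_tree_references(gop_size):
        # Reference of each frame when the GOP is organised as a binary tree.
        # Frame 0 is the I-frame (root); frame k is predicted from frame (k - 1) // 2,
        # so an error in frame k can only propagate to the subtree rooted at k.
        return {k: (None if k == 0 else (k - 1) // 2) for k in range(gop_size)}

    print(binary_tree_references(7))
    # {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}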

Effective temporal dependencies: double-binary tree structure

Though the binary tree structure provides a basic solution for error resilience, it still has some problems. For example, errors occurring in a frame near the root of the tree are unlikely to be recovered and may propagate to many frames. In this section, we present another GOP structure with effective coding dependencies among frames, called the double-binary tree structure, which guarantees that errors in one inter-frame do not propagate to its previous and future neighbors, that is,

Effective temporal dependencies: extended GOP

The frame dependency relation proposed in the previous section concentrates on transmitting the minimum dependency information needed to recover damaged frames. The double-binary tree GOP breaks the regular coding dependencies to construct a new dependency relation, but it is not compatible with the standard GOP structure. Therefore, in this section we extend the conventional GOP structure to another GOP structure, named the extended GOP structure, which retains the conventional coding dependencies and

Experimental results

In this section, we conduct several simulations to demonstrate the performance of the proposed coding dependencies among frames. We modified the GOP structure in "XviD" [18], a popular open-source MPEG-4 video codec, to measure the efficiency of our structure. XviD is widely supported by video transcoding systems, such as TMPGEnc, for transmission of videos. Due to the popularity and openness of XviD, it is adopted in our simulations, even though the bit-rate of compressed

Conclusion

In this paper, two novel GOP structures are proposed for error resilience in streaming videos. The first is the double-binary tree GOP, which reorganizes the conventional coding dependencies. The second is the extended GOP, which builds on the existing conventional coding dependencies by inserting redundant dependencies. To verify the strength of the proposed GOPs, we also applied a simple error concealment mechanism that takes advantage of temporal information from both directions for recovering

References (20)

  • R.T. Derryberry et al., Transmit Diversity in 3G CDMA Systems, IEEE Commun. Mag. (2002)
  • Y. Wang et al., Video Processing and Communications (2002)
  • C.P. Lim, E.A.W. Tan, M. Ghanbari, S. Ghanbari, Cell loss concealment and packetization in packet video, Int. J....
  • ITU-T, SG15/WP15/1, LBC-95-033, Telenor R&D, An Error Resilience Method Based on Back Channel Signaling And FEC, Jan....
  • ISO/IEC JTC1/SC29/WG11 MPEG96/M0768, Iterated Systems Inc., An Error Recovery Strategy for Videophone Applications,...
  • S. Fukunaga, T. Nakai, H. Inoue, Error-Resilient Video Coding by Dynamic Replacing of Reference Pictures, GLOBECOM’96,...
  • Y. Tomita, T. Kimura, T. Ichikawa, Error Resilient Modified Interframe Coding System for Limited Reference Picture...
  • R. Zhang et al., Video coding with optimal inter/intra mode switching for packet loss resilience, IEEE J. Select. Areas Commun. (2000)
  • J. Liang, R. Talluri, Tools for Robust Image And Video Coding in JPEG-2000 And MPEG-4 Standards, in: Proc. IS&T-SPIE...
  • R. Talluri, Error-resilient video coding in ISO MPEG-4 standard, IEEE Commun. Mag. (1998)
