Real-time and rate–distortion optimized video streaming with TCP

https://doi.org/10.1016/j.image.2007.01.001

Abstract

In this paper we explore the use of a new rate–distortion metric for optimizing real-time Internet video streaming with the transmission control protocol (TCP). We lay the groundwork by developing a simple model that characterizes the expected latency of packets sent with TCP-Reno. Subsequently, we develop an analytical model of the expected video distortion at the decoder with respect to the expected latency introduced by TCP, the packetization mechanism, and the error-concealment method used at the decoder. By characterizing the protocol/channel pair more accurately, we obtain a better estimate of the expected distortion and of the available channel rate. This knowledge is exploited in the design of a new algorithm for rate–distortion optimized encoding mode selection for video streaming with TCP. Experimental results for real-time video streaming show PSNR improvements in the range of 2 dB over metrics that do not consider the behavior of the transport protocol.

Introduction

Streaming pre-encoded or real-time video over the Internet is a popular application that continues to gain interest as new entertainment services are introduced. In today's Internet, the transmission control protocol (TCP), which transports the bulk of the existing traffic [8], is considered unsuitable for video streaming applications, while its counterpart UDP is usually the protocol of choice. The main reasons why TCP is deemed unsuitable for this class of applications are its rapid throughput fluctuations and its reliability mechanism, which introduces additional delays [31]. Therefore, it is generally believed that the transport protocol of choice for video streaming should be UDP, on top of which several application-specific mechanisms can be built (error control, rate control, etc.) [31]. However, the absence of congestion control in UDP can cause performance deterioration for TCP-based applications if wide-scale deployment takes place in the Internet [8], [14]. This is the motivation behind the IETF effort to define a new rate control protocol that exhibits the same long-term throughput behavior as TCP but with smoother throughput fluctuations [12]. This standardization effort is still in progress and remains an active research topic. Despite these efforts, however, the majority of commercial IP-based video streaming systems nevertheless employ TCP for transport of pre-encoded video content [25], [32], [24]. The widespread deployment of TCP and its well-understood behavior have outweighed the concerns regarding its deficiencies.

We will see that even with the reliable TCP, the problem is essentially one of error control. Handling errors is a critical task in a video communication system. For real-time video communications in particular, delay constraints are very strict, which makes retransmission of lost packets of limited use in a practical setup. To overcome this problem, methods like forward error correction (FEC) are usually employed [26], [9]. However, FEC methods are primarily effective when random bit errors corrupt the bitstream, whereas in the Internet buffer overflows produce packet erasures through packet drops. This is what happens when a video streaming system is built on top of TCP. Furthermore, even though TCP has an inherent retransmission mechanism for error control, when the application payload consists of video packets, error control takes on a new meaning: TCP may be able to recover packets through retransmissions, but they may be received too late for playback. While one could "hack" TCP so that it avoids unnecessary retransmissions, we believe that such a cross-layer option is rather invasive to the standardized TCP.

Therefore, blindly using TCP for video streaming will not yield the best possible performance. We believe that when TCP is used for video streaming, its behavior should be taken into consideration explicitly so that the delivered video quality can be maximized. The question that we answer in this paper is how to perform this task efficiently. TCP is a reliable protocol and retransmits packets assumed lost, according to its internal mechanisms: fast retransmission and retransmission after timer expiration. These mechanisms incur a specific latency for the retransmitted packets, which can be calculated precisely and captured in a simple model. In this way the encoder can take the additional delays (which translate into lost macroblocks) into account explicitly and in real time. The encoder can also use the closed-form TCP throughput formula for rate control [23].
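
For concreteness, the steady-state TCP throughput approximation commonly attributed to Padhye et al. can be evaluated as in the sketch below; whether this is exactly the formula used in [23], as well as the function name and parameter values shown, is an illustrative assumption rather than a detail of our system.

```python
import math

def tcp_throughput(mss, rtt, p, rto, b=2):
    """Steady-state TCP-Reno throughput approximation (bytes/s).

    mss : maximum segment size in bytes
    rtt : round-trip time in seconds
    p   : packet loss probability (0 < p <= 1)
    rto : retransmission timeout in seconds
    b   : packets acknowledged per ACK (typically 2 with delayed ACKs)
    """
    if p <= 0:
        return float("inf")  # no losses: the approximation does not apply
    denom = (rtt * math.sqrt(2.0 * b * p / 3.0)
             + rto * min(1.0, 3.0 * math.sqrt(3.0 * b * p / 8.0))
               * p * (1.0 + 32.0 * p * p))
    return mss / denom

# Example: 1460-byte segments, 80 ms RTT, 1% loss, 200 ms RTO
rate = tcp_throughput(mss=1460, rtt=0.080, p=0.01, rto=0.200)
print(f"available rate ~ {8 * rate / 1e3:.0f} kbit/s")
```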

Our intention with this paper is to demonstrate that real-time video encoding and TCP streaming can be combined more efficiently than in current practice, since some of the problems can be tackled with a new systematic optimization approach. What we propose is a new rate–distortion metric that takes into account the aforementioned TCP behavior (through the developed models), so that the expected distortion of the video signal at the decoder can be estimated. Subsequently, the real-time encoder uses this metric in order to optimize the encoding decisions for individual macroblocks. This is essentially an implicit cross-layer mechanism, since the application regulates its behavior according to the limitations of the underlying protocol but without requiring any modifications to the protocol stack. In summary, the two goals that we set out to achieve in this paper are:

  • Derive analytically the expected distortion for a video bitstream at the decoder when TCP-Reno is used for transport.

  • Select the optimal encoding mode for each individual macroblock at the encoder.

The rest of this paper is organized as follows. In Section 2 we present the related work in the area of video streaming with TCP. Section 3 presents an overview of the proposed system, covering in more detail important aspects such as the codec, the packetization method, the error-concealment method, and the network model. In Section 4 we analyze, based on this configuration, how specific encoded macroblocks of a video sequence are sent and lost with TCP. From this analysis we derive the macroblock loss probability as a function of TCP sender parameters. Subsequently, Section 5 analyzes the latency of these lost macroblocks, since they are retransmitted by the reliable TCP protocol. Knowing which macroblocks are lost, and whether they can meet their deadlines, we present in Section 6 an analytical model of the expected decoder distortion. In Section 7 we present an algorithm for the main goal of our analysis: selecting the optimal encoding mode based on the expected decoder distortion. Section 8 discusses implementation issues of the resulting streaming protocol. Experimental results are presented in Section 9, while Section 10 concludes the paper.

Section snippets

Related work

During the last few years, the research community has proposed a number of optimization mechanisms for video streaming with TCP. We review the most noteworthy of these mechanisms next. One of the first mechanisms is time-lined TCP [22], where TCP video streaming is realized by allowing the operating system to control the transmission of data that have strict deadlines. A similar approach to the previous one can be found in [17]. In that work, the main task of the TCP-RTM receiver is to deliver

System overview

Fig. 1 depicts a simplified block diagram of the end-to-end real-time video streaming system. Based on this system model we will derive the desired metric that captures the end-to-end distortion. Besides encoding the input sequence, the encoder performs rate control by polling the TCP protocol in order to obtain the instantaneous available channel rate. At the client, a small startup delay is used before the video playback starts, while the server continuously sends video packets which are
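
As an illustration of the role of the startup delay, the following sketch derives per-frame decoding deadlines from it; the constant frame rate and the function name are assumptions made only for this example and are not details of the proposed system.

```python
def frame_deadlines(t_start, startup_delay, num_frames, fps=30.0):
    """Decoding deadline t_d(n) for each frame n, assuming playback
    begins startup_delay seconds after streaming starts and frames
    are consumed at a constant rate of fps frames per second."""
    return [t_start + startup_delay + n / fps for n in range(num_frames)]

# Example: streaming starts at t = 0 s with a 1 s startup delay
deadlines = frame_deadlines(t_start=0.0, startup_delay=1.0, num_frames=5)
print(deadlines)  # [1.0, 1.033..., 1.066..., 1.1, 1.133...]
```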

Packet-level analysis of video streaming with TCP

Our primary goal is to understand, at the packet level, how TCP would send a stream of video packets that are packetized as groups of encoded macroblocks. Essentially, we want to be able to identify precisely which macroblocks are expected to be lost when the TCP packet that contains them is lost. Ultimately, this knowledge will be used for estimating the expected distortion at the decoder. An important point that must be clear before we start our analysis is that we are concerned only with the
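
The following toy sketch illustrates the kind of mapping we are after, assuming for simplicity that consecutive macroblocks are grouped into fixed-size packets; the actual packetization method is the one described in Section 3, and the function names here are illustrative.

```python
def packetize(num_macroblocks, mbs_per_packet):
    """Group consecutive macroblock indices into packets (lists of MB indices)."""
    return [list(range(i, min(i + mbs_per_packet, num_macroblocks)))
            for i in range(0, num_macroblocks, mbs_per_packet)]

def lost_macroblocks(packets, lost_packet_indices):
    """Macroblocks affected when the given TCP packets are lost."""
    lost = set()
    for k in lost_packet_indices:
        lost.update(packets[k])
    return sorted(lost)

# Example: 99 macroblocks of a QCIF frame, 11 MBs (one GOB row) per packet
packets = packetize(99, 11)
print(lost_macroblocks(packets, lost_packet_indices=[2]))  # MBs 22..32
```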

TCP packet latency

Up to this point we have identified the lost TCP packets and the corresponding macroblocks. However, our task is not complete, since we must also calculate the expected latency of the TCP packets, whether lost or not, because excessive latency may render video packets useless at the decoder. If t_s and t_d represent the sending time and the decoding deadline of M_i^n, respectively, then we denote the maximum allowed latency as Δt(n,i) = t_d(n,i) − t_s(n,i). Clearly, Δt might have a negative value
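
A hedged sketch of this deadline test is given below; charging one extra RTT to a packet recovered by fast retransmit and one RTO to a packet recovered after a timer expiration is a deliberate simplification for illustration, not the exact latency model developed in this section.

```python
def packet_latency(rtt, lost, recovered_by_fast_retransmit, rto):
    """Rough delivery latency of one TCP packet (seconds).

    A packet that is not lost arrives after roughly one one-way delay
    (approximated here as rtt / 2); a lost packet is charged one extra
    RTT if recovered by fast retransmit, or one RTO if recovered after
    a timer expiration.
    """
    latency = rtt / 2.0
    if lost:
        latency += rtt if recovered_by_fast_retransmit else rto
    return latency

def arrives_in_time(t_s, t_d, latency):
    """True if a packet sent at t_s with the given latency meets its deadline t_d."""
    delta_t = t_d - t_s          # maximum allowed latency
    return latency <= delta_t

# Example: 80 ms RTT, 200 ms RTO, packet lost once and recovered by fast retransmit
lat = packet_latency(rtt=0.080, lost=True, recovered_by_fast_retransmit=True, rto=0.200)
print(arrives_in_time(t_s=10.0, t_d=10.15, latency=lat))  # True: 0.12 s <= 0.15 s
```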

Analytical model of the expected decoder distortion

Based on the macroblock and packet loss probabilities that we calculated, our goal in this section is to derive analytically the expected distortion at the decoder as a function of the latency introduced by TCP, the packetization method, and the error-concealment method used at the decoder.

One of the assumptions that we make is that the first intraframe in the sequence is always received, even if a number of retransmissions must take place. We denote by M_i^n the coded macroblock at location i
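
The sketch below shows, in simplified form, how a per-macroblock expected distortion can be assembled from a lateness probability and a concealment distortion; it is not the exact model derived in this section, and all numeric values and names are illustrative.

```python
def expected_distortion(d_decoded, d_concealed, p_not_in_time):
    """Expected distortion of one macroblock at the decoder.

    d_decoded     : distortion if the macroblock is decoded on time
    d_concealed   : distortion if it is lost/late and error concealment is applied
    p_not_in_time : probability that the macroblock misses its deadline
    """
    return (1.0 - p_not_in_time) * d_decoded + p_not_in_time * d_concealed

def frame_expected_distortion(mb_terms):
    """Sum of the per-macroblock expected distortions of one frame."""
    return sum(expected_distortion(*mb) for mb in mb_terms)

# Example: two macroblocks, the second carried by a packet likely to arrive late
mbs = [(12.0, 95.0, 0.02), (15.0, 120.0, 0.30)]
print(frame_expected_distortion(mbs))  # 13.66 + 46.5 = 60.16
```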

Encoding mode selection with Lagrangian optimization

The question that arises now is how to utilize this analytical model of the expected decoder distortion. What we claim is that a more accurate distortion estimate at the encoder can lead to a better allocation of the available TCP rate through the selection of the encoding mode of each individual macroblock. Even though this principle has been demonstrated before [36], [33], in this paper we are the first to consider the effect of a transport protocol, and the specific loss pattern that
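
The following minimal sketch illustrates Lagrangian mode selection of the form J = E[D] + λR applied per macroblock; the candidate mode set and the cost values are purely illustrative assumptions, not the modes or costs of our encoder.

```python
def select_mode(candidates, lagrange_multiplier):
    """Pick the encoding mode minimizing J = E[D] + lambda * R.

    candidates : dict mapping mode name -> (expected_distortion, rate_in_bits)
    """
    return min(candidates,
               key=lambda m: candidates[m][0] + lagrange_multiplier * candidates[m][1])

# Example: intra is costly in rate but robust; inter is cheap but error-prone
candidates = {
    "intra": (14.0, 520),   # (expected distortion, bits)
    "inter": (22.0, 180),
    "skip":  (45.0, 1),
}
print(select_mode(candidates, lagrange_multiplier=0.05))  # "inter": J = 31.0
```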

Implementation issues

Based on the analysis we performed, our objective now is to define a practical streaming protocol. In Fig. 5 we present the final protocol in pseudo-code, based on the previous analysis and the derived analytical results. As can be understood, identifying the optimal encoding vector Θ* is a time-consuming task, since the MAD operation must be performed twice for each of the m macroblocks of a single frame. For QCIF sequences (176×144), we have m=99, and therefore 198 MAD
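
The complexity figure above follows from simple arithmetic, reproduced here for clarity (two MAD evaluations per macroblock, as stated above):

```python
# QCIF: 176 x 144 pixels, 16 x 16 macroblocks
mbs_per_frame = (176 // 16) * (144 // 16)   # 11 * 9 = 99
mad_ops_per_frame = 2 * mbs_per_frame       # two MAD evaluations per macroblock
print(mbs_per_frame, mad_ops_per_frame)     # 99 198
```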

Experiments

The network setup shown in Fig. 6 was used throughout our experiments in this paper. The scenario assumes a sender and a receiver that are Linux machines, while a FreeBSD machine is used for controlling the bottleneck link. The Dummynet software [7] was used in the middlebox in order to emulate various link configurations in terms of packet loss rate, bandwidth, and delay. The QCIF FOREMAN, CARPHONE, and MISS AMERICA sequences [29] were used for real-time encoding with the H.263++ encoder [11] at

Conclusions

In this paper we presented a mechanism for video streaming with the transmission control protocol (TCP) that uses a new RD metric (TCP-RDM) to characterize the expected video distortion at the decoder. We first developed this metric so that it considers the latency introduced by TCP, the packetization method, and the error-concealment method at the receiver. Based on this analytical distortion model, we proposed an algorithm for rate–distortion optimized mode selection (RDOMS-TCP). This algorithm

References (36)

  • E. Gurses et al., A simple and effective mechanism for stored video streaming with TCP transport and server-side adaptive frame discard, Comput. Networks (July 2005)
  • A.A. Abouzeid, S. Roy, M. Azizoglu, Stochastic modeling of TCP over lossy links, in: INFOCOM,...
  • N. Cardwell, et al., Modeling TCP latency, in: INFOCOM,...
  • J. Chakareski, B. Girod, Rate–distortion optimized packet scheduling and routing for media streaming with path...
  • P.A. Chou, Z. Miao, Rate–distortion optimized streaming of packetized media, Microsoft Research Technical Report...
  • P.A. Chou, A. Sehgal, Rate–distortion optimized receiver-driven streaming over best-effort networks, in: IEEE...
  • G. Cote et al., Optimal mode selection and synchronization for robust video communications over error prone networks, IEEE J. on Sel. Areas in Comm. (June 2000)
  • Dummynet....
  • S. Floyd et al., Promoting the use of end-to-end congestion control in the Internet, IEEE/ACM Trans. on Networking (August 1999)
  • P. Frossard et al., Joint source/FEC rate selection for quality optimal MPEG-2 video delivery, IEEE Trans. on Image Process. (December 2001)
  • H.263 codec....
  • M. Handley, S. Floyd, J. Pahdye, J. Widmer, TCP friendly rate control (TFRC): protocol specification, RFC 3448, January...
  • P.-H. Hsiao, H.T. Kung, K.-S. Tan, Video over TCP with receiver-based delay control, in: ACM NOSSDAV,...
  • V. Jacobson, Congestion avoidance and control, in: ACM SIGCOMM, August 1988, pp....
  • M. Kalman et al., Techniques for improved rate–distortion optimized video streaming, ST J. of Res. (November 2005)
  • C. Krasic, K. Li, J. Walpole, The case for streaming multimedia with TCP, in: Workshop on Interactive Distributed...
  • S. Liang, D. Cheriton, TCP-RTM: using TCP for real time applications, in: ICNP,...
  • Y.J. Liang, B. Girod, Prescient RD optimized packet dependency management for low-latency video streaming, in: IEEE...
A short version of this paper appeared in ICME 2006.