Real-time and rate–distortion optimized video streaming with TCP☆
Introduction
Streaming pre-encoded or real-time video over the Internet is a popular application that continues to gain interest as new entertainment services are introduced. The transmission control protocol (TCP), which transports the bulk of today's Internet traffic [8], is widely considered unsuitable for video streaming applications, and its counterpart UDP is usually the protocol of choice. The main reasons are TCP's rapid throughput fluctuations and its reliability mechanism, which introduces additional delays [31]. It is therefore generally believed that the transport protocol for video streaming should be UDP, on top of which several application-specific mechanisms (error control, rate control, etc.) can be built [31]. However, the absence of congestion control in UDP can degrade the performance of TCP-based applications if UDP streaming is deployed at a wide scale in the Internet [8], [14]. This motivated the IETF effort to define a new rate-control protocol that exhibits the same long-run behavior as TCP but with smoother throughput fluctuations [12]. This standardization effort is still in progress and remains an active research topic. Despite these concerns, however, the majority of commercial IP-based video streaming systems, perhaps unexpectedly, employ TCP for the transport of pre-encoded video content [25], [32], [24]. The widespread deployment of TCP and its well-understood behavior have outweighed the concerns about its deficiencies.
We will see that even with the reliable TCP, the problem is essentially one of error control. Handling errors is a critical task in any video communication system. For real-time video communications in particular, delay constraints are very strict, making retransmission of lost packets of little use in a practical setup. To overcome this problem, methods like forward error correction (FEC) are usually employed [26], [9]. However, FEC is primarily effective when random bit errors corrupt the bitstream, whereas in the Internet buffer overflows at routers cause packet drops, i.e., packet erasures; this is the loss pattern a TCP-based video streaming system must face. Furthermore, even though TCP has an inherent retransmission mechanism for error control, when the application payload consists of video packets, error control takes on a new meaning: TCP may be able to recover packets with retransmissions, but they may be received too late for playback. While one could "hack" TCP so that it avoids unnecessary retransmissions, we believe this cross-layer option is too invasive to the standardized TCP.
Therefore, blindly using TCP for video streaming will not yield the best possible performance. We believe that when TCP is used for video streaming, its behavior should be taken into account explicitly so that the delivered video quality is maximized. The question we answer in this paper is how to perform this task efficiently. TCP is a reliable protocol that retransmits packets assumed lost, according to two internal mechanisms: fast retransmission and retransmission after timer expiration. These mechanisms incur a specific latency for the retransmitted packets, which can be calculated precisely and captured in a simple model. In this way the encoder can account for the additional delays (which translate into lost macroblocks) explicitly and in real time. The encoder can also use the closed-form TCP throughput formula for rate control [23].
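As an illustration of the kind of closed-form rate estimate the encoder can poll, the following sketch implements a Padhye-style TCP-Reno throughput formula of the type used in [23]; the function name, parameter defaults, and units are our own assumptions, not part of the paper's implementation.

```python
import math

def tcp_throughput(mss, rtt, p, t0, b=2, w_max=65535):
    """Approximate steady-state TCP-Reno throughput in bytes/s
    (Padhye-style closed-form model; illustrative sketch).

    mss   -- maximum segment size in bytes
    rtt   -- round-trip time in seconds
    p     -- packet loss probability (0 <= p < 1)
    t0    -- retransmission timeout in seconds
    b     -- packets acknowledged per ACK (delayed ACKs -> 2)
    w_max -- receiver window limit in bytes
    """
    if p <= 0:
        return w_max / rtt  # loss-free: receiver-window limited
    # Latency contributed by fast-retransmit loss recovery
    fast_rtx = rtt * math.sqrt(2.0 * b * p / 3.0)
    # Latency contributed by retransmission timeouts
    timeout = t0 * min(1.0, 3.0 * math.sqrt(3.0 * b * p / 8.0)) \
              * p * (1.0 + 32.0 * p * p)
    return min(w_max / rtt, mss / (fast_rtx + timeout))
```

Polling such a formula with current RTT and loss estimates gives the encoder an instantaneous rate budget without modifying the TCP stack.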
Our intention in this paper is to demonstrate that real-time video encoding and TCP streaming can be done together more efficiently than current practice allows, since some of the problems can be tackled by a new systematic optimization approach. What we suggest is a new rate–distortion metric that takes the aforementioned TCP behavior into account (through the developed models), so that we can estimate the expected distortion of the video signal at the decoder. The real-time encoder then uses this metric to optimize the encoding decisions for individual macroblocks. This is essentially an implicit cross-layer mechanism, since the application regulates its behavior according to the limitations of the underlying protocol, but without requiring any modifications to the protocol stack. In summary, we set out to achieve two goals in this paper:
- Derive analytically the expected distortion of a video bitstream at the decoder when TCP-Reno is used for transport.
- Select the optimal encoding mode for each individual macroblock at the encoder.
The rest of this paper is organized as follows. In Section 2 we review related work in the area of video streaming with TCP. Section 3 presents an overview of the proposed system and covers its important aspects in more detail: the codec, the packetization method, the error-concealment method, and the network model. In Section 4 we analyze, based on this configuration, how specific encoded macroblocks of a video sequence are sent and lost with TCP, and derive the macroblock loss probability as a function of TCP sender parameters. Section 5 then analyzes the latency of these lost macroblocks as they are retransmitted by the reliable TCP protocol. Knowing which macroblocks are lost, and whether they can meet their deadlines, we present in Section 6 an analytical model of the expected decoder distortion. In Section 7 we present an algorithm for the main goal of our analysis: selecting the optimal encoding mode based on the expected decoder distortion. Section 8 discusses implementation issues, experimental results are presented in Section 9, and Section 10 concludes the paper.
Related work
During the last few years, the research community has proposed a number of optimization mechanisms for video streaming with TCP. We review the most noteworthy of these mechanisms next. One of the first mechanisms is time-lined TCP [22], where TCP video streaming is realized by allowing the operating system to control the transmission of data that have strict deadlines. A similar approach to the previous one can be found in [17]. In that work, the main task of the TCP-RTM receiver is to deliver
System overview
Fig. 1 depicts a simplified block diagram of the end-to-end real-time video streaming system. Based on this system model we will derive the desired metric that captures the end-to-end distortion. Besides encoding the input sequence, the encoder performs rate control by polling the TCP protocol in order to obtain the instantaneous available channel rate. At the client, a small startup delay is used before the video playback starts, while the server continuously sends video packets which are
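The role of the startup delay can be illustrated with a small sketch of the playout-deadline computation at the client; the function names, the frame rate, and the delay values are hypothetical, chosen only to show the timing relation.

```python
def playout_deadline(frame_idx, t_start, startup_delay, fps=30.0):
    """Wall-clock time by which frame `frame_idx` must be fully
    received to be decoded in time, given that playback begins
    `startup_delay` seconds after streaming starts at `t_start`."""
    return t_start + startup_delay + frame_idx / fps

def arrives_in_time(arrival_time, frame_idx, t_start, startup_delay,
                    fps=30.0):
    """True if a frame's data arrives before its playout deadline."""
    return arrival_time <= playout_deadline(frame_idx, t_start,
                                            startup_delay, fps)
```

A larger startup delay relaxes every deadline uniformly, at the cost of a longer wait before playback begins.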
Packet-level analysis of video streaming with TCP
Our primary goal is to understand, at the packet level, how TCP sends a stream of video packets that are packetized as groups of encoded macroblocks. Essentially, we want to identify precisely which macroblocks are expected to be lost when the TCP packet that contains them is lost. Ultimately, this knowledge will be used to estimate the expected distortion at the decoder. An important point that must be clear before we start our analysis is that we are concerned only with the
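A minimal sketch of this kind of packetization, mapping encoded macroblocks to TCP segments in encoding order and then back from lost segments to the affected macroblocks (the macroblock sizes and MSS value are illustrative, not taken from the paper):

```python
def packetize(mb_sizes, mss=1460):
    """Group a frame's encoded macroblocks into TCP segments of at
    most `mss` payload bytes, preserving encoding order. Returns a
    list of lists: packets[k] holds the indices of the macroblocks
    carried by segment k."""
    packets, current, used = [], [], 0
    for i, size in enumerate(mb_sizes):
        if current and used + size > mss:
            packets.append(current)     # segment full: start a new one
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        packets.append(current)
    return packets

def lost_macroblocks(packets, lost_segments):
    """Macroblock indices affected when the given segments are lost."""
    return sorted(i for k in lost_segments for i in packets[k])
```

With this mapping in hand, a segment-loss event translates directly into a set of missing macroblocks whose distortion impact can be evaluated.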
TCP packet latency
What we have done up to this point is identify the lost TCP packets and the corresponding macroblocks. However, our job is not done, since we must also calculate the expected latency of the TCP packets, whether lost or not, because excessive latency may render video packets useless at the decoder. If t_s and t_d represent the time sent and the deadline for a packet, respectively, then we denote the maximum allowed latency as Δ = t_d − t_s. Clearly, Δ might have a negative value
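As a sketch of how TCP's two recovery mechanisms translate into extra delivery latency that can be checked against a playback deadline, consider the following; the RFC 6298-style exponential backoff with a 64×RTO cap and the one-way propagation-delay parameter are our own modeling assumptions.

```python
def retransmission_latency(rto, rtt, n_timeouts=0):
    """Extra delay added to a lost segment before it is delivered
    (illustrative): roughly one RTT if recovered by triple-duplicate-ACK
    fast retransmit, otherwise the exponentially backed-off timeout
    sequence rto, 2*rto, ... (capped at 64*rto) followed by one RTT
    for the successful retransmission."""
    if n_timeouts == 0:
        return rtt  # fast retransmit case
    backoff = sum(min(rto * 2 ** k, 64 * rto) for k in range(n_timeouts))
    return backoff + rtt

def is_useful(send_time, deadline, loss_latency=0.0, one_way=0.05):
    """True if the (possibly retransmitted) segment can still meet
    its playback deadline; `one_way` is an assumed propagation delay."""
    return send_time + one_way + loss_latency <= deadline
```

The key observation is that a single timeout-based recovery can easily exceed a real-time deadline, while a fast retransmit often does not.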
Analytical model of the expected decoder distortion
Based on the macroblock and packet loss probabilities that we calculated, our goal in this section is to derive analytically the expected distortion at the decoder as a function of the latency introduced by TCP, the packetization method, and the error-concealment method used at the decoder.
One of the assumptions that we make is that the first intraframe in the sequence is always received, even if a number of retransmissions must take place. We have named the coded macroblock at location i
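A heavily simplified sketch of such an expected-distortion computation, assuming each macroblock is either decoded as encoded or concealed when it misses its deadline; the actual model in the paper also depends on the packetization and the specific error-concealment method, which this sketch abstracts away.

```python
def expected_distortion(d_enc, d_conc, p_late):
    """Expected decoder distortion for one frame (illustrative).

    d_enc  -- per-macroblock distortion when decoded as encoded
    d_conc -- per-macroblock distortion after error concealment
    p_late -- per-macroblock probability of missing the deadline
    All three are equal-length sequences over the frame's macroblocks.
    """
    return sum((1.0 - p) * de + p * dc
               for de, dc, p in zip(d_enc, d_conc, p_late))
```

The loss probabilities feeding `p_late` come from the packet-level analysis and the latency model of the preceding sections.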
Encoding mode selection with Lagrangian optimization
The question that arises now is how to utilize this analytical model of the expected decoder distortion. We claim that a more accurate distortion estimate at the encoder can lead to a better allocation of the available TCP rate through the selection of the encoding mode of each individual macroblock. Even though this principle has been demonstrated before [36], [33], in this paper we are the first to consider the effect of a transport protocol, and the specific loss pattern that
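The per-macroblock Lagrangian selection can be sketched as follows, with the cost J = D + λR evaluated for each candidate mode; the mode names and the numbers in the usage note are hypothetical.

```python
def select_mode(modes, lam):
    """Pick the encoding mode minimizing the Lagrangian cost
    J = D + lam * R, where D is the *expected decoder* distortion
    for that mode (from the analytical model) and R its rate in bits.
    `modes` maps mode name -> (expected_distortion, rate)."""
    return min(modes, key=lambda m: modes[m][0] + lam * modes[m][1])
```

For example, with `modes = {"INTRA": (10.0, 900), "INTER": (14.0, 300)}` and λ = 0.02, INTER wins (J = 20 vs. 28), while with λ = 0 the lower-distortion INTRA mode is chosen.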
Implementation issues
Based on the preceding analysis, our objective now is to define a practical streaming protocol. In Fig. 5, we present the final protocol in pseudo-code, based on the analysis and the derived analytical results. As can be understood, identifying the optimal encoding vector is a time-consuming task, since the MAD operation must be performed twice for each of the m macroblocks of a single frame. For the QCIF sequences, we have that m = 99, and therefore 198 MAD
Experiments
The network setup shown in Fig. 6 was used throughout our experiments in this paper. The scenario assumes a sender and a receiver that are Linux boxes, while a FreeBSD machine controls the bottleneck link. The Dummynet software [7] was used on this middlebox to emulate various link configurations in terms of packet loss rate, bandwidth, and delay. The QCIF FOREMAN, CARPHONE, and MISS AMERICA sequences [29] were used for real-time encoding with the encoder [11] at
Conclusions
In this paper we presented a mechanism for video streaming with the transmission control protocol (TCP) that uses a new RD metric (TCP-RDM) characterizing the expected video distortion at the decoder. We developed this metric so that it accounts for TCP-introduced latency, the packetization method, and the error-concealment method at the receiver. Based on this analytical distortion model, we proposed an algorithm for rate–distortion optimized mode selection (RDOMS-TCP). This algorithm
References
- et al., A simple and effective mechanism for stored video streaming with TCP transport and server-side adaptive frame discard, Comput. Networks (July 2005)
- A.A. Abouzeid, S. Roy, M. Azizoglu, Stochastic modeling of TCP over lossy links, in: INFOCOM, ...
- N. Cardwell, et al., Modeling TCP latency, in: INFOCOM, ...
- J. Chakareski, B. Girod, Rate–distortion optimized packet scheduling and routing for media streaming with path ...
- P.A. Chou, Z. Miao, Rate–distortion optimized streaming of packetized media, Microsoft Research Technical Report ...
- P.A. Chou, A. Sehgal, Rate–distortion optimized receiver-driven streaming over best-effort networks, in: IEEE ...
- et al., Optimal mode selection and synchronization for robust video communications over error prone networks, IEEE J. on Sel. Areas in Comm. (June 2000)
- Dummynet, ...
- et al., Promoting the use of end-to-end congestion control in the Internet, IEEE/ACM Trans. on Networking (August 1999)
- et al., Joint source/FEC rate selection for quality optimal MPEG-2 video delivery, IEEE Trans. on Image Process. (December 2001)
- Techniques for improved rate–distortion optimized video streaming, ST J. of Res.
☆ A short version of this paper appeared in ICME 2006.