Introduction

In recent years, many peer-to-peer live streaming systems have emerged as commercial services on the Internet, such as PPLive [1], QQLive [2], Zattoo [3], and Roxbeam [4], and the technology for applications like TV broadcasting has matured considerably. Nevertheless, almost all of these systems aim to provide non-interactive live broadcasting service without any delay guarantee, and most of them employ a data-driven/mesh-based P2P streaming protocol [5–7]. This type of protocol scales fairly well to very large audiences; some service providers [1] report supporting over one million concurrent viewers of a single event. In academia, many new ideas have also been proposed recently to improve the performance of large-scale non-interactive live P2P streaming [8–12].

Beyond non-interactive applications, there is an urgent demand for technology to support interactive live applications, such as online gaming, online auctions, celebrity interviews and talks, distance learning, product launches and training, and video sharing and commenting. In this type of application, a number of end users (viewers) watch the same video at the same time and can interact with the presenter/publisher at the source. Many of these applications are not strictly real-time but tolerate delays ranging from several seconds to more than 10 s, which makes it possible to use peer-to-peer technology to save server bandwidth. However, even when some delay is acceptable, it must be predictable and guaranteed. The aim of this paper is to find a practical way to provide efficient, low-cost, open live streaming service with guaranteed delay on the Internet for such interactive applications.

Moreover, in many interactive applications the number of users in each channel is not large, typically between 5 and 200, but there may be many concurrent channels. With many small-group channels, the best way to improve system scalability is to minimize the server bandwidth consumed in every channel subject to the guaranteed delay constraint. Can live P2P streaming technology help much in such applications? How much server bandwidth is consumed under a given delay constraint? In this paper we propose and implement a highly efficient protocol, iGridMedia, and study the basic tradeoff between the required delay and the consumed server bandwidth. Both simulations and PlanetLab experiments show that the protocol performs well. With a 5 s guaranteed delay in a static environment (no peer quits after joining), the server bandwidth consumed is only 1.4 times the streaming rate even when the peer resource index (defined in Section 3.3) is only 1.4 and the group size is 200; under very high churn, only about 4 times the streaming rate of server bandwidth is consumed when the group size is 100. Our protocol is simple and practical. To the best of our knowledge, this work is the current state-of-the-art approach to supporting interactive live applications with a guaranteed delay constraint using P2P streaming technology.

The paper is organized as follows. In Section 2 we discuss related work. Section 3 presents the iGridMedia protocol in detail. Its performance is evaluated in Section 4 through both simulation and real-world experiments. We conclude the paper and outline future work in Section 5.

Related work

Most recent P2P streaming research focuses on providing non-interactive live streaming service. CoolStreaming/DONet [6] employs a data-driven protocol to discover content availability among peers, eliminating delivery trees. PRIME [9] proposes a mesh-based protocol that imports the swarming content delivery used in file sharing into P2P streaming systems. Wang et al. [13] show that using stable peers to form a backbone in the streaming delivery mesh can reduce transmission delay. R2 [10] presents a protocol that randomly pushes network-coded blocks and shows better performance than traditional pull-based protocols with or without network coding. Outreach [14] seeks to maximize the utilization of the available upload bandwidth of each participating peer. Chunkyspread [8] proposes an unstructured multiple-tree protocol to reduce transmission delay and improve the robustness of tree-based P2P streaming against peer churn. Some researchers propose hybrid protocols [13, 15] to address both delay and resilience to churn. However, very little work focuses on providing delay-guaranteed live streaming service for interactive applications using P2P streaming technology, even though there is great demand from industry. We believe this is because it is very difficult to provide live streaming with a small, ensured delay while also keeping good scalability. To the best of our knowledge, this is the first work to use P2P technology to support delay-guaranteed live streaming service.

iGridMedia protocol

Architecture

We aim to provide an open, delay-guaranteed live broadcasting service on the Internet. To offer such a service we assume the service provider deploys dedicated servers to support the delay-guaranteed interactive applications. As illustrated in Fig. 1, presenters who want to broadcast their live show usually access the Internet via DSL or cable and hence have low upload bandwidth. To make full use of that bandwidth, a presenter uploads its stream to the deployed dedicated servers, and the live content is then broadcast from the server to the end viewers. As shown in Fig. 1, each viewer maintains a rescue connection with the server. We assume the default transmission protocol is UDP. Once an absent streaming packet is about to miss its deadline, it is requested from the server through the rescue connection.

Fig. 1 Service architecture

Basic pull-push protocol

The basic protocol is pull-push and its details can be found in [16]. The protocol is simple, and its most important characteristic is near-optimal upload bandwidth utilization. We briefly introduce it here. For overlay construction, to join a P2P streaming session a node first contacts a rendezvous point (RP), a server maintaining a partial list of the currently online nodes in every channel. When a node joins the overlay, it first obtains a node list from the RP and then randomly selects 15 nodes as neighbors to keep connections with, so that a richly connected unstructured overlay is built. For streaming delivery, in the pull mechanism the video stream is packetized into fixed-length packets called streaming packets, marked by sequence numbers. In all of our simulations and experiments, we pack 1250 bytes of streaming data into each streaming packet. Each node periodically sends buffer map packets to notify all its neighbors which streaming packets it holds in its buffer, and then explicitly requests its missing packets from neighbors. If a packet fails to be pulled, it is requested again. To avoid duplicate requests, it is important to estimate the packet transmission timeout after a request [16]. Since the transmission delay of UDP packets can be predicted well, we use UDP as the default transmission protocol. In the push mechanism, we evenly partition the stream into 16 substreams, where each substream consists of the packets whose sequence numbers are congruent to the same value modulo 16. Once a packet of a substream is successfully pulled from a peer, the remaining packets of that substream are relayed directly by that peer. When a neighbor quits or packet loss occurs, the pull mechanism is started again.
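To make the pull-push switching concrete, the following is a minimal C++ sketch of the per-peer bookkeeping it implies; the types and names are illustrative and are not taken from the actual iGridMedia implementation.

```cpp
#include <array>
#include <cstdint>
#include <set>

// Per-peer bookkeeping for the pull-push mechanism (names are illustrative).
constexpr int kSubstreams = 16;            // the stream is split into 16 substreams

inline int SubstreamOf(uint32_t seq) {     // packets with seq % 16 == s form substream s
  return static_cast<int>(seq % kSubstreams);
}

struct PullPushState {
  std::array<int, kSubstreams> push_parent; // neighbor pushing each substream; -1 = pull mode
  std::set<uint32_t> buffer;                // sequence numbers already in the buffer

  PullPushState() { push_parent.fill(-1); }

  // A packet of some substream was successfully pulled from `neighbor`:
  // from now on the remaining packets of that substream are relayed (pushed)
  // by that neighbor without explicit requests.
  void OnPulled(uint32_t seq, int neighbor) {
    buffer.insert(seq);
    push_parent[SubstreamOf(seq)] = neighbor;
  }

  // A neighbor quit or its pushes are being lost: fall back to pulling
  // for every substream that neighbor was relaying.
  void OnNeighborFailed(int neighbor) {
    for (int s = 0; s < kSubstreams; ++s)
      if (push_parent[s] == neighbor) push_parent[s] = -1;
  }
};
```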

Server scheduling

Packet scheduling on the server differs from that on the peers. The packets sent out by the server are either pushed/relayed proactively or pulled passively by peers. The server pushes/relays each fresh packet received from the presenter to a subset of the peers in every channel. If the server directly pushes/relays only one copy of each packet to the peers, we call it 1-time-server-push; if it pushes/relays two copies of each packet, we say the server push times is 2, or 2-times-server-push, and so on. After packets are pushed to peers by the server, they are disseminated among the peers using the pull-push protocol. Once an absent packet is about to miss its deadline (described in more detail below), the peer requests it directly from the server through the rescue connection established between them. In short, the server always relays the freshest packets to the peers and returns the late packets requested by peers; within the deadline, packets are requested from peers rather than from the server as much as possible.

To obtain a precise playback deadline for each packet, all nodes, including the presenters and the viewers, perform clock synchronization with the server when they join a channel, using the network time protocol (NTP) or simply a "ping-pong" synchronization message if the precision requirement is not high. We use \(t_{gen}\) to denote the clock time at which a packet is generated at the presenter, \(t_{rtt}\) the round-trip time between the viewer and the server, and \(t_e\) the end-to-end delay from the presenter to the server. If the required guaranteed delay is T, the viewer sends a request for the packet to the server once the packet has not arrived by clock time \(t_{gen} + T - t_e - t_{rtt}\). The server sends the packet to the viewer as soon as it receives the request.
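The following is a small C++ sketch of this rescue-request rule, assuming clocks are already synchronized and times are kept in milliseconds; the structure and function names are illustrative.

```cpp
#include <cstdint>

// Rescue-request timing on a viewer (times in ms on the server-synchronized
// clock; names are illustrative).
struct ChannelTiming {
  int64_t guaranteed_delay_ms;     // T: required delay bound
  int64_t presenter_to_server_ms;  // t_e: presenter -> server one-way delay
  int64_t viewer_server_rtt_ms;    // t_rtt: viewer <-> server round-trip time
};

// Latest clock time by which the packet generated at t_gen must have arrived
// from peers; after this point the viewer asks the server directly.
int64_t RescueDeadline(int64_t t_gen_ms, const ChannelTiming& t) {
  return t_gen_ms + t.guaranteed_delay_ms
                  - t.presenter_to_server_ms
                  - t.viewer_server_rtt_ms;
}

// True if the packet is still missing at `now_ms` and must be requested over
// the rescue connection; the server answers immediately, so the copy still
// arrives within the guaranteed delay.
bool ShouldRescue(bool received, int64_t now_ms, int64_t t_gen_ms,
                  const ChannelTiming& t) {
  return !received && now_ms >= RescueDeadline(t_gen_ms, t);
}
```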

We now describe how the server chooses peers to relay the fresh packets. The server selects one or several peers to relay each substream. Each peer reports its current outgoing traffic rate to the server every 15 s. Let the streaming rate be r, so that each substream rate is r/16. We use \(1, \cdots, n\) to denote the peers and \(O_1, \cdots, O_n\) to represent their current outgoing rates. For substream 1, the server finds the peer \(i_1\) that satisfies \(i_1=\arg\max_i \left\{\frac{O_i}{r/16}\right\}\); that is, we relay the substream to the peer that contributes the most. After that, we let \(O_{i_1}\leftarrow O_{i_1}-r/16\). The same method is then used to find a peer \(i_2\) to relay substream 2, and so on.
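The following C++ sketch illustrates this greedy relay selection for the case of 1-time-server-push (one relay per substream); the function name and interface are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Greedy relay selection on the server: for each of the 16 substreams, pick
// the peer with the largest reported outgoing rate, then charge it one
// substream's worth of rate (O_i <- O_i - r/16) before the next pick.
// Rates are in kbit/s; names are illustrative.
std::vector<std::size_t> ChooseRelays(std::vector<double> outgoing_rate,
                                      double streaming_rate,
                                      int substreams = 16) {
  const double per_substream = streaming_rate / substreams;
  std::vector<std::size_t> relay_of_substream;
  relay_of_substream.reserve(substreams);

  for (int s = 0; s < substreams; ++s) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < outgoing_rate.size(); ++i)
      if (outgoing_rate[i] > outgoing_rate[best]) best = i;  // most-contributing peer
    relay_of_substream.push_back(best);
    outgoing_rate[best] -= per_substream;                    // account for the new load
  }
  return relay_of_substream;
}
```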

Here we define the Peer Resource Index (PRI) of a channel. It is the ratio of the total peer upload capacity \(\sum_{i=1}^{n}{u_i}\) to the minimum bandwidth resource demand (i.e., the streaming rate r times the viewer number n), that is, \(\mathrm{PRI} = \sum_{i=1}^{n}{u_i}/(nr)\).

We also define the Minimum Server Bandwidth Needed (MSBN). If the total peer upload capacity is large enough, that is, \(\sum{u_i} \geq nr - r\), i.e., \(\mathrm{PRI} \geq 1 - 1/n\), the MSBN is just the streaming rate r (the server must send out at least one copy of each packet). If the total peer upload capacity is low, \(\sum{u_i} < nr - r\), i.e., \(\mathrm{PRI} < 1 - 1/n\), the MSBN is the minimum bandwidth resource demand nr minus the total peer upload capacity, that is, \(\mathrm{MSBN} = nr - \sum{u_i} = nr(1 - \mathrm{PRI})\); this is the additional server bandwidth that must be supplied on top of the total peer upload capacity. In summary, \(\mathrm{MSBN} = \max\{r, nr - \sum{u_i}\} = \max\{r, nr(1 - \mathrm{PRI})\}\).

Further, to measure the consumed server bandwidth, we use the Normalized Consumed Server Bandwidth (NCSB), defined as the ratio of the consumed server bandwidth to the minimum server bandwidth needed (MSBN). The NCSB is always at least 1, and the closer it is to 1, the better the performance.

Finally, we define the Bandwidth Gain as the ratio of the bandwidth capacity demand nr to the consumed server bandwidth; in other words, it measures how many times more server bandwidth a client-server architecture would consume compared to using P2P streaming. The larger the bandwidth gain, the better the performance.
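The following C++ sketch gathers these definitions for one channel; rates are assumed to be in kbit/s and the struct and function names are illustrative.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// PRI, MSBN, NCSB and bandwidth gain for one channel (rates in kbit/s;
// names are illustrative).
struct ChannelMetrics {
  double pri;             // Peer Resource Index
  double msbn;            // Minimum Server Bandwidth Needed
  double ncsb;            // Normalized Consumed Server Bandwidth
  double bandwidth_gain;  // n*r / consumed server bandwidth
};

ChannelMetrics ComputeMetrics(const std::vector<double>& upload_capacity,
                              double streaming_rate,
                              double consumed_server_bw) {
  const double n = static_cast<double>(upload_capacity.size());
  const double total_upload =
      std::accumulate(upload_capacity.begin(), upload_capacity.end(), 0.0);

  ChannelMetrics m;
  m.pri = total_upload / (n * streaming_rate);                 // PRI = sum(u_i) / (n*r)
  m.msbn = std::max(streaming_rate,                            // MSBN = max{r, n*r - sum(u_i)}
                    n * streaming_rate - total_upload);
  m.ncsb = consumed_server_bw / m.msbn;                        // >= 1, closer to 1 is better
  m.bandwidth_gain = n * streaming_rate / consumed_server_bw;  // vs. pure client-server
  return m;
}
```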

Adaptive server push

Our aim is to minimize server bandwidth consumption under the constraint of guaranteeing the video quality on each node. The challenge is that the total uplink capacity of all the peers is very hard to obtain, both because it is difficult to measure and because it fluctuates strongly. We therefore develop an adaptive approach that lets the server automatically estimate the minimum server bandwidth needed from the pull requests issued by peers. Generally speaking, more server bandwidth is consumed when the peer bandwidth capacity is insufficient or the guaranteed delay is small. The packets sent out by the server comprise pushed packets and pulled packets requested by the peers, so when the server outgoing rate grows it is reasonable to push more copies of each packet directly to peers rather than wait for requests for late packets. In iGridMedia we use a simple way to adjust the server push times. Let the current server outgoing traffic rate be \(r_s\), the streaming rate \(r\), and the current server push times \(\gamma\). If \(r_s/r > 2\gamma\), we let \(\gamma \leftarrow 2\gamma\); if \(r_s/r < 1.2\gamma\), we let \(\gamma \leftarrow \gamma/2\). That is, once the server outgoing traffic rate exceeds 2 times the pushed-packet bit rate we double the server push times, while if the server outgoing traffic rate goes down to 1.5 times the pushed-packet bit rate we halve the server push times.
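A minimal C++ sketch of this adjustment rule, using the thresholds exactly as stated in the formula above, could look as follows; the function interface is illustrative.

```cpp
// Adaptive adjustment of the server push times gamma, following the rule
// stated above (rates in kbit/s; the interface is illustrative).
double AdjustPushTimes(double gamma, double server_outgoing_rate,
                       double streaming_rate) {
  const double load = server_outgoing_rate / streaming_rate;  // r_s / r
  if (load > 2.0 * gamma)
    return 2.0 * gamma;   // many late-packet pulls: push more copies proactively
  if (load < 1.2 * gamma)
    return gamma / 2.0;   // pushes largely suffice: cut the proactive copies back
  return gamma;           // otherwise keep the current push times
}
```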

Performance evaluation

Simulation

We implemented an event-driven packet-level simulator in C++ to conduct the simulations in this section. In our simulation, all streaming and control packets and all node buffers are simulated in detail. For the end-to-end latencies, we employ a real-world node-to-node latency matrix (2500 × 2500) measured on the Internet [17]. We do not model queue management in the routers. The default streaming rate is 500 kbit/s, the default neighbor count is 15, and the default request window size is 20 s. We assume that all peers are DSL nodes and that the bandwidth bottleneck occurs only at the last hop, i.e., on the upload link of the end viewer. To model peer bandwidth heterogeneity, we use three typical DSL node types with upload capacities of 1 Mbit/s, 384 kbit/s and 128 kbit/s, respectively. In the simulation, we adjust the fraction of each node type to obtain different peer resource indices (PRI). Each point in the figures is the average of 10 runs with different random seeds.

In our simulation we assume that the bottleneck is always at the last hop and that the server bandwidth is large enough; hence every end viewer can always obtain sufficient streaming packets for playback via the rescue connection with the server, that is, all viewers watch full-quality video within the required delay. Therefore, in this section we mainly study the consumed server bandwidth as a function of the guaranteed delay requirement under different user behaviors and network conditions. By default, we use a fixed server push times with a value of 1.

We first investigate the performance in a static environment, i.e., no peer quits after joining. Figure 2 shows the normalized consumed server bandwidth (NCSB) with respect to the guaranteed delay for different user numbers; the peer resource index (PRI) is set to 1.4. As shown, when the user number is small, such as 20, the NCSB is very close to 1 even with a guaranteed delay of only 2 s. This means that the consumed server bandwidth is nearly equal to the minimum server bandwidth needed (MSBN), i.e., 500 kbit/s. When the user number in a channel is as high as 200, the consumed server bandwidth is only 1.2 times the streaming rate of 500 kbit/s when the guaranteed delay is limited to 5 s, and the bandwidth gain is 200/1.5 = 133 times. We also see that supporting 200 users with a 2 s guaranteed delay consumes no more than 3.5 times the streaming rate of server bandwidth.

Fig. 2 NCSB with respect to the guaranteed delay and user number in static environment. PRI = 1.4. MSBN = 500 kbit/s

Figure 3 shows the NCSB with respect to the guaranteed delay for different peer resource indices; the user number is 100. When the PRI is only 1.2, that is, the total upload capacity is only 20% more than the minimum bandwidth resource demand, the consumed server bandwidth is about 2 times the minimum server bandwidth needed if the required delay is 5 s. The more plentiful the peers' upload capacity, the less server bandwidth is consumed. Note that when the PRI is above 1.6, the actual consumed server bandwidth is just the MSBN.

Fig. 3 NCSB with respect to the guaranteed delay and peer resource index (PRI) in static environment. User number 100. MSBN = 500 kbit/s

Figure 4 shows the scenario in which the peer upload capacity is less than the minimum bandwidth resource demand; the user number is 100. When the PRI is only 0.7, the server must supply the additional 30% of the bandwidth resource needed, and the MSBN in this condition is nr(1 − PRI) = 15 Mbit/s. If the required delay is 10 s, the NCSB is only 1.2. This implies that the iGridMedia protocol can ensure that all peer upload capacity is fully utilized while the server provides only the additional bandwidth resource needed. When the PRI is 0.9 and the guaranteed delay is 10 s, the MSBN is about 5 Mbit/s and the NCSB is about 1.6. Note that the absolute server outgoing rate decreases as the PRI increases.

Fig. 4 NCSB with respect to the guaranteed delay in static environment when the peer upload capacity is low. User number 100

Figure 5 shows the impact of the link packet loss ratio. Since the default transmission protocol in iGridMedia is UDP, link packet loss must be considered. In this figure the user number is 100. The performance of iGridMedia degrades as the link loss ratio grows, because more packets have to be pulled rather than pushed. When the link packet loss ratio is 3%, the bandwidth gain for a 5 s guaranteed delay is about 47 times. Later we will show that adaptive server push improves this.

Fig. 5 NCSB with respect to the guaranteed delay and link packet loss rate in static environment. User number 100. PRI = 1.4. MSBN = 500 kbit/s

We then investigate the performance under high peer churn in Fig. 6. We use a Weibull(λ,k) distribution with CDF \(F(x)=1-e^{-(x/\lambda)^k}\) to randomly generate the lifetimes of the viewers, and we assume the peer joining process is a Poisson process with a rate of 20 peers per second; the maximum online user number is 100 in this figure. The median values of Weibull(300,2), Weibull(400,2), Weibull(500,2) and Weibull(600,2) are 104, 138, 173 and 208 s, respectively. Figure 6 shows the NCSB with respect to the guaranteed delay for different user lifetime distributions. Even under the very high churn of Weibull(300,2), a 5 s guaranteed delay is achieved with 6 times the streaming rate of server bandwidth consumed. We show the improvement obtained by adaptive server push later.
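A small C++ sketch of this churn model might look as follows; note that std::weibull_distribution takes the shape k first and the scale λ second, and the interface shown here is illustrative.

```cpp
#include <random>
#include <utility>

// Churn model used in the simulations: Poisson arrivals (exponential
// inter-arrival gaps, 20 joins per second) and Weibull(lambda, k) lifetimes.
// The interface is illustrative.
struct ChurnModel {
  std::mt19937 rng{12345};
  std::exponential_distribution<double> interarrival{20.0};  // rate 20 per second
  std::weibull_distribution<double> lifetime{2.0, 300.0};    // Weibull(lambda=300, k=2)

  // Returns (seconds until the next join, online time of that viewer).
  std::pair<double, double> NextViewer() {
    return {interarrival(rng), lifetime(rng)};
  }
};
```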

Fig. 6 NCSB with respect to the guaranteed delay in dynamic environment. PRI = 1.4. User number 100. MSBN = 500 kbit/s

In Fig. 7 we study the scalability of iGridMedia. Although we mainly target interactive applications with small groups, we would also like to see the performance with very large user numbers. Here the user number ranges from 500 to 8000, in a dynamic environment with a lifetime distribution of Weibull(600,2); the link packet loss ratio is 1% and the PRI is set to 1.4. Interestingly, the consumed server bandwidth does not increase much as the user number rises, despite the strict guaranteed delay limitation. Even with a 5 s guaranteed delay, the server bandwidth consumed is only 10.5 times the streaming rate, i.e., 5.25 Mbit/s, when the number of users is 8000.

Fig. 7 NCSB with respect to the large user number and the guaranteed delay in Weibull(600,2). Packet loss rate 1%. PRI = 1.4. MSBN = 500 kbit/s

We then study the impact of the server push times, which we vary in Fig. 8. It shows the NCSB with respect to the guaranteed delay for different server push times in a static environment with a link packet loss ratio of 3%. With 1-time-server-push, the consumed server bandwidth is high. With 2-times-server-push, the server bandwidth consumed drops to roughly half, because more packets pushed directly by the server result in fewer requests for late packets to the server. However, when the server push times is raised to 3, the server bandwidth consumed grows again. The black curve in the figure shows that our adaptive server push method consumes relatively little server bandwidth in most cases.

Fig. 8 NCSB with respect to the guaranteed delay and server push times in static environment. Link packet loss rate 3%. PRI = 1.4. User number 100. MSBN = 500 kbit/s

Figure 9 shows the impact of the server push times in a dynamic environment. The user number is 100 and the user lifetime distribution is Weibull(400,2). The adaptive server push method usually yields low server bandwidth consumption: with a 5 s delay guarantee, the consumed server bandwidth is about 4 times the streaming rate and the bandwidth gain is 25.

Fig. 9 NCSB with respect to the guaranteed delay and server push times in dynamic environment (Weibull(400,2)). PRI = 1.4. MSBN = 500 kbit/s

Finally, we report the control overhead of iGridMedia. As shown in Fig. 10, the maximum user number is set to 1000, 4000 and 8000, and the PRI is 1.4. A dynamic environment is used with a user lifetime distribution of Weibull(400,2). The control overhead is defined as the ratio of the control traffic to the total traffic. We see that the control overhead always stays around or below 1%, which implies a low control traffic volume.

Fig. 10 Control overhead. Weibull(400,2). PRI = 1.4

PlanetLab experiment

We have fully implemented the iGridMedia protocol and conducted real-world experiments on PlanetLab, using UDP as the transmission protocol. We uploaded our viewer program to 400 PlanetLab nodes, limited the upload bandwidth of each node to 1 Mbit/s, 384 kbit/s or 128 kbit/s, and tuned the PRI to 1.4. The upload bandwidth is limited at each node by a token bucket in our C++ code. The node lifetime distribution is Weibull(400,2) and the streaming rate is 500 kbit/s. As shown in Fig. 11, when the delay is constrained to 5 s, the server bandwidth consumed is around 3.4 Mbit/s, i.e., 6.8 times the streaming rate, and the bandwidth gain is about 58. When the delay is limited to 10 s, the consumed server bandwidth is around 1.4 Mbit/s and the bandwidth gain is about 280.
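A minimal token-bucket limiter of the kind used to cap upload bandwidth might look as follows in C++; the class interface and constants are illustrative and not the actual experiment code.

```cpp
#include <algorithm>
#include <cstdint>

// Token-bucket limiter capping a node's upload rate (e.g. 1000, 384 or
// 128 kbit/s); constants and names are illustrative.
class TokenBucket {
 public:
  TokenBucket(double rate_kbps, double burst_bytes)
      : rate_bytes_per_ms_(rate_kbps * 1000.0 / 8.0 / 1000.0),  // kbit/s -> bytes/ms
        capacity_(burst_bytes),
        tokens_(burst_bytes) {}

  // Returns true if a packet of `size_bytes` may be sent at time `now_ms`;
  // otherwise the packet must wait until enough tokens have accumulated.
  bool TrySend(double size_bytes, int64_t now_ms) {
    tokens_ = std::min(capacity_,
                       tokens_ + (now_ms - last_ms_) * rate_bytes_per_ms_);
    last_ms_ = now_ms;
    if (tokens_ < size_bytes) return false;
    tokens_ -= size_bytes;
    return true;
  }

 private:
  double rate_bytes_per_ms_;
  double capacity_;
  double tokens_;
  int64_t last_ms_ = 0;
};
```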

Fig. 11 PlanetLab experiment. 400 nodes used. Weibull(400,2). PRI = 1.4. MSBN = 500 kbit/s

We then conducted two experiments in two typical scenarios, letting iGridMedia broadcast a real VBR movie file with an average streaming rate of 550 kbit/s. The first experiment studies the performance in a small user group with very low delay. We use 20 nodes on PlanetLab, set the guaranteed delay to 1.5 s, and use a Weibull(300,2) distribution to model user churn. Since the average online time is only about 260 s, the churn rate is fairly high, and the online node number usually stays between 15 and 20. The PRI is set to 1.4.

Figure 12 shows the average server bandwidth consumed. For most of the time, the bandwidth consumed is less than 1 Mbit/s, that is, less than twice the streaming rate.

Fig. 12 PlanetLab experiment. Server bandwidth consumed for 1.5 s guaranteed delay in dynamic. Weibull(300,2). PRI = 1.4. Nodes 15~20. Average streaming rate 550 kbit/s

Figure 13 shows the average delivery ratio over all nodes as time elapses. The delivery ratio is defined as the ratio of the number of packets that arrive within the guaranteed delay (1.5 s in this experiment) to the number of packets that should arrive, so this metric indicates the quality of the video played back on each node. The average delivery ratio stays very close to 1 despite the extreme peer dynamics. Figure 14 shows that the average packet delay is usually under 300 ms. For this small-group session, the bandwidth gain is between 10 and 20 times for most of the time.

Fig. 13 PlanetLab experiment. Average delivery ratio for 1.5 s guaranteed delay in dynamic. Weibull(300,2). PRI = 1.4. Nodes 15~20. Average streaming rate 550 kbit/s

Fig. 14 PlanetLab experiment. Average packet delay for 1.5 s guaranteed delay in dynamic. Weibull(300,2). PRI = 1.4. Nodes 15~20. Average streaming rate 550 kbit/s

In the second experiment, we study the performance of iGridMedia with a medium-sized user group and low delay. We use 520 PlanetLab nodes and launch 2 iGridMedia client processes on each node to emulate a 1000-node dynamic session. We set the guaranteed delay to 4 s and draw the user online times from a Weibull(250,2) distribution. Since the average online time is only about 200 s, the churn rate is even higher. The PRI is set to 1.4.

Figure 15 shows the number of online nodes as time elapses. Figure 16 indicates that the server bandwidth consumed for 1000 nodes is usually between 6 and 10 Mbit/s, so the bandwidth gain is around 50 to 120 times.

Fig. 15 PlanetLab experiment. Online node number for 4 s guaranteed delay in dynamic. Weibull(250,2). PRI = 1.4. Nodes ~1000. Average streaming rate 550 kbit/s

Fig. 16 PlanetLab experiment. Server bandwidth consumed for 4 s guaranteed delay in dynamic. Weibull(250,2). PRI = 1.4. Nodes ~1000. Average streaming rate 550 kbit/s

Figure 17 gives the average delivery ratio of the session, which is around 0.98, close to 1. It should be pointed out that, unlike in simulation, the performance of a real system is always limited by practical issues. For example, since each PlanetLab node is shared among many users, the available downlink bandwidth may be less than the streaming rate, so on some nodes the delivery ratio can never reach 1.

Fig. 17 PlanetLab experiment. Average delivery ratio for 4 s guaranteed delay in dynamic. Weibull(250,2). PRI = 1.4. Nodes ~1000. Average streaming rate 550 kbit/s

Figure 18 shows the average packet delay in this experiment. For most of the time, the delay is below 2 s.

Fig. 18 PlanetLab experiment. Average packet delay for 4 s guaranteed delay in dynamic. Weibull(250,2). PRI = 1.4. Nodes ~1000. Average streaming rate 550 kbit/s

Conclusion & future work

In this paper we present our practical protocol, iGridMedia, which supports delay-guaranteed peer-to-peer live streaming for interactive applications. We develop a protocol and algorithms to minimize the server bandwidth needed under the constraint of guaranteeing the video quality of each peer, and both simulations and PlanetLab-based real-world experiments show that our approach achieves this goal. iGridMedia is a nearly fully featured peer-to-peer streaming system, including NAT/firewall traversal, automatic UDP/TCP selection, and many other features. For future work, not all nodes on the Internet can communicate via UDP because of firewalls, so the impact of TCP connections and firewalls should be studied. Since TCP connections have unpredictable transmission delay due to TCP's inherent retransmission mechanism, they introduce more duplicate packets and longer delays, and hence lead to higher server bandwidth consumption. Besides, when the user number is small, the probability that the average peer upload bandwidth is less than the streaming rate is high; in this case, we would like to use channels with richer peer upload bandwidth to help the "lean" channels. The server can perceive a channel's bandwidth richness by monitoring its outgoing traffic rate. Further, in some real applications where the user number in a channel is large, not all users participate in the interactive session (some only want to watch); how to provide different QoS to different users and how to support smooth user role transitions (e.g., from a non-interactive user to an interactive one) are also topics for future work.