MTCP: scalable TCP-like congestion control for reliable multicast☆
Introduction
As the Internet becomes more diversified in its capabilities, it becomes feasible to develop and offer services and applications that were not possible under earlier generations of Internet technologies. The Multicast Backbone (MBONE) and IP-multicast are two Internet technologies that have enabled a wide range of new applications. Using multicast, large-scale conferencing involving hundreds to thousands of participants is possible over the Internet. As multicast technologies become more widely deployed, we expect to see new multicast-based applications that demand more bandwidth and higher speed. Many of these applications will require reliable data transfer.
Multicast traffic generated by these applications can be of two types: quality-of-service (QoS) guaranteed and best effort. QoS guaranteed traffic requires the underlying network to provide per-flow resource reservation and admission control services. Unless these services become widely deployed over the Internet and are made sufficiently inexpensive for general use, they will likely be available only to a small fraction of future Internet traffic, and multicast traffic will be primarily best effort. This work is concerned with the flow and congestion control of best-effort multicast traffic.
Congestion control is an integral part of any best-effort Internet data transport protocol. It is widely accepted that the end-to-end congestion control mechanisms employed in TCP [1] have been one of the key contributors to the success of the Internet. A conforming TCP flow is expected to respond to a congestion indication (e.g., packet loss) by drastically reducing its transmission rate, and to increase its rate only slowly during steady state. This congestion control mechanism encourages the fair sharing of a congested link among multiple competing TCP flows. A flow is said to be TCP-compatible or TCP-like if it behaves similarly to a flow produced by TCP under congestion [2]. At steady state, a TCP-compatible flow uses no more bandwidth than a conforming TCP connection running under comparable conditions.
Unfortunately, most of the multicast schemes proposed so far do not employ end-to-end congestion control. Since TCP strongly relies on other network flows to use congestion control schemes similar to its own, TCP-incompatible multicast traffic can completely lock out competing TCP flows and monopolize the available bandwidth. Furthermore, multicast flows insensitive to existing congestion (especially congestion caused by their own traffic) are likely to cause simultaneous congestion collapses in many parts of the Internet [3]. Because of the potential far-reaching damage of TCP-incompatible multicast traffic, it is highly unlikely that transport protocols for large-scale reliable multicast will become widely accepted without TCP-like congestion control mechanisms.
The main challenge of congestion control for reliable multicast is scalability. To respond to congestion occurring at many parts of a multicast tree within a TCP time-scale, the sender needs to receive immediate feedback regarding the receiving status of all receivers. However, because of the potentially large number of receivers involved, the transmission of frequent updates from the receivers directly to the sender becomes prohibitively expensive and non-scalable.
Another challenge is the isolation of the effects of persistent congestion. As a single multicast tree may span many different parts of the Internet, TCP-like congestion control will reduce the sender's transmission rate upon indication of congestion in any part of the tree. While such a feature fosters fairness among different flows (inter-fairness), it does not address the issue of fairness among the receivers in the same multicast group (intra-fairness) [4]. Specifically, it would be unfair for non-congested receivers to be subject to a low transmission rate just because of some isolated instances of congestion.
In this paper, we introduce multicast TCP (MTCP), a new congestion control protocol for reliable multicast that addresses the inter-fairness and scalability issues. The issue of intra-fairness is outside the scope of this paper, and it will be addressed in future work. Our protocol is based on a multilevel logical tree where the root is the sender, and the other nodes in the tree are receivers. The sender multicasts data to all receivers, and the latter send acknowledgments to their parents in the tree. Internal tree nodes, hereafter referred to as sender's agents (SAs), are responsible for handling feedback generated by their children and for retransmitting lost packets. MTCP incorporates several novel features, including:
- 1. hierarchical congestion status reports that distribute the load of processing feedback from all receivers across the multicast group,
- 2. the relative time delay (RTD) concept which overcomes the difficulty of estimating round-trip times (RTTs) in tree-based multicast environments,
- 3. window-based control that prevents the sender from transmitting faster than packets leave the bottleneck link on the multicast path through which the sender's traffic flows,
- 4. a retransmission window that regulates the flow of repair packets to prevent local recovery from causing congestion, and
- 5. a selective acknowledgement scheme employed at SAs to prevent independent (i.e., non-congestion-related) packet loss from reducing the sender's transmission rate.
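The hierarchical reporting idea in the first item can be illustrated with a minimal sketch. It assumes, purely for illustration, that every node holds a congestion window and that an SA summarizes its subtree by taking the minimum of its own value and its children's summaries. The names `Node` and `summarize`, the field `cwnd`, and the choice of a minimum are assumptions of this sketch and are not specified in the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A member of the logical tree (hypothetical structure for illustration)."""
    name: str
    cwnd: int  # assumed per-node congestion window, in packets
    children: List["Node"] = field(default_factory=list)


def summarize(node: Node) -> int:
    """Return the smallest window found in the subtree rooted at `node`.

    Each node combines only its own value with one summary per child,
    so no node processes more reports than it has children.
    """
    summary = node.cwnd
    for child in node.children:
        summary = min(summary, summarize(child))
    return summary


# Sender at the root, two SAs, four leaf receivers (made-up values).
tree = Node("sender", 16, [
    Node("SA1", 12, [Node("r1", 10), Node("r2", 14)]),
    Node("SA2", 9, [Node("r3", 4), Node("r4", 11)]),
])
bottleneck = summarize(tree)
```

In this toy tree the sender handles two reports rather than four, yet the value it obtains still reflects the most constrained receiver in the group.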
We have implemented MTCP both on top of UDP in SunOS 5.6 and in the ns simulator, and we have conducted extensive Internet experiments and simulations to test the scalability and inter-fairness properties of the protocol. The encouraging results from these experiments indicate that MTCP is an effective flow and congestion control protocol for reliable multicast.
Tree-based protocols are not new and have been studied by many researchers [5], [6], [7], [8], [9]. However, little work has been done on congestion control for these protocols. Instead, most previous work has focused on the issues of error recovery and feedback implosion. In Refs. [5], [10] it has been analytically shown that tree-based protocols can achieve higher throughput than any other class of protocols, and that their hierarchical structure is the key to reducing the processing load at each member of the multicast group. Tree-based protocols such as RMTP [6] and TMTP [8] do not incorporate end-to-end congestion control schemes and do not guarantee inter-fairness. In Refs. [9], [11] it was proposed to use a tree structure for feedback control, and a detailed description of how to construct such a tree was provided, but no details on congestion control were given. A more detailed discussion on related work can be found in Section 5.
This paper is organized as follows. In Section 2 we provide an overview of MTCP, and in Section 3 we present a detailed description of the protocol and its operation. In Section 4 we present results from Internet experiments and simulation. In Section 5 we discuss related work, and we conclude in Section 6.
Overview of MTCP
MTCP was designed with two goals in mind: TCP-compatibility and scalability. Compatibility with TCP traffic is needed because TCP is the most commonly used protocol in the Internet, and also because the utility of TCP depends on all other network flows being no more aggressive than TCP congestion control (i.e., multiplicative decrease on congestion occurrence, and linear increase at steady state). Non-TCP-compatible flows may lock out TCP traffic and monopolize the available bandwidth.
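The "multiplicative decrease, linear increase" behavior mentioned above can be sketched in a few lines. This is a generic illustration of that rule, not MTCP's actual window computation; the halving factor, the increment of one packet per round, the floor of one packet, and the names `aimd_step` and `run` are assumptions chosen to mirror conventional TCP behavior.

```python
from typing import Iterable, List


def aimd_step(cwnd: float, congested: bool, min_cwnd: float = 1.0) -> float:
    """Apply one window update per round (illustrative constants)."""
    if congested:
        # Multiplicative decrease: halve the window, but keep at least min_cwnd.
        return max(min_cwnd, cwnd / 2.0)
    # Linear increase: grow by one packet per round without congestion.
    return cwnd + 1.0


def run(signals: Iterable[bool], cwnd: float = 8.0) -> List[float]:
    """Return the window after each round, given per-round congestion signals."""
    trace = []
    for congested in signals:
        cwnd = aimd_step(cwnd, congested)
        trace.append(cwnd)
    return trace


# Two quiet rounds, a loss, one quiet round, then two consecutive losses.
trace = run([False, False, True, False, True, True])
```

A flow that grew faster than this in quiet rounds, or backed off less on congestion, would take bandwidth from competing TCP flows, which is the lock-out concern described above.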
Selective acknowledgment scheme
In MTCP we use a selective acknowledgment (SACK) scheme in which each feedback contains information about all received packets. We also adopt a delayed acknowledgment scheme in which each ACK is delayed for a few tens of milliseconds before its transmission. Since an SA can quickly detect the packets lost by a receiver and retransmit them, these schemes reduce the number of acknowledgments and retransmissions. Also, our SACK scheme provides a good means to recover from independent, uncorrelated
Internet experiments and simulation
We have implemented MTCP on top of UDP using POSIX threads and C, in SunOS 5.6. For the routing of MTCP packets, we have implemented a special process called mcaster, whose function is similar to that of mrouted in the MBONE. An mcaster simply “tunnels” incoming packets by first multicasting them to its own subnet via IP-multicast, and then forwarding them to the mcasters of its child sites in the tree via UDP. The members of the multicast group in the Internet experiments were distributed in five
Related work
Many reliable multicast protocols have been proposed in the literature [6], [7], [8], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29]. For the purposes of our discussion, we classify these protocols into three broad categories: unstructured, structured and hybrid. We examine the protocols in each category with an emphasis on their congestion control techniques.
Unstructured protocols do not impose any structure among receivers, and Pingali et al. [30] further classify them into
Concluding remarks
We have presented MTCP, a set of congestion control mechanisms for tree-based reliable multicast protocols. MTCP was designed to effectively handle multiple instances of congestion occurring simultaneously at various parts of a multicast tree. We have implemented MTCP, and we have obtained encouraging results through Internet experiments and simulation. In particular, our results indicate that (1) MTCP can quickly respond to congestion anywhere in the tree, (2) MTCP is TCP-compatible, in the
References
- Congestion avoidance and control
- B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D. Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge,...
- S. Floyd, K. Fall, Router mechanisms to support end-to-end congestion control, Technical report, Lawrence Berkeley...
- et al., Inter-receiver fairness: a novel performance measure for multicast ABR sessions
- et al., The case for reliable concurrent multicasting using shared ack trees
- S. Paul, K.K. Sabnani, J.C. Lin, S. Bhattacharyya, Reliable multicast transport protocol (RMTP), in: Proceedings of...
- et al., Log-based receiver-reliable multicast for distributed interactive simulation
- et al., A reliable dissemination protocol for interactive collaborative applications
- A generic concept for large-scale multicast
- B.N. Levine, J.J. Garcia-Luna-Aceves, A comparison of known classes of reliable multicast protocols, in: Proceedings...
- Adding scalability to transport level multicast
- TCP Vegas: new techniques for congestion detection and avoidance
- Random early detection gateways for congestion avoidance, IEEE/ACM Transactions on Networking
- A delay-based approach for congestion avoidance in interconnected heterogeneous computer networks, Computer Communications Review
Injong Rhee received the B.E. degree in Electrical Engineering from Kyung-Pook National University, Taegu, Korea, in 1989, and the Ph.D. degree in Computer Science from the University of North Carolina at Chapel Hill in 1994. He conducted postdoctoral research for two years at Warwick University, Warwick, U.K., and Emory University, Atlanta, GA. He joined the Department of Computer Science at North Carolina State University at Raleigh in 1997, where he is currently an Associate Professor. He has published in the areas of computer networks, multimedia networking, distributed systems, and operating systems. Dr. Rhee received the NSF Faculty Career Development Award in 1999.
George N. Rouskas received the Diploma in Electrical Engineering from the National Technical University of Athens (NTUA), Athens, Greece, in 1989, and the M.S. and Ph.D. degrees in Computer Science from the College of Computing, Georgia Institute of Technology, Atlanta, GA, in 1991 and 1994, respectively. He joined the Department of Computer Science, North Carolina State University in August 1994, and he has been an Associate Professor since July 1999. During the 2000–2001 academic year he spent a sabbatical term at Vitesse Semiconductor, Morrisville, NC, and in May and June 2000 he was an Invited Professor at the University of Evry, France. His research interests include network architectures and protocols, optical networks, multicast communication, and performance evaluation. He is a recipient of a 1997 NSF Faculty Early Career Development (CAREER) Award, and a co-author of a paper that received the Best Paper Award at the 1998 SPIE conference on All-Optical Networking. He also received the 1995 Outstanding New Teacher Award from the Department of Computer Science, North Carolina State University, and the 1994 Graduate Research Assistant Award from the College of Computing, Georgia Tech. He was a co-guest editor for the IEEE Journal on Selected Areas in Communications, Special Issue on Protocols and Architectures for Next Generation Optical WDM Networks, published in October, 2000, and is on the editorial boards of the IEEE/ACM Transactions on Networking, Computer Networks, and the Optical Networks Magazine. He is a member of the IEEE, the ACM and of the Technical Chamber of Greece.
☆ This work was supported by a grant from the NCSU Center for Advanced Computing and Communications (CACC). An earlier version of this paper appeared in the Proceedings of Infocom'99.