Microscopic examination of TCP flows over transatlantic links

https://doi.org/10.1016/S0167-739X(03)00079-7

Abstract

Much of the recent research and development in the area of high-speed TCP has focused on the steady-state behavior of TCP flows. However, our experience with the first research-only transatlantic 2.5 Gbps Lambda link clearly demonstrates the need to focus on the initial stages of TCP. The work presented here examines the behavior of TCP flows at a microscopic level over high-bandwidth, long-delay networks. This examination led us to study how the minute properties of the underlying network influence bursty protocols such as TCP at these very high speeds combined with high latency. In this paper we briefly describe the requirements such an extreme network environment must satisfy to support high-speed TCP flows. We also present results collected over transatlantic links at iGrid 2002, where we tuned various host parameters and used modified TCP stacks.

Introduction

Grid applications in general can be demanding in terms of bandwidth requirements.

A typical large-scale scientific experiment involves at least two parties: a data producer and a data consumer. Many of the large High Energy Physics (HEP) experiments coming online in the near future, such as LHC [16], DØ [12], CDF [11] and BaBar [10], are excellent examples of this model of computing. In these large experiments the data is mostly produced at a few locations (the data producers) and then analyzed by many researchers at their home institutes and universities (the data consumers). The researchers are often separated geographically by large distances, and the throughput requirements for these experiments are high.

For example, the European Data Grid (EDG) roughly estimates the peak bandwidth of the network traffic that will flow over it in the year 2005 to be 8000 Mbps from a single project (DØ), and this link will have to be shared among several different HEP projects, all wishing to disseminate their data. Not only does this data need to be collected from the experiment, but a large part of it needs to be transferred to various locations over the network. Network architectures are evolving to meet this unprecedented demand. Herein we present research on the scalability of protocols, to allow these kinds of applications to reach the required high bandwidths on new network architectures.

One way to provide this bandwidth is by provisioning end-to-end paths, called Lambdas [5], of up to several Gbps. In the fall of 2001 SURFnet [18] provisioned a 2.5 Gbps Lambda between Amsterdam and Chicago to be used only for research. Initial tests showed that merely increasing the speed of the links, switches and routers in the path was not sufficient to obtain a throughput at or near the available bandwidth. We conducted extensive experiments on this transatlantic link to understand both how transport protocols behave and what additional requirements high-speed flows, such as TCP, impose on optical networks. One architectural shift in our high-speed networking experiments was to minimize the number of routers (devices which process packets at layer 3 and above) and instead delegate packet forwarding to switching devices at layer 2 or below.

Initial throughput measurements of a single TCP stream over such an extreme network infrastructure (i.e. the SURFnet Lambda) showed surprisingly poor results. This led us to examine the dynamics of TCP at the microscopic level to better understand its behavior. The primary motivation for this work is the demand from the HEP community to obtain maximum throughput over long-distance links using a single, or at most a few, TCP streams. Several projects are currently underway in the HEP community to try to overcome these limitations; however, they focus primarily on increasing some of the default parameters of TCP (SSThresh [7], etc.). Simply increasing network capacity does not always improve end-to-end performance. The exclusive availability of the SURFnet Lambda for research has allowed us to investigate this problem.
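
As a back-of-the-envelope illustration of why default TCP settings fall short on such a path, the sketch below computes the bandwidth-delay product, taking the 1 Gbps host attachment rate from Section 2 and the roughly 96 ms round-trip time quoted in the conclusion; both numbers are used here purely as illustrative inputs, not as a substitute for the measurements in the paper.

    /* Back-of-the-envelope bandwidth-delay product (BDP) for the
     * Amsterdam-Chicago path: hosts attach at 1 Gbps and the round-trip
     * time is roughly 96 ms.  Illustrative figures only. */
    #include <stdio.h>

    int main(void)
    {
        const double link_rate_bps = 1e9;    /* host access rate, bits/s */
        const double rtt_s         = 0.096;  /* round-trip time, seconds */

        double bdp_bits  = link_rate_bps * rtt_s;
        double bdp_bytes = bdp_bits / 8.0;

        /* To keep such a pipe full, the TCP window (and hence the socket
         * buffers) must be at least one BDP, i.e. about 12 MB here. */
        printf("BDP = %.0f bits = %.1f MB\n", bdp_bits, bdp_bytes / 1e6);
        return 0;
    }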

The performance of TCP/IP for large data transfers over high-bandwidth, long-latency paths is a well-known problem [6]. The problem is to discover the bottleneck of a TCP flow (the slowest link in a chain of networks) between two PCs connected over a long-latency, high-bandwidth path. Several issues bear on this problem: the characteristics of the network (routers, switches, slow links), the implementation of the TCP stack, and the specific parameters passed to the TCP algorithm by the hosts. In Section 2 we examine the characteristics of the equipment, how this influences TCP, and what requirements TCP imposes on the network. Section 3 briefly discusses some of the problems with the TCP algorithm on high-bandwidth, long-delay paths. In Section 4 we discuss the effect of host and operating system parameter tuning on performance. Section 5 shows test results from different modifications to the TCP/IP algorithms, in particular those of HSTCP [4] implemented by the Net100 [17] project.

In the following sections we broadly classify the stages of a TCP session into a bandwidth discovery phase (also known as slow start), steady state [4], and congestion avoidance. In this work we focus mostly on the initial phase of a TCP flow, bandwidth discovery, as we believe this phase most strongly influences the bandwidth obtained using TCP.

Section snippets

Properties of underlying network infrastructure

The initial configuration used for the SURFnet Lambda (2.5 Gbps) is shown in Fig. 1. Two high-end personal computers (PCs) were connected via Gigabit Ethernet through two Time Division Multiplexer (TDM) switches and a router. The TDM switches are capable of encapsulating Ethernet packets in SONET frames up to the rate of the specific SONET channel. The hosts were connected at 1 Gbps to a first-generation version of the TDM switch, whose line card-to-backplane interface imposed a 622 Mbps limitation on the datapath.
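
To give a feel for why this 622 Mbps datapath limit matters to a bursty sender, the following sketch estimates the backlog created when a window-sized burst arrives at the 1 Gbps Ethernet line rate; the 1.5 MB burst size is a hypothetical value chosen for illustration, not a measured window from these experiments.

    /* Rough estimate of the buffering the 622 Mbps line card/backplane
     * interface must provide to absorb a burst arriving at GigE line rate.
     * A sketch only; the switch internals are not described at this level
     * of detail in the excerpt, and the burst size is hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        const double in_rate_bps  = 1.0e9;    /* burst arrives at GigE line rate */
        const double out_rate_bps = 622.0e6;  /* datapath limit towards SONET    */
        const double burst_bytes  = 1.5e6;    /* hypothetical 1.5 MB window burst */

        double burst_s = burst_bytes * 8.0 / in_rate_bps;     /* burst duration  */
        double drained = out_rate_bps * burst_s / 8.0;        /* bytes forwarded */
        double backlog = burst_bytes - drained;               /* must be queued  */

        printf("a %.1f MB burst leaves a backlog of about %.2f MB\n",
               burst_bytes / 1e6, backlog / 1e6);
        return 0;
    }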

TCP

TCP is a sender-controlled sliding-window protocol [2]. New data, up to the window size, is sent when old data has been acknowledged by the receiver. The window size is limited by host and application parameters such as the socket buffer size and Cwnd [1]. TCP adjusts Cwnd dynamically using different algorithms depending on which phase the flow is currently in. We focus here on understanding the slow-start phase. We have tested various modifications to the congestion avoidance algorithm
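
The sketch below illustrates the bandwidth discovery (slow-start) behavior described here: the congestion window roughly doubles once per round-trip time until it reaches the target window, assuming no loss. The 1460-byte MSS, the 12 MB target window and the 96 ms RTT are assumed, illustrative values rather than measurements from this paper.

    /* Minimal sketch of the "bandwidth discovery" (slow-start) phase: the
     * congestion window (cwnd) roughly doubles every round-trip time until
     * it reaches the target window.  Values are illustrative assumptions. */
    #include <stdio.h>

    int main(void)
    {
        const double mss_bytes  = 1460.0;   /* typical Ethernet MSS           */
        const double rtt_s      = 0.096;    /* transatlantic RTT (~96 ms)     */
        const double target_wnd = 12.0e6;   /* ~1 BDP at 1 Gbps, see above    */

        double cwnd = mss_bytes;            /* start with one segment         */
        int    rtts = 0;

        while (cwnd < target_wnd) {
            cwnd *= 2.0;                    /* one doubling per RTT, no loss   */
            rtts++;
        }
        /* With these numbers TCP needs on the order of a dozen RTTs (more
         * than a second of wall-clock time) just to open the window. */
        printf("~%d RTTs (%.2f s) to reach a %.1f MB window\n",
               rtts, rtts * rtt_s, target_wnd / 1e6);
        return 0;
    }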

Host parameters

Implementations of the TCP algorithms vary between operating systems. The behavior of TCP depends on the particular implementation and on the architecture of the PC, such as host bus speed, devices sharing the bus, the Network Interface Card (NIC), interrupt coalescing, inter-packet delay [20], etc. Thus, using the same values as described in [19] on two different configurations can still produce varying results, especially during the bandwidth discovery phase. These differences may become less noticeable
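
As an illustration of the kind of per-socket tuning discussed in this section, the sketch below requests large send and receive buffers through the standard setsockopt() interface. The 12 MB figure is an assumption (roughly one bandwidth-delay product for this path); the excerpt does not state exactly which parameter values the authors used, and the kernel may clamp the request to its configured maxima (e.g. net.core.wmem_max / rmem_max on Linux).

    /* Hedged sketch of per-socket buffer tuning: ask the kernel for
     * send/receive buffers of roughly one bandwidth-delay product.
     * The 12 MB value is illustrative, not taken from the paper. */
    #include <stdio.h>
    #include <sys/socket.h>

    int tune_socket_buffers(int sock)
    {
        int bufsize = 12 * 1024 * 1024;   /* ~1 BDP for 1 Gbps x 96 ms */

        if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize)) < 0)
            return -1;
        if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) < 0)
            return -1;
        return 0;
    }

    int main(void)
    {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0 || tune_socket_buffers(s) < 0)
            perror("socket buffer tuning");
        else
            printf("requested 12 MB socket buffers\n");
        return 0;
    }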

Pacing out packets at device level

From the discussions in the preceding sections it is clear that some form of packet pacing should improve performance. We implemented a delay at the device driver level, i.e. a blocking delay on the order of microseconds. Results are shown in Fig. 11 and Fig. 12. From the results we conclude that the sender should not burst packets, but should instead shape the flow according to a leaky-bucket algorithm. This may be hard to implement in the OS, since it requires that the OS maintain a timer per TCP flow with μs
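
A minimal sketch of the arithmetic behind the leaky-bucket pacing idea: spacing full-size frames so that the flow leaves the host at no more than the bottleneck rate. The 622 Mbps rate is taken from the initial configuration in Section 2 and the 1500-byte frame size is an assumption; the authors' actual implementation lives inside the device driver and is not reproduced here.

    /* User-space illustration of choosing an inter-packet gap for
     * leaky-bucket pacing.  Sketch only; the paper's implementation
     * is in the device driver, not shown here. */
    #include <stdio.h>

    int main(void)
    {
        const double bottleneck_bps = 622.0e6;  /* slowest hop on the path  */
        const double pkt_bytes      = 1500.0;   /* assumed Ethernet frame   */

        /* Time one packet occupies the bottleneck: sending faster than this
         * forces the bottleneck to queue, sending slower wastes capacity. */
        double gap_us = pkt_bytes * 8.0 / bottleneck_bps * 1e6;

        /* ~19.3 us per frame; this is why pacing inside the OS needs
         * per-flow timers with microsecond resolution. */
        printf("target inter-packet gap: %.1f us per 1500-byte frame\n", gap_us);
        return 0;
    }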

Conclusion

We have shown that tuning host parameters and using HSTCP are very important when trying to make the best use of the available bandwidth over high-bandwidth, long-delay networks. The maximum throughput obtained over the transatlantic link (96 ms RTT) was 730 Mbps using a single TCP stream. Initial tests show that these modifications do not adversely affect other flows, but this still needs closer examination in an isolated environment with a large number of heterogeneous flows. We have also shown that for

Uncited references

[9], [13], [14], [15], [21].

Acknowledgements

The transatlantic links used for conducting this research are provided to us by SURFnet, TYCO and LEVEL3. Antony Antony and Hans Blom are funded by the IST Program of the European Union (grant IST-2001-32459) via the DataTAG project. Jason Lee was supported in part by the Director, Office of Science, Office of Advanced Scientific Computing Research and Mathematical, Information and Computational Sciences Division under US Department of Energy Contract No. DE-AC03-76SF00098. The authors would

References (21)

  • M. Allman, et al., TCP Congestion Control,...
  • T. Dunigan, M. Mathis, B. Tierney, A TCP Tuning Daemon....
  • S. Floyd, Limited Slow-Start for TCP with Large Congestion Windows....
  • S. Floyd, S. Ratnasamy, S. Shenker, Modifying TCP’s Congestion Control for High Speeds, 2001....
  • C. de Laat, E. Radius, S. Wallace, The rationale of the current optical networking initiatives, in: Special Issue on...
  • J. Lee, D. Gunter, B. Tierney, W. Allcock, J. Bester, J. Bresnahan, S. Tuecke, Applied techniques for high bandwidth...
  • J.P. Martin-Flatin, S. Ravot, TCP congestion control in fast long-distance networks, Technical Report CALT-68-2398,...
  • RFC 793, in: J. Postel (Ed.), Transmission Control...
  • 10Gbps, OC192 link to iGrid2002....
  • BaBar....


Antony Antony is a researcher at NIKHEF, The Netherlands. He received his Bachelor of Engineering in Electrical Engineering from the University of Bombay. Over the past years he has been involved in several advanced networking projects; current projects include DataTAG and NetherLight. His research interests are the dynamics of transport protocols over long fat networks, inter-domain routing protocols such as BGP, and GMPLS.

Johan Blom graduated from Utrecht University in 1992 with a thesis entitled “Topological and Geometrical Aspects of Image Structure”. After that he worked at the same university on particle tracking, using a Radon transform embedded in a multi-resolution structure. In 1996 the subject changed to computer-guided education in a collaboration between the Universities of Amsterdam and Utrecht; applications were developed that combined the Maple computer algebra application with educational groupware packages and a Web interface for grading, submitting documents and storage in a relational database. He then moved to network monitoring and testing, first in Utrecht and, since 2001, in Amsterdam. A host-oriented network monitoring package was developed with a Web-based interface to view the results, and scripts were developed for the automated testing of Gigabit networks, using many hosts running standard TCP/UDP generator applications (modified where required) combined with SNMP monitoring of the network interfaces.

Cees de Laat is a senior scientific staff member of the Informatics Institute at the University of Amsterdam. He received a PhD in physics from the University of Delft. He has been active in data acquisition systems, heavy ion experiments and virtual laboratories. Over the past 7 years he has been investigating applications for advanced research networks. Current projects include optical networking, lambda switching and provisioning, policy-based networking, and Authorization, Authentication and Accounting Architecture research. He participates in the European DataGrid project and the Dutch ASCI DAS project. He is responsible for the research on the Lambda switching facility (“NetherLight”), currently being built in Amsterdam as a peer to StarLight in Chicago. He implements research projects in the GigaPort Networks area in collaboration with SURFnet. He currently serves as a Grid Forum Steering Group member, Area Director for the Peer-to-Peer area and GGF liaison towards the IETF. He is co-chair of the IRTF Authentication, Authorization and Accounting Architecture Research Group and a member of the Internet Research Steering Group (IRSG). http://www.science.uva.nl/~delaat.

Jason Lee is a computer scientist, currently on sabbatical with the University of Amsterdam’s Advanced Internet Research Group from Lawrence Berkeley National Laboratory, where he has worked for the last 10 years in the Data Intensive Distributed Computing Group on high-speed networking. Jason has many and varied interests that include, but are not limited to, distributed computing, Gigabit networking, chaos theory, neural networks and security.

Wim Sjouw graduated in physics (1971) with a specialization in medical physics, after which he joined a research group of physiological psychologists at Utrecht University to study evoked responses. Introducing computers into this field produced very exciting results relating human behavior to (mal)functioning of the brain at the single-trial level. In 1987 a reorganization caused him to switch to designing and implementing LANs, the most exciting being the network of Utrecht University (>10,000 connections). This led to research on innovative developments in networking. In 1999 he joined the Advanced Internet Research Group at the University of Amsterdam.
