Prefetching the means for document transfer: a new approach for reducing Web latency
Introduction
The central performance problem of the Internet today is user-perceived latency, that is, the period from the time a user issues a request for a document until a response is received. A key realization is that latency is often NOT dominated by document transmission time, but rather by the setup process that precedes it. This realization was a central motivation for HTTP/1.1, which addressed connection–establishment time on subsequent hypertext transfer protocol (HTTP) requests [1], and for proposals such as prefix caching of multimedia streams [2]. Even users enjoying persistent HTTP and high-bandwidth connectivity, however, are still frequently afflicted by annoyingly long setup waits. We propose natural techniques that address the dominant latency causes preceding document transfer.
Background: Communication between Web clients and servers uses the HTTP, which in turn utilizes transmission control protocol (TCP) as the de facto underlying reliable transport protocol. A TCP connection needs to be established and acknowledged prior to transporting HTTP messages. To facilitate connection–establishment, the Web server's host-name representation is translated to a numeric Internet protocol (IP) address. This translation is done by querying a domain name system (DNS) server that may consult a hierarchy of DNS servers. Determining factors of the user-perceived latency are name-to-address resolution, TCP connection–establishment time, HTTP request–response time, server processing, and finally, transmission time.
Web browsing sessions typically consist of many HTTP requests, each for a small document. Practice with HTTP/1.0 was to use a separate TCP connection for each HTTP request and response [3]; hence, connection–establishment and slow-start latencies were incurred on each request [4]. Persistent connections [1] address this by reusing a single long-lived TCP connection for multiple HTTP requests. Persistent connections became the default with HTTP/1.1 [5], whose deployment is steadily increasing. Deployment of HTTP/1.1 reduces the latency incurred on subsequent requests to a server utilizing an existing connection, but long perceived latency is still incurred when a request necessitates establishment of a new connection.
Document caching and prefetching are well-studied techniques for latency reduction. Caching documents at browsers and proxies is an effective way to reduce latency on requests made to cacheable, previously accessed documents. Studies suggest that due to the presence of cookies, CGI scripts, and limited locality of reference, caching is applicable to only about 30–50% of requests [6], [7], [8]. Furthermore, caching is also limited by copyright issues. To avoid serving stale content, cached resources are often validated by contacting the server (e.g., through an If-Modified-Since GET request). Hence, caching eliminates transmission time but often, unless pre-validation is used [9], [10], still incurs considerable latency. Document prefetching reduces latency by predicting requests and initiating document transfer prior to an actual request. The effectiveness of document prefetching is limited by the accuracy of predictions and the availability of lead time and bandwidth [6], [10], [11]. The use of document prefetching is controversial due to its extensive overhead on network bandwidth.
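The revalidation round trip mentioned above can be expressed as a conditional GET; the sketch below uses Python's standard library, and the URL and date are placeholders rather than values from the study:

```python
import urllib.request

# Revalidating a cached copy: echo its Last-Modified value back in an
# If-Modified-Since header. A "304 Not Modified" reply carries no body,
# so transmission time is saved, but the round trip to the server (and,
# if no connection exists, connection establishment) is still paid.
req = urllib.request.Request(
    "http://www.example.com/page.html",  # placeholder URL
    headers={"If-Modified-Since": "Tue, 25 Nov 1996 08:00:00 GMT"},
)
# urllib.request.urlopen(req) would raise HTTPError(304) if unchanged.
```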
Our contribution: We systematically measure several latency factors and study their sensitivity to reference locality. We propose techniques that address significant latency factors by prefetching the means for document transfer (rather than the document itself). Finally, we conduct a performance evaluation of our techniques by replaying actual users' request sequences. Next, we overview the conclusions from our measurements, our proposed techniques, and their performance evaluation.
(1) Measurements. We used a list of about 13,000 Web servers extracted from a proxy log. The measurements were conducted from several locations in order to ensure they are not skewed by unrepresentative local phenomena. Below we summarize our findings.
- DNS lookups: DNS lookup time (name-to-address translation) exceeded 3 seconds for over 10% of servers. Lookup time is highly dependent on reference locality to the server, due to caching of query results at name servers.
- Cold- and warm-server state: We observed that request–response times of start-of-session HTTP requests are on average significantly longer than those of subsequent requests (even when each request utilizes a separate TCP connection). Presumed start-of-session HTTP request–response time exceeded 1 second for over 10% of servers and exceeded 4 seconds for over 6% of servers. These fractions dropped by more than half for subsequent requests.
- Connection–establishment time: In agreement with many previous studies (e.g., [1], [8]), we observed that TCP connection–establishment time is significant relative to HTTP request–response times.
- Cold and warm route states: As we explain later in the paper, the first IP datagram traversing a path to a destination (when the route is cold) is likely to take longer than subsequent datagrams (when the route is warm). This effect is visible in consecutive TCP connection–establishment times, but our study concludes that it is not as significant a contributor to long latency as the other factors.
Furthermore, our study shows that DNS lookup time, connection–establishment time, and HTTP request–response time all exhibit heavy-tail behavior (i.e. the average is considerably larger than the median).
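"Heavy-tail" here means that a small fraction of very slow operations pulls the average well above the typical (median) case, as this toy illustration shows (the sample values are illustrative, not the paper's measurements):

```python
import statistics

# 90% of lookups are fast, 10% are very slow (illustrative values).
lookup_times = [0.02] * 90 + [3.5] * 10

mean = statistics.mean(lookup_times)      # pulled up by the slow tail
median = statistics.median(lookup_times)  # the typical case
assert mean > 10 * median  # average considerably larger than the median
```

A consequence is that average-latency figures understate how often users experience painfully long waits, which is why we report the fractions of requests exceeding fixed thresholds.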
(2) Solutions. We propose the following techniques to address the three latency factors described above.
- Pre-resolving: The browser or proxy performs the DNS lookup before a request to the server is issued, thereby eliminating DNS query time from user-perceived latency.
- Pre-connecting: The browser or proxy establishes a TCP connection to a server prior to the user's request. Pre-connecting addresses connection–establishment time on the first request utilizing a persistent connection.
- Pre-warming: The browser or proxy sends a “dummy” HTTP HEAD request to the server prior to the actual request. Pre-warming addresses start-of-session latency at the server.
We refer to the three techniques combined as pre-transfer prefetching.
Pre-connecting complements persistent HTTP and caching of TCP connections [8], [12]: it addresses the connection–establishment wait incurred on the first request utilizing a connection. Pre-transfer prefetching techniques also complement caching of documents, since (1) they are more beneficial for requests to servers which were not recently accessed, and (2) their effectiveness does not depend on the URL being a cacheable document. They complement document prefetching by providing a range of overhead/benefit tradeoffs and utilizing less bandwidth. Like document prefetching, our pre-transfer techniques require a scheme to predict servers that are likely to be accessed.
(3) Performance. We evaluated the effectiveness of these techniques on two classes of requests across two locations.
The first class of requests was search engine referrals, namely follow-up requests on results returned by search engines or Web portals (directories). We chose to focus on these requests since search engine and Web portal sites typically invest considerable effort in providing low-latency, high-quality service. Performance on follow-up requests is crucial to the overall perception of performance; however, we observed that follow-up requests are subject to considerably longer latencies than average HTTP requests. Perceived latency on AltaVista [13] referrals exceeded 1 second for 22% of requests and exceeded 4 seconds for 12% of requests. With pre-resolving, pre-connecting, and pre-warming (with pre-connecting) applied, latency exceeded 1 second for only 10%, 6%, and 4% of requests, respectively, and exceeded 4 seconds for only 4%, 3%, and 2% of requests, respectively, yielding a dramatic decrease in longer wait times.
The second class of requests was considerably wider, containing all requests that were not preceded by a request to the same server in the previous 60 seconds. Latency exceeded 1 second on 14% of requests and exceeded 4 seconds on 7% of requests. With pre-resolving, pre-connecting, and pre-warming, latency exceeded 1 second on only 9%, 5%, and 3% of requests, respectively, and exceeded 4 seconds on 4%, 2.5%, and 1% of requests, respectively. This demonstrates a significant potential improvement in performance.
Outline: The remainder of this paper consists of six sections. Section 2 describes our data and how we performed latency measurements. Section 3 describes our measurements of the different latency factors. Section 4 discusses pre-resolving, pre-connecting, and pre-warming and their deployment. Section 5 lists possible overheads and suggests ways to address them. Section 6 presents the performance evaluation. We conclude in Section 7 and propose directions for further research.
Data
Our user activity data was a log of the AT&T Research proxy server which is described in detail in [14]. The log provided, for each HTTP request, the time of the request, the user's (hashed) IP address, the requested URL and Web server, and the referring URL. The log contained 1.1 million requests issued between the 8th and the 25th of November 1996 by 463 different users (IP addresses). Requests were issued to 17,000 different Web servers, and for 521,000 different URLs. Our use of the log was
Study of latency factors
We measured the different latency factors and studied the effect of reference locality on them. A measurement was taken for each server in a group and results were aggregated. We aggregated the measurements across the complete set of servers occurring in the AT&T Research proxy log described in Section 2, and across the subsets of servers accessed as a result of a referral from a particular search engine or Web portal. The dominating latency factors were the same for both sets of servers so we show
Combating latency
We propose three techniques to address the major latency factors identified in Section 3. In essence, our techniques perform the setup work associated with an HTTP request–response prior to the time the user issues the request. All our proposed techniques are applicable to servers rather than documents, and require minimal bandwidth. The general paradigm is to apply each technique to a set of servers with above-threshold likelihood to be accessed within the effective period of the technique.
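The thresholding paradigm described above can be sketched as follows (the function name, likelihood values, and threshold are illustrative; the prediction scheme itself is a pluggable component):

```python
def servers_to_prefetch(access_likelihood, threshold=0.3):
    """Select the servers whose estimated probability of being accessed
    within the technique's effective period meets the threshold."""
    return sorted(host for host, p in access_likelihood.items()
                  if p >= threshold)

# e.g. likelihoods predicted from the links on the page being viewed:
predicted = {"www.example.com": 0.8,
             "images.example.com": 0.5,
             "rarely.example.net": 0.05}
targets = servers_to_prefetch(predicted)
```

Since each technique has a different cost, a deployment could use a lower threshold for cheap pre-resolving and a higher one for pre-connecting and pre-warming.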
Scalable deployment
Our pre-transfer techniques have minimal bandwidth overhead, but they do impose other overheads: pre-resolving increases the number of DNS queries, pre-connecting increases the number of connection establishments as well as the number of idle TCP connections, and pre-warming results in additional work for the HTTP application. Whereas the benefit of these techniques goes to the end users, the cost is incurred mainly in the network and at the servers. This discrepancy between location of cost and
Performance evaluation
We compared perceived latency with and without applications of each of our techniques. We used the request sequence in the AT&T proxy log (see Section 2), and subsets of it consisting of follow-up requests after queries to search engines or Web portals. Our motivation for considering these subsets is described in detail in Section 1.
Conclusion and future research
Due to prevailing long user-perceived latency, the Web is pejoratively dubbed the “World Wide Wait.” We observe that, to a large extent, long waits experienced by users with high-bandwidth connectivity are dominated by the setup process preceding the actual transmission of contents. We propose pre-transfer prefetching techniques as a direct solution, and demonstrate their potential for a significant decrease in long wait times. We view the deployment of pre-transfer prefetching as a natural
Acknowledgements
The pre-connecting technique was conceived in a joint discussion with Uri Zwick. We thank Jeff Mogul and Rob Calderbank for their comments on an early version of this manuscript. We are grateful to Menashe Cohen and Yishay Mansour for many good suggestions and pointers. We also thank Yehuda Afek, Ramon Caceres, Dan Duchamp, Sam Dworetsky, David Johnson, Balachander Krishnamurthy, Ben Lee, and Jennifer Rexford, for interesting discussions on our ideas and their implications.
Edith Cohen is a Researcher at AT&T Labs-Research. She did her undergraduate and Masters studies at Tel-Aviv University, and received her Ph.D. in Computer Science from Stanford University in 1991. She joined Bell Laboratories in 1991. During 1997, she was a Visiting Professor at UC Berkeley. Her research interests include design and analysis of algorithms, combinatorial optimization, Web performance, networking, and data mining.
References (32)
- The case for persistent-connection HTTP, Computer Communication Review (1995)
- S. Sen, J. Rexford, D. Towsley, Proxy prefix caching for multimedia streams, in: Proceedings of the IEEE INFOCOM'99...
- T. Berners-Lee, R. Fielding, H. Frystyk, RFC 1945: hypertext transfer protocol, HTTP/1.0, May...
- H. Frystyk Nielsen, J. Gettys, A. Baird-Smith, E. Prud'hommeaux, H.W. Lie, C. Lilley, Network performance effects of...
- R. Fielding, J. Gettys, J. Mogul, H. Nielsen, L. Masinter, P. Leach, T. Berners-Lee, RFC 2616: hypertext transfer...
- et al., Exploring the bounds of web latency reduction from caching and prefetching
- F. Douglis, A. Feldmann, B. Krishnamurthy, J. Mogul, Rate of change and other metrics: a live study of the world wide...
- A. Feldmann, R. Cáceres, F. Douglis, G. Glass, M. Rabinovich, Performance of Web proxy caching in heterogeneous...
- B. Krishnamurthy, C.E. Wills, Study of piggyback cache validation for proxy caches in the world wide web, in:...
- E. Cohen, B. Krishnamurthy, J. Rexford, Improving end-to-end performance of the Web using server volumes and proxy...
Haim Kaplan received his Ph.D. degree from Princeton University in 1997. He was a member of technical staff at AT&T Research from 1996 to 1999. Since 1999 he has been an Assistant Professor in the School of Computer Science at Tel-Aviv University. His research interests are design and analysis of algorithms and data structures.