Computer Networks and ISDN Systems
NetCache architecture and deployment
Introduction
When properly deployed, proxy caching reduces network bandwidth consumption and improves perceived network quality of service. However, when improperly deployed or inadequately sized, proxy caches degrade performance, need constant maintenance, and irritate users.
This paper summarizes four years of experience building and deploying proxy caches. NetCache springs from the Harvest cache project, which I led (the Harvest project continues in the public domain under the moniker `Squid') [1]. We were drawn to Web caching by research predicting that a hierarchical FTP cache would reduce Internet backbone traffic by 35% [2]. We proposed a hierarchical Web caching protocol that became known as the Internet Cache Protocol (ICP) [3]. ICP lets cooperating caches robustly detect failures and recover from them quickly. As a team, we resolved to make the Web scale more efficiently than the Domain Name System, which consumes 20 times more bandwidth than it needs [4]. We developed the Harvest cache in the public domain, and cooperative Web caching swept through Europe, Asia, Latin America, and the Pacific.
To fill the need for a scalable, commercially supported, highly available Web cache, we developed NetCache. The NetCache software versions run on UNIX and NT; the NetCache Appliance runs on Network Appliance's own Data ONTAP microkernel [5]. All versions of NetCache are built from a common source tree, so their features are nearly identical. The high-end NetCache Appliance is roughly four times faster than the NetCache software running on a 2-processor, 300 MHz UltraSPARC. The NetCache Appliance sits on top of Network Appliance's WAFL file system [6]. The appliance achieves superior performance for three reasons: (1) WAFL achieves 2–3 times more disk operations per second than traditional file systems, (2) the ONTAP microkernel eliminates most of the data copying between the file system and the network stack, and (3) the appliance's event handling efficiently demultiplexes network connections. The NetCache Appliance survives disk failure transparently (without losing its configuration files, its log files, or even its cached contents) because it is RAID-based.
We now have two years' experience supporting NetCache for mission-critical customers. As of December 1997, our biggest ISP customer had 500,000 dial-up customers and more than a hundred dial-in POPs. Our biggest enterprise customer had 100,000 desktop computer systems with browsers. Fig. 1 shows cache sizes and hit rates for an assortment of NetCache ISP and enterprise customers. Cache sizes range from 6 to 28 GB, and WAN bandwidths range from multiple T1 to T3. Bandwidth savings at these sites average 35%.
During the past two years, hit rates have remained stable, despite the proliferation of dynamic Web content. The explanation for this is simple. Fig. 2 shows that 70% of Web traffic consists of graphic URLs and software downloads. Even if all `html' URLs were dynamic and non-cacheable, 80–90% of the Web's bytes would remain cacheable. In the coming year, as HTTP-1.1 gets deployed, we anticipate higher hit rates as Web servers begin exploiting cache-control headers.
After reviewing NetCache's architecture, we discuss how to scale NetCache to arbitrary WAN bandwidths and derive rules-of-thumb for sizing single nodes. Finally, we discuss the advantages and drawbacks of transparent caching.
Architecture
Fig. 3 illustrates NetCache's architecture. Functionally, NetCache consists of: separate state machines to fetch WWW, FTP, and Gopher pages from their respective servers, state machines to tunnel HTTPS conversations, state machines to parse HTTP-1.1 requests [7], and state machines to map objects from and to disk. These state machines are driven by network, disk, and timeout events. Each state machine is uniquely bound to a single client, remote server, or disk file.
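This event-driven design can be sketched with a toy model. The states and event names below are invented for illustration (they are not NetCache's actual code); the point is that each client connection owns one state machine, and nothing blocks — every transition is driven by a discrete network, disk, or timeout event.

```python
# Toy model of an event-driven cache state machine: one machine per
# client connection, advanced only by events. States and event names
# are illustrative, not NetCache's actual internals.

class FetchMachine:
    def __init__(self, url, cache):
        self.url, self.cache = url, cache
        self.state = "PARSE_REQUEST"
        self.result = None

    def handle(self, event):
        # Each (state, event) pair drives exactly one transition.
        if self.state == "PARSE_REQUEST" and event == "request_complete":
            self.state = "CACHE_LOOKUP"
        elif self.state == "CACHE_LOOKUP" and event == "lookup_done":
            if self.url in self.cache:
                self.result = self.cache[self.url]
                self.state = "DONE"          # hit: reply immediately
            else:
                self.state = "FETCH_ORIGIN"  # miss: fetch from origin
        elif self.state == "FETCH_ORIGIN" and event == "origin_reply":
            self.cache[self.url] = f"body:{self.url}"
            self.result = self.cache[self.url]
            self.state = "DONE"
        return self.state

cache = {}
m = FetchMachine("http://example.com/a.gif", cache)
for ev in ("request_complete", "lookup_done", "origin_reply"):
    m.handle(ev)                              # first request: miss path

m2 = FetchMachine("http://example.com/a.gif", cache)
m2.handle("request_complete")
m2.handle("lookup_done")                      # second request: cache hit
```

A real implementation multiplexes thousands of such machines over non-blocking sockets; the dispatcher simply delivers each I/O completion to the machine bound to that connection.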
From a programmer's
Scalable caching
You can make a Web cache scale by partitioning browser workload and by partitioning and aggregating cache-to-cache workload hierarchically.
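The cache-to-cache half of this can be sketched as follows. The model below simulates ICP-style sibling resolution with plain dictionaries; real ICP queries siblings in parallel over UDP and takes the first HIT reply, which is also how it detects and routes around failed peers.

```python
# Toy model of ICP-style cache-to-cache resolution (siblings are
# simulated as dicts rather than queried over UDP as real ICP does).

def resolve(url, local, siblings):
    """Check the local cache, then the siblings; on a sibling HIT fetch
    from that sibling, otherwise go to the origin server and cache it."""
    if url in local:
        return "local_hit"
    for sib in siblings:             # real ICP asks all siblings at once
        if url in sib:
            local[url] = sib[url]    # fetch the object from the sibling
            return "sibling_hit"
    local[url] = f"body:{url}"       # all MISS: fetch from origin server
    return "miss"

local, sibling = {}, {"http://example.com/a.gif": "body:a"}
first = resolve("http://example.com/a.gif", local, [sibling])
second = resolve("http://example.com/a.gif", local, [sibling])
```

Aggregating misses this way is what lets a hierarchy of modest caches front an arbitrarily large browser population.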
Sizing individual caches
Sizing a Web cache's memory, disk, and CPU resources depends on its workload and WAN bandwidth. Below, we derive rules-of-thumb for scaling, based on conservative assumptions. Deployments scaled to these rules should be stable through one doubling of the applied workload.
Disk. Let's assume that the average size of a cacheable URL is 8 kB. (The real median is a bit smaller, and the real average is usually somewhat higher than this). We use 8 kB to be conservative. A 1-Mbit/s link can carry
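The arithmetic behind this rule of thumb is worth making explicit. The sketch below assumes a fully utilized link and a three-day cache retention window; both figures are illustrative assumptions, not specifications from the text.

```python
# Worked numbers for the 8 kB sizing rule of thumb. Full link
# utilization and a 3-day retention window are assumptions made
# here for illustration.
AVG_URL_BYTES = 8 * 1024         # conservative average cacheable object
LINK_BITS_PER_S = 1_000_000      # a 1-Mbit/s WAN link

urls_per_second = (LINK_BITS_PER_S / 8) / AVG_URL_BYTES
bytes_per_day = (LINK_BITS_PER_S / 8) * 86_400
disk_for_3_days_gb = 3 * bytes_per_day / 1e9

print(round(urls_per_second, 1))     # ~15.3 URLs/s per Mbit/s
print(round(disk_for_3_days_gb, 1))  # ~32.4 GB per Mbit/s of WAN link
```

Scaling these figures by actual WAN bandwidth lands in the same range as the deployed cache sizes reported in Fig. 1.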
Deploying caching
After an ISP deploys proxy caching, how does it train its users to configure caching? The most flexible way is to deploy new browser releases that are pre-configured with a proxy auto-configuration URL. After two or three browser releases, the majority of frequent users will be cache-ready. After caching is deployed, edit proxy.pac to partition URLs across the caches. Proxy auto-configuration maps URLs to specific proxy servers and identifies which URLs should be fetched directly.
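A proxy.pac expresses this mapping in JavaScript through a `FindProxyForURL(url, host)` function. The Python sketch below shows the same decision logic; the hostnames and the two-cache setup are hypothetical.

```python
# Sketch of the decision a proxy.pac's FindProxyForURL makes, written
# in Python for illustration (real PAC files are JavaScript). All
# hostnames here are hypothetical.
from hashlib import md5

def find_proxy_for_url(url, host):
    # Local and intranet hosts bypass the cache entirely.
    if host.endswith(".example-isp.net") or "." not in host:
        return "DIRECT"
    # Hash the hostname so each site maps to exactly one cache,
    # partitioning the URL space across the cache farm.
    bucket = md5(host.encode()).digest()[0] % 2
    caches = ["cache1.example-isp.net:3128", "cache2.example-isp.net:3128"]
    # Browsers try the list in order, so DIRECT is the failover.
    return f"PROXY {caches[bucket]}; DIRECT"
```

Because browsers honor the semicolon-separated list in order, appending `DIRECT` gives clients a fallback when their assigned cache is unreachable.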
Even ISPs with
Summary
NetCache software serves 100–200 URLs/s on the same hardware on which public-domain servers such as Squid serve 25–50 URLs/s. The NetCache Appliance, because of its close integration with the WAFL file system and Data ONTAP microkernel, achieves four times the performance of the NetCache software, peaking at about 500 URLs/s [15].
References (15)
- A. Chankhunthod, P. Danzig et al., A hierarchical Internet object cache, in: Proc. 1996 USENIX Annual Technical Conf.,...
- P. Danzig, M. Schwartz, R. Hall, A case for caching file objects inside internetworks, in: Proc. 1993 ACM...
- K. Claffy, D. Wessels, Internet Cache Protocol (ICP), version 2 (RFC 2186), September...
- P. Danzig, K. Obraczka, A. Kumar, An analysis of wide-area name server traffic: a study of the domain name system, in:...
- A. Watson, Multiprotocol data access: NFS, CIFS, and HTTP (TR-3014), Network Appliance, Mountain View, CA,...
- D. Hitz, J. Lau, M. Malcolm, File system design for a file server appliance, in: Proc. 1994 Winter USENIX Technical...
- R. Fielding, J. Gettys, J.C. Mogul, H. Frystyk, T. Berners-Lee, Hypertext Transfer Protocol: HTTP/1.1, November 21,...
Peter Danzig is the chief architect of Internet products at Network Appliance. Peter led the Harvest Web cache project from 1993 to 1995. In 1996, Peter formed Internet Middleware Corporation (IMC), the first commercial company aimed exclusively at building carrier-class Web caches. Network Appliance purchased Peter's company in 1997, and today more than two dozen national telecoms have standardized their Web cache deployments around Peter's products. Peter is an associate professor at the University of Southern California, and has authored many research papers on Internet information systems, traffic modeling, and flow and congestion control. He is a winner of the NYI award from the National Science Foundation and an innovative teaching award from USC.