Computer Networks and ISDN Systems
NetCache architecture and deployment
Introduction
When properly deployed, proxy caching reduces network bandwidth consumption and improves perceived network quality of service. However, when improperly deployed or inadequately sized, proxy caches degrade performance, need constant maintenance, and irritate users.
This paper summarizes four years of experience building and deploying proxy caches. NetCache springs from the Harvest cache project, which I led (the Harvest project continues in the public domain under the moniker `Squid') [1]. We were drawn to Web caching by research predicting that a hierarchical FTP cache would reduce Internet backbone traffic by 35% [2]. We proposed a hierarchical Web caching protocol that became known as the Internet Cache Protocol (ICP) [3]. ICP lets cooperating caches robustly detect failures and recover from them quickly. As a team, we resolved to make the Web scale more efficiently than the Domain Name System, which consumes 20 times more bandwidth than it needs [4]. We developed the Harvest cache in the public domain, and cooperative Web caching swept through Europe, Asia, Latin America, and the Pacific.
To fill the need for a scalable, commercially supported, highly available Web cache, we developed NetCache. The NetCache software versions run on UNIX and NT; the NetCache Appliance runs on Network Appliance's own Data ONTAP microkernel [5]. All versions of NetCache are built from a common source tree, so their features are nearly identical. The high-end NetCache Appliance is roughly four times faster than the NetCache software running on a 2-processor, 300 MHz UltraSPARC. The NetCache Appliance sits on top of Network Appliance's WAFL file system [6]. The appliance achieves superior performance for three reasons: (1) WAFL achieves 2–3 times more disk operations per second than traditional file systems, (2) the ONTAP microkernel eliminates most of the data copying between the file system and the network stack, and (3) the appliance's event handling efficiently demultiplexes network connections. The NetCache Appliance survives disk failure transparently (without losing its configuration files, its log files, or even its cached contents) because it is RAID-based.
We now have two years' experience supporting NetCache for mission-critical customers. As of December 1997, our biggest ISP customer had 500,000 dial-up customers and more than a hundred dial-in POPs. Our biggest enterprise customer had 100,000 desktop computer systems with browsers. Fig. 1 shows cache sizes and hit rates for an assortment of NetCache ISP and enterprise customers. Cache sizes range from 6 to 28 GB, and WAN bandwidths range from multiple T1 to T3. Bandwidth savings at these sites average 35%.
During the past two years, hit rates have remained stable, despite the proliferation of dynamic Web content. The explanation for this is simple. Fig. 2 shows that 70% of Web traffic consists of graphic URLs and software downloads. Even if all `html' URLs were dynamic and non-cacheable, 80–90% of the Web's bytes would remain cacheable. In the coming year, as HTTP-1.1 gets deployed, we anticipate higher hit rates as Web servers begin exploiting cache-control headers.
After reviewing NetCache's architecture, we discuss how to scale NetCache to arbitrary WAN bandwidths and derive rules-of-thumb for sizing single nodes. Finally, we discuss the advantages and drawbacks of transparent caching.
Architecture
Fig. 3 illustrates NetCache's architecture. Functionally, NetCache consists of: separate state machines to fetch WWW, FTP, and Gopher pages from their respective servers, state machines to tunnel HTTPS conversations, state machines to parse HTTP-1.1 requests [7], and state machines to map objects from and to disk. These state machines are driven by network, disk, and timeout events. Each state machine is uniquely bound to a single client, remote server, or disk file.
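This event-driven design can be sketched with a toy model. The states and event names below are invented for illustration (they are not NetCache's actual code); the point is that each client connection owns one state machine, and nothing blocks — every transition is driven by a discrete network, disk, or timeout event.

```python
# Toy model of an event-driven cache state machine: one machine per
# client connection, advanced only by events. States and event names
# are illustrative, not NetCache's actual internals.

class FetchMachine:
    def __init__(self, url, cache):
        self.url, self.cache = url, cache
        self.state = "PARSE_REQUEST"
        self.result = None

    def handle(self, event):
        # Each (state, event) pair drives exactly one transition.
        if self.state == "PARSE_REQUEST" and event == "request_complete":
            self.state = "CACHE_LOOKUP"
        elif self.state == "CACHE_LOOKUP" and event == "lookup_done":
            if self.url in self.cache:
                self.result = self.cache[self.url]
                self.state = "DONE"          # hit: reply immediately
            else:
                self.state = "FETCH_ORIGIN"  # miss: fetch from origin
        elif self.state == "FETCH_ORIGIN" and event == "origin_reply":
            self.cache[self.url] = f"body:{self.url}"
            self.result = self.cache[self.url]
            self.state = "DONE"
        return self.state

cache = {}
m = FetchMachine("http://example.com/a.gif", cache)
for ev in ("request_complete", "lookup_done", "origin_reply"):
    m.handle(ev)                              # first request: miss path

m2 = FetchMachine("http://example.com/a.gif", cache)
m2.handle("request_complete")
m2.handle("lookup_done")                      # second request: cache hit
```

A real implementation multiplexes thousands of such machines over non-blocking sockets; the dispatcher simply delivers each I/O completion to the machine bound to that connection.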
From a programmer's
Scalable caching
You can make a Web cache scale by partitioning browser workload and by partitioning and aggregating cache-to-cache workload hierarchically.
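The cache-to-cache half of this can be sketched as follows. The model below simulates ICP-style sibling resolution with plain dictionaries; real ICP queries siblings in parallel over UDP and takes the first HIT reply, which is also how it detects and routes around failed peers.

```python
# Toy model of ICP-style cache-to-cache resolution (siblings are
# simulated as dicts rather than queried over UDP as real ICP does).

def resolve(url, local, siblings):
    """Check the local cache, then the siblings; on a sibling HIT fetch
    from that sibling, otherwise go to the origin server and cache it."""
    if url in local:
        return "local_hit"
    for sib in siblings:             # real ICP asks all siblings at once
        if url in sib:
            local[url] = sib[url]    # fetch the object from the sibling
            return "sibling_hit"
    local[url] = f"body:{url}"       # all MISS: fetch from origin server
    return "miss"

local, sibling = {}, {"http://example.com/a.gif": "body:a"}
first = resolve("http://example.com/a.gif", local, [sibling])
second = resolve("http://example.com/a.gif", local, [sibling])
```

Aggregating misses this way is what lets a hierarchy of modest caches front an arbitrarily large browser population.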
Sizing individual caches
Sizing a Web cache's memory, disk, and CPU resources depends on its workload and WAN bandwidth. Below, we derive rules-of-thumb for scaling, based on conservative assumptions. Deployments scaled to these rules should be stable through one doubling of the applied workload.
Disk. Let's assume that the average size of a cacheable URL is 8 kB. (The real median is a bit smaller, and the real average is usually somewhat higher than this). We use 8 kB to be conservative. A 1-Mbit/s link can carry
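The arithmetic behind this rule of thumb is worth making explicit. The sketch below assumes a fully utilized link and a three-day cache retention window; both figures are illustrative assumptions, not specifications from the text.

```python
# Worked numbers for the 8 kB sizing rule of thumb. Full link
# utilization and a 3-day retention window are assumptions made
# here for illustration.
AVG_URL_BYTES = 8 * 1024         # conservative average cacheable object
LINK_BITS_PER_S = 1_000_000      # a 1-Mbit/s WAN link

urls_per_second = (LINK_BITS_PER_S / 8) / AVG_URL_BYTES
bytes_per_day = (LINK_BITS_PER_S / 8) * 86_400
disk_for_3_days_gb = 3 * bytes_per_day / 1e9

print(round(urls_per_second, 1))     # ~15.3 URLs/s per Mbit/s
print(round(disk_for_3_days_gb, 1))  # ~32.4 GB per Mbit/s of WAN link
```

Scaling these figures by actual WAN bandwidth lands in the same range as the deployed cache sizes reported in Fig. 1.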
Deploying caching
After an ISP deploys proxy caching, how does it train its users to configure caching? The most flexible way is to deploy new browser releases that are pre-configured with a proxy auto-configuration URL. After two or three browser releases, the majority of frequent users will be cache-ready. After caching is deployed, edit proxy.pac to partition URLs across the caches. Proxy auto-configuration maps URLs to specific proxy servers and identifies which URLs should be fetched directly.
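A proxy.pac expresses this mapping in JavaScript through a `FindProxyForURL(url, host)` function. The Python sketch below shows the same decision logic; the hostnames and the two-cache setup are hypothetical.

```python
# Sketch of the decision a proxy.pac's FindProxyForURL makes, written
# in Python for illustration (real PAC files are JavaScript). All
# hostnames here are hypothetical.
from hashlib import md5

def find_proxy_for_url(url, host):
    # Local and intranet hosts bypass the cache entirely.
    if host.endswith(".example-isp.net") or "." not in host:
        return "DIRECT"
    # Hash the hostname so each site maps to exactly one cache,
    # partitioning the URL space across the cache farm.
    bucket = md5(host.encode()).digest()[0] % 2
    caches = ["cache1.example-isp.net:3128", "cache2.example-isp.net:3128"]
    # Browsers try the list in order, so DIRECT is the failover.
    return f"PROXY {caches[bucket]}; DIRECT"
```

Because browsers honor the semicolon-separated list in order, appending `DIRECT` gives clients a fallback when their assigned cache is unreachable.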
Even ISPs with
Summary
NetCache software serves 100–200 URLs/s on the same hardware on which public-domain servers such as Squid serve 25–50 URLs/s. The NetCache Appliance, because of its close integration with the WAFL file system and Data ONTAP microkernel, achieves four times the performance of the NetCache software, peaking at about 500 URLs/s [15].
References (15)
- A. Chankhunthod, P. Danzig et al., A hierarchical Internet object cache, in: Proc. 1996 USENIX Annual Technical Conf.,...
- P. Danzig, M. Schwartz, R. Hall, A case for caching file objects inside internetworks, in: Proc. 1993 ACM...
- K. Claffy, D. Wessels, Internet Cache Protocol (ICP), version 2 (RFC 2186), September...
- P. Danzig, K. Obraczka, A. Kumar, An analysis of wide-area name server traffic: a study of the domain name system, in:...
- A. Watson, Multiprotocol data access: NFS, CIFS, and HTTP (TR-3014), Network Appliance, Mountain View, CA,...
- D. Hitz, J. Lau, M. Malcolm, File system design for a file server appliance, in: Proc. 1994 Winter USENIX Technical...
- R. Fielding, J. Gettys, J.C. Mogul, H. Frystyk, T. Berners-Lee, Hypertext Transfer Protocol: HTTP/1.1, November 21,...
Peter Danzig is the chief architect of Internet products at Network Appliance. Peter led the Harvest Web cache project from 1993 to 1995. In 1996, Peter formed Internet Middleware Corporation (IMC), the first commercial company aimed exclusively at building carrier-class Web caches. Network Appliance purchased Peter's company in 1997, and today more than two dozen national telecoms have standardized their Web cache deployments around Peter's products. Peter is an associate professor at the University of Southern California, and has authored many research papers on Internet information systems, traffic modeling, and flow and congestion control. He is a winner of the NYI award from the National Science Foundation and an innovative teaching award from USC.