DOI: 10.1145/1323548.1323562
Research article

Performance scalability of a multi-core web server

Published: 03 December 2007

Abstract

Today's large multi-core Internet servers support thousands of concurrent connections, or flows, and the computational ability of future server platforms will depend on increasing numbers of cores. The key to ensuring that performance scales with cores is to design systems software and hardware to fully exploit the parallelism inherent in independent network flows. However, performance scaling on commercial web servers has proven elusive. This paper identifies the major bottlenecks to scalability for a reference server workload on a commercial server platform. We determined that on a web server running a modified SPECweb2005 Support workload, throughput scales only 4.8x on eight cores. Our results show that the operating system, TCP/IP stack, and application exploited flow-level parallelism well, with few exceptions, and that load imbalance and shared caches affected performance little. Having eliminated these potential bottlenecks, we determined that performance scaling was limited by the capacity of the address bus, which became saturated with all eight cores active. If this key obstacle is addressed, commercial web servers and systems software are well positioned to scale to a large number of cores.
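The flow-level parallelism the abstract relies on can be sketched as follows. This is an illustrative sketch, not the paper's code: the function name, the use of SHA-256, and the core count constant are assumptions, in the spirit of NIC receive-side scaling (RSS), where each connection's 4-tuple is hashed to pin all of its packets to one core.

```python
# Hypothetical sketch of flow-level parallelism: steer each TCP flow to a
# fixed core by hashing its 4-tuple (RSS-style). Illustrative only.
import hashlib

NUM_CORES = 8  # assumed core count, matching the paper's test server

def core_for_flow(src_ip, src_port, dst_ip, dst_port):
    """Deterministically map a flow's 4-tuple to one of NUM_CORES cores."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_CORES

# Because every packet of a flow lands on the same core, per-flow TCP state
# needs no cross-core locking, while distinct flows spread across cores.
flow = ("10.0.0.1", 40000, "10.0.0.2", 80)
assert core_for_flow(*flow) == core_for_flow(*flow)  # stable per flow
```

When flows are independent, this kind of static affinity is what lets throughput scale with cores, until a shared resource such as the address bus saturates.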



Reviews

Carlos Juiz

The Internet provides a computing scenario in which clients communicate with web servers through mutually independent connections. If Internet server application processing and the associated protocol processing of a connection (flow) are done exclusively on a single central processing unit (CPU) core, minimal data sharing and synchronization between flows is expected. The computational ability of future servers will depend on increasing the number of cores. This interesting paper "identifies the major bottlenecks to scalability for a reference server workload on a commercial server platform." To test their hypothesis, the authors "set up [a] test server running a well-tuned Apache HTTP server and [the] Linux operating system. The server had eight cores with pairs of cores sharing L2 cache." The experiments show that the test server, running a modified SPECweb2005 Support workload, achieved only a 4.8-times speedup in throughput, compared to the ideal eight times; official SPECweb2005 results show similar scaling problems. This work provides "insights on the key causes of poor scalability of a Web server," and also provides "the analysis methodology leading to these insights." This latter feature makes the paper more interesting than the findings themselves, since the main bottleneck of the multicore server is the bus, and the snoopy protocol for sharing it. The authors determined that the main cause of poor scaling is the capacity of the address bus: they confirmed that it reached 77 percent utilization on eight cores, which is considered fully saturated. Other results showed that the number of cache misses per byte remained nearly constant as the number of cores increased, and that a cache shared between cores on the same bus had little effect on performance. However, profiling revealed some scalability obstacles in software. "Increasing hash table capacities and reducing dependence on linked lists" as the workload increases should fix these scalability problems.
"In the kernel, flow-level parallelism broke down in the file-system directory cache," which was widely shared. The authors propose that "a possible workaround would be to maintain alternate directory trees for each core." In conclusion, the remaining obstacle to scaling performance with the number of cores is address bus capacity. As stated, "directories (and directory caches) can be used to replace snoopy cache coherence," at the price of additional cost and latency. Further studies should verify this last hypothesis on real workloads. Online Computing Reviews Service
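The review's point about hash table capacities can be made concrete with a small sketch. This is illustrative, not the kernel's code: the function names and the multiplicative hash are assumptions. With B buckets and N entries, a lookup in a chained hash table walks a chain of roughly N/B nodes on average, so the walk grows linearly with load unless the table is resized.

```python
# Illustrative sketch (not the paper's kernel code): why fixed-capacity hash
# tables backed by linked-list buckets limit scaling under growing load.

MULT = 2654435761  # Knuth-style multiplicative hashing constant

def bucket_of(key, num_buckets):
    """Deterministic multiplicative hash of an integer key to a bucket."""
    return ((key * MULT) % (1 << 32)) % num_buckets

def average_chain_length(num_entries, num_buckets):
    """Average chain length over occupied buckets after inserting
    num_entries integer keys (stand-ins for connection identifiers)."""
    counts = [0] * num_buckets
    for key in range(num_entries):
        counts[bucket_of(key, num_buckets)] += 1
    occupied = [c for c in counts if c > 0]
    return sum(occupied) / len(occupied)

# Growing the table shortens chains and restores near-O(1) lookups:
# 64,000 connections in 1,024 buckets -> chains of ~62 nodes per lookup;
# the same load in 16,384 buckets -> chains of ~4 nodes per lookup.
```

Increasing the bucket count (or resizing dynamically) is exactly the kind of fix the review describes for the software-side scalability obstacles.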



Published In

ANCS '07: Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
December 2007
212 pages
ISBN:9781595939456
DOI:10.1145/1323548
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. cache hierarchies
  2. load balancing
  3. network protocol stacks
  4. networks
  5. parallelism
  6. scalability
  7. web servers

Qualifiers

  • Research-article

Conference

ANCS '07

Acceptance Rates

ANCS '07 Paper Acceptance Rate: 20 of 70 submissions, 29%
Overall Acceptance Rate: 88 of 314 submissions, 28%


Cited By

  • (2024) HydroRTC. Environmental Modelling & Software, 178:C. DOI: 10.1016/j.envsoft.2024.106068. Online publication date: 1-Jul-2024.
  • (2023) Formal modelling and verification of scalable service composition in IoT environment. Service Oriented Computing and Applications, 17:3, 213-231. DOI: 10.1007/s11761-023-00363-x. Online publication date: 18-May-2023.
  • (2019) The Impact of Event Processing Flow on Asynchronous Server Efficiency. IEEE Transactions on Parallel and Distributed Systems. DOI: 10.1109/TPDS.2019.2938500. Online publication date: 2019.
  • (2018) Research Challenges for Network Function Virtualization: Re-Architecting Middlebox for High Performance and Efficient, Elastic and Resilient Platform to Create New Services. IEICE Transactions on Communications, E101.B:1, 96-122. DOI: 10.1587/transcom.2017EBI0001. Online publication date: 2018.
  • (2018) Catching the Flow with Locality Sensitive Hashing in Programmable Data Planes. 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), 1-5. DOI: 10.1109/ICSESS.2018.8663861. Online publication date: Nov-2018.
  • (2017) Real-time security services for SDN-based datacenters. 2017 13th International Conference on Network and Service Management (CNSM), 1-9. DOI: 10.23919/CNSM.2017.8256030. Online publication date: Nov-2017.
  • (2017) Towards optimal adaptation of NFV packet processing to modern CPU memory architectures. Proceedings of the 2nd Workshop on Cloud-Assisted Networking, 7-12. DOI: 10.1145/3155921.3158429. Online publication date: 11-Dec-2017.
  • (2017) A Performance Survey of Lightweight Virtualization Techniques. Service-Oriented and Cloud Computing, 34-48. DOI: 10.1007/978-3-319-67262-5_3. Online publication date: 1-Sep-2017.
  • (2016) Performance evaluation of multi-core, multi-threaded SIP proxy servers (SPS). 2016 IEEE International Conference on Communications (ICC), 1-6. DOI: 10.1109/ICC.2016.7511427. Online publication date: May-2016.
  • (2015) Receive CPU Selection Framework. Proceedings of the 2015 12th Web Information System and Application Conference (WISA), 334-339. DOI: 10.1109/WISA.2015.71. Online publication date: 11-Sep-2015.
