DOI: 10.1145/1323548.1323562
Research article

Performance scalability of a multi-core web server

Published: 03 December 2007

Abstract

Today's large multi-core Internet servers support thousands of concurrent connections, or flows, and the computational ability of future server platforms will depend on increasing numbers of cores. The key to ensuring that performance scales with cores is to design systems software and hardware to fully exploit the parallelism inherent in independent network flows. However, performance scaling on commercial web servers has proven elusive. This paper identifies the major bottlenecks to scalability for a reference server workload on a commercial server platform. We determined that on a web server running a modified SPECweb2005 Support workload, throughput scales only 4.8x on eight cores. Our results show that the operating system, TCP/IP stack, and application exploited flow-level parallelism well, with few exceptions, and that load imbalance and shared caches affected performance little. Having eliminated these potential bottlenecks, we determined that performance scaling was limited by the capacity of the address bus, which became saturated with all eight cores active. If this key obstacle is addressed, commercial web servers and systems software are well positioned to scale to a large number of cores.
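The flow-level parallelism the abstract relies on can be sketched as follows. This is an illustrative sketch, not the paper's code: the function name, the use of SHA-256, and the core count constant are assumptions, in the spirit of NIC receive-side scaling (RSS), where each connection's 4-tuple is hashed to pin all of its packets to one core.

```python
# Hypothetical sketch of flow-level parallelism: steer each TCP flow to a
# fixed core by hashing its 4-tuple (RSS-style). Illustrative only.
import hashlib

NUM_CORES = 8  # assumed core count, matching the paper's test server

def core_for_flow(src_ip, src_port, dst_ip, dst_port):
    """Deterministically map a flow's 4-tuple to one of NUM_CORES cores."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_CORES

# Because every packet of a flow lands on the same core, per-flow TCP state
# needs no cross-core locking, while distinct flows spread across cores.
flow = ("10.0.0.1", 40000, "10.0.0.2", 80)
assert core_for_flow(*flow) == core_for_flow(*flow)  # stable per flow
```

When flows are independent, this kind of static affinity is what lets throughput scale with cores, until a shared resource such as the address bus saturates.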



Reviews

Carlos Juiz

The Internet provides a computing scenario in which clients communicate with web servers through mutually independent connections. If Internet server application processing and the associated protocol processing of a connection (flow) are done exclusively on a single central processing unit (CPU) core, minimal data sharing and synchronization between flows is expected. The computational ability of future servers will depend on increasing the number of cores. This interesting paper "identifies the major bottlenecks to scalability for a reference server workload on a commercial server platform." To test their hypothesis, the authors "set up [a] test server running a well-tuned Apache HTTP server and [the] Linux operating system. The server had eight cores with pairs of cores sharing L2 cache." The experiments show that the test server, running a modified SPECweb2005 Support workload, achieved only a 4.8-times speedup in throughput, compared to the ideal eight times; official SPECweb2005 results show similar scaling problems. This work provides "insights on the key causes of poor scalability of a Web server," and also provides "the analysis methodology leading to these insights." This latter feature makes the paper more interesting than the findings themselves, since the main bottleneck of the multicore server is the bus, and the snoopy protocol for sharing it. The authors determined that the main cause of poor scaling is the capacity of the address bus: they confirmed that it reached 77 percent utilization on eight cores, which is considered fully saturated. Other results showed that the number of cache misses per byte remained nearly constant as the number of cores increased, and that a cache shared between cores on the same bus had little effect on performance. However, profiling revealed some scalability obstacles in software. "Increasing hash table capacities and reducing dependence on linked lists" as the workload increases should fix these scalability problems.
"In the kernel, flow-level parallelism broke down in the file-system directory cache," which was widely shared. The authors propose that "a possible workaround would be to maintain alternate directory trees for each core." In conclusion, the remaining obstacle to scaling performance with the number of cores is address bus capacity. As stated, "directories (and directory caches) can be used to replace snoopy cache coherence," at the price of additional cost and latency. Further studies should verify this last hypothesis on real workloads. Online Computing Reviews Service
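The review's point about hash table capacities can be made concrete with a small sketch. This is illustrative, not the kernel's code: the function names and the multiplicative hash are assumptions. With B buckets and N entries, a lookup in a chained hash table walks a chain of roughly N/B nodes on average, so the walk grows linearly with load unless the table is resized.

```python
# Illustrative sketch (not the paper's kernel code): why fixed-capacity hash
# tables backed by linked-list buckets limit scaling under growing load.

MULT = 2654435761  # Knuth-style multiplicative hashing constant

def bucket_of(key, num_buckets):
    """Deterministic multiplicative hash of an integer key to a bucket."""
    return ((key * MULT) % (1 << 32)) % num_buckets

def average_chain_length(num_entries, num_buckets):
    """Average chain length over occupied buckets after inserting
    num_entries integer keys (stand-ins for connection identifiers)."""
    counts = [0] * num_buckets
    for key in range(num_entries):
        counts[bucket_of(key, num_buckets)] += 1
    occupied = [c for c in counts if c > 0]
    return sum(occupied) / len(occupied)

# Growing the table shortens chains and restores near-O(1) lookups:
# 64,000 connections in 1,024 buckets -> chains of ~62 nodes per lookup;
# the same load in 16,384 buckets -> chains of ~4 nodes per lookup.
```

Increasing the bucket count (or resizing dynamically) is exactly the kind of fix the review describes for the software-side scalability obstacles.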



Published In

ANCS '07: Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
December 2007
212 pages
ISBN:9781595939456
DOI:10.1145/1323548
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. cache hierarchies
  2. load balancing
  3. network protocol stacks
  4. networks
  5. parallelism
  6. scalability
  7. web servers

Qualifiers

  • Research-article

Conference

ANCS '07

Acceptance Rates

ANCS '07 Paper Acceptance Rate: 20 of 70 submissions, 29%
Overall Acceptance Rate: 88 of 314 submissions, 28%


Cited By

  • (2024) HydroRTC. Environmental Modelling & Software, 178:C. DOI: 10.1016/j.envsoft.2024.106068. Online publication date: 1-Jul-2024.
  • (2023) Formal modelling and verification of scalable service composition in IoT environment. Service Oriented Computing and Applications, 17:3, 213-231. DOI: 10.1007/s11761-023-00363-x. Online publication date: 18-May-2023.
  • (2019) The Impact of Event Processing Flow on Asynchronous Server Efficiency. IEEE Transactions on Parallel and Distributed Systems. DOI: 10.1109/TPDS.2019.2938500. Online publication date: 2019.
  • (2018) Research Challenges for Network Function Virtualization: Re-Architecting Middlebox for High Performance and Efficient, Elastic and Resilient Platform to Create New Services. IEICE Transactions on Communications, E101.B:1, 96-122. DOI: 10.1587/transcom.2017EBI0001. Online publication date: 2018.
  • (2018) Catching the Flow with Locality Sensitive Hashing in Programmable Data Planes. 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), 1-5. DOI: 10.1109/ICSESS.2018.8663861. Online publication date: Nov-2018.
  • (2017) Real-time security services for SDN-based datacenters. 2017 13th International Conference on Network and Service Management (CNSM), 1-9. DOI: 10.23919/CNSM.2017.8256030. Online publication date: Nov-2017.
  • (2017) Towards optimal adaptation of NFV packet processing to modern CPU memory architectures. Proceedings of the 2nd Workshop on Cloud-Assisted Networking, 7-12. DOI: 10.1145/3155921.3158429. Online publication date: 11-Dec-2017.
  • (2017) A Performance Survey of Lightweight Virtualization Techniques. Service-Oriented and Cloud Computing, 34-48. DOI: 10.1007/978-3-319-67262-5_3. Online publication date: 1-Sep-2017.
  • (2016) Performance evaluation of multi-core, multi-threaded SIP proxy servers (SPS). 2016 IEEE International Conference on Communications (ICC), 1-6. DOI: 10.1109/ICC.2016.7511427. Online publication date: May-2016.
  • (2015) Receive CPU Selection Framework. Proceedings of the 2015 12th Web Information System and Application Conference (WISA), 334-339. DOI: 10.1109/WISA.2015.71. Online publication date: 11-Sep-2015.
