Performance/Watt: the new server focus

Published: 01 November 2005

Abstract

Transaction processing has emerged as the killer application for commercial servers. Most servers run transactional workloads such as processing search requests, serving middleware, evaluating decisions, managing databases, and powering online commerce. Currently, commercial servers are built from one or more high-performance superscalar processors. However, commercial server applications exhibit high cache miss rates, large memory footprints, and low instruction-level parallelism (ILP), which together lead to poor utilization on traditional ILP-focused superscalar processors [11]. In addition, these ILP-focused processors have been optimized primarily to deliver maximum performance by employing high clock rates and large amounts of speculation. As a result, the performance/Watt of successive generations of traditional ILP-focused processors on server workloads has been flat [4] or even decreasing. This lack of improvement in processor performance/Watt, coupled with the continued decrease in server hardware acquisition costs and likely increases in future power and cooling costs, is leading to a situation where the total cost of server ownership will soon be determined predominantly by power [4]. In this paper, we argue that exploiting thread-level parallelism (TLP) via a large number of simple cores on a chip multiprocessor (CMP) leads to much better performance/Watt for server workloads. As a case study, we compare Sun's TLP-oriented Niagara processor against the ILP-oriented dual-core Pentium Extreme Edition from Intel, showing that the Niagara processor has a significant performance/Watt advantage for throughput-oriented server applications.

References

[1]
A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin, "Sparcle: An Evolutionary Processor Design for Large-scale Multiprocessors," IEEE Micro, June 1993, pages 48--61.
[2]
L. Barroso, K. Gharachorloo, and E. Bugnion, "Memory System Characterization of Commercial Workloads." Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998, pages 3--14.
[3]
L. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese, "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing." Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000.
[4]
L. Barroso, "The Price of Performance", ACM Queue, Vol 3, Number 7, September 2005.
[5]
S. Chaudhry, P. Caprioli, S. Yip, and M. Tremblay, "High-Performance Throughput Computing," IEEE Micro, May/June 2005, pages 32--45.
[6]
J. Clabes, J. Friedrich, and M. Sweet, "Design and Implementation of the POWER5™ Microprocessor," ISSCC Dig. Tech. Papers, Feb. 2004, pages 56--57.
[7]
J. D. Davis, et al., "Maximizing CMT Throughput with Mediocre Cores," In Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, Sep. 2005, pages 51--62.
[8]
J. Dundas and T. Mudge, "Improving Data Cache Performance by Pre-executing Instructions Under a Cache Miss", In Proceedings of the 1997 International Conference on Supercomputing, July 1997, pages 68--75.
[9]
M. Hrishikesh, et al., "The Optimal Logic Depth per Pipeline Stage Is 6 to 8 FO4 Inverter Delays," In Proceedings of the 29th Annual International Symposium on Computer Architecture, May 2002, pages 14--24.
[10]
P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-way Multithreaded SPARC Processor," IEEE Micro, March/April 2005, pages 21--29.
[11]
S. Kunkel, R. Eickemeyer, M. Lip, T. Mullins, "A Performance Methodology for Commercial Servers," IBM Journal of Research and Development, Vol. 44, Number 6, 2000.
[12]
J. Laudon, A. Gupta, and M. Horowitz, "Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations," Proceedings of the 6th International Symposium on Architectural Support for Parallel Languages and Operating Systems, October 1994, pages 308--318.
[13]
J. Lo, L. Barroso, S. Eggers, K. Gharachorloo, et al., "An Analysis of Database Workload Performance on Simultaneous Multithreaded Processors," Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998, pages 39--50.
[14]
D. Marr, "Hyper-Threading Technology in the Netburst® Microarchitecture", 14th Hot Chips, August 2002.
[15]
S. Mukherjee, M. Kontz, and S. Reinhardt, "Detailed Design and Evaluation of Redundant Multithreading Alternatives," Proceedings of the 29th Annual International Symposium on Computer Architecture, May 2002, pages 99--110.
[16]
O. Mutlu, H. Kim, J. Stark, and Y. N. Patt, "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors," Proceedings of the 9th International Symposium on High Performance Computer Architecture, February 2003.
[17]
S. Naffziger, T. Grutkowski, and B. Stackhouse, "The Implementation of a 2-core Multi-Threaded Itanium® Family Processor," IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2005, pages 182--183.
[18]
C. Poirier, R. McGowen, C. Bostak, and S. Naffziger, "Power and Temperature Control on a 90nm Itanium®-Family Processor," ISSCC, Feb. 2005, pages 304--305.
[19]
Standard Performance Evaluation Corporation, SPEC*, http://www.spec.org, Warrenton, VA.
[20]
Transaction Processing Performance Council, TPC-*, http://www.tpc.org, San Francisco, CA.
[21]
D. Tullsen, S. Eggers, and H. Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism," Proceedings of the 22nd Annual International Symposium on Computer Architecture, June 1995, pages 392--403.
[22]
T. Vijaykumar, I. Pomeranz, and K. Cheng, "Transient-Fault Recovery Using Simultaneous Multithreading," Proceedings of the 29th Annual International Symposium on Computer Architecture, May 2002, pages 87--98.
[23]
"XML Processing Performance in Java and .NET", http://java.sun.com/performance/reference/whitepapers/XML_Test-1_0.pdf

Published In

ACM SIGARCH Computer Architecture News, Volume 33, Issue 4
Special issue: dasCMP'05
November 2005
130 pages
ISSN:0163-5964
DOI:10.1145/1105734

Publisher

Association for Computing Machinery

New York, NY, United States
