Performance/Watt: the new server focus

Author:

James Laudon

ACM SIGARCH Computer Architecture News, Volume 33, Issue 4

Pages 5 - 13

https://doi.org/10.1145/1105734.1105737

Published: 01 November 2005

Abstract

Transaction processing has emerged as the killer application for commercial servers. Most servers are engaged in transactional workloads such as processing search requests, serving middleware, evaluating decisions, managing databases, and powering online commerce. Currently, commercial servers are built from one or more high-performance superscalar processors. However, commercial server applications exhibit high cache miss rates, large memory footprints, and low instruction-level parallelism (ILP), which leads to poor utilization of traditional ILP-focused superscalar processors [11]. In addition, these ILP-focused processors have been optimized primarily to deliver maximum performance through high clock rates and large amounts of speculation. As a result, the performance/Watt of successive generations of traditional ILP-focused processors on server workloads has been flat [4] or even decreasing. This lack of improvement in processor performance/Watt, coupled with the continued decrease in server hardware acquisition costs and likely increases in future power and cooling costs, is leading to a situation where the total cost of server ownership will soon be predominantly determined by power [4]. In this paper, we argue that exploiting thread-level parallelism (TLP) via a large number of simple cores on a chip multiprocessor (CMP) leads to much better performance/Watt for server workloads. As a case study, we compare Sun's TLP-oriented Niagara processor against Intel's ILP-oriented dual-core Pentium Extreme Edition, showing that the Niagara processor has a significant performance/Watt advantage for throughput-oriented server applications.
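The abstract's argument rests on two simple quantities: performance/Watt (throughput divided by power) and the share of server spending that goes to power rather than to the hardware purchase. The Python sketch below illustrates only that arithmetic. The function names, the `cooling_factor` multiplier, the two-term stand-in for total cost of ownership (purchase price plus power, ignoring every other cost), and all numbers are hypothetical placeholders, not figures from the paper or from [4].

```python
"""Back-of-the-envelope performance/Watt and power-cost arithmetic.

Illustrative sketch only: every function name and number here is a
hypothetical placeholder, not data from the paper.
"""

HOURS_PER_YEAR = 24 * 365  # non-leap year


def perf_per_watt(throughput: float, watts: float) -> float:
    """Throughput (e.g. transactions per second) delivered per watt."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return throughput / watts


def lifetime_power_cost(watts: float, years: float, price_per_kwh: float,
                        cooling_factor: float = 1.0) -> float:
    """Electricity cost of running a server continuously for `years`.

    `cooling_factor` is an assumed multiplier (>= 1.0) that folds cooling
    overhead into the bill; 1.0 ignores cooling entirely.
    """
    kwh = watts / 1000.0 * HOURS_PER_YEAR * years
    return kwh * price_per_kwh * cooling_factor


def power_share_of_cost(acquisition_cost: float, power_cost: float) -> float:
    """Fraction of (purchase + power) spending that goes to power."""
    return power_cost / (acquisition_cost + power_cost)


if __name__ == "__main__":
    # Two hypothetical chips running the same throughput-oriented workload.
    chip_a = perf_per_watt(throughput=10_000, watts=130)
    chip_b = perf_per_watt(throughput=20_000, watts=70)
    print(f"chip B perf/Watt advantage over chip A: {chip_b / chip_a:.1f}x")

    # Hypothetical server: 400 W, 4-year life, $0.10/kWh, 2x for cooling.
    power = lifetime_power_cost(400, years=4, price_per_kwh=0.10,
                                cooling_factor=2.0)
    # Hold the power bill fixed while the purchase price keeps falling.
    for price in (5000, 3000, 1500):
        share = power_share_of_cost(price, power)
        print(f"purchase ${price:,}: power is {share:.0%} of spending")
```

With the power bill held constant, the power share of spending rises as the (hypothetical) purchase price falls, which mirrors the trend the abstract describes; a flat performance/Watt means no relief from the power term.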

References

[1]

A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin, "Sparcle: An Evolutionary Processor Design for Large-scale Multiprocessors," IEEE Micro, June 1993, pages 48--61.

[2]

L. Barroso, K. Gharachorloo, and E. Bugnion, "Memory System Characterization of Commercial Workloads," Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998, pages 3--14.

[3]

L. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese, "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing," Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000.

[4]

L. Barroso, "The Price of Performance," ACM Queue, Vol. 3, Number 7, September 2005.

[5]

S. Chaudhry, P. Caprioli, S. Yip, and M. Tremblay, "High-Performance Throughput Computing," IEEE Micro, May/June 2005, pages 32--45.

[6]

J. Clabes, J. Friedrich, and M. Sweet, "Design and Implementation of the POWER5™ Microprocessor," ISSCC Dig. Tech. Papers, Feb. 2004, pages 56--57.

[7]

J. D. Davis et al., "Maximizing CMT Throughput with Mediocre Cores," Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, Sep. 2005, pages 51--62.

[8]

J. Dundas and T. Mudge, "Improving Data Cache Performance by Pre-executing Instructions Under a Cache Miss," Proceedings of the 1997 International Conference on Supercomputing, July 1997, pages 68--75.

[9]

M. Hrishikesh et al., "The Optimal Logic Depth per Pipeline Stage Is 6 to 8 FO4 Inverter Delays," Proceedings of the 29th Annual International Symposium on Computer Architecture, May 2002, pages 14--24.

[10]

P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-way Multithreaded SPARC Processor," IEEE Micro, March/April 2005, pages 21--29.

[11]

S. Kunkel, R. Eickemeyer, M. Lip, T. Mullins, "A Performance Methodology for Commercial Servers," IBM Journal of Research and Development, Vol. 44, Number 6, 2000.

[12]

J. Laudon, A. Gupta, and M. Horowitz, "Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations," Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, October 1994, pages 308--318.

[13]

J. Lo, L. Barroso, S. Eggers, K. Gharachorloo, et al., "An Analysis of Database Workload Performance on Simultaneous Multithreaded Processors," Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998, pages 39--50.

[14]

D. Marr, "Hyper-Threading Technology in the NetBurst® Microarchitecture," 14th Hot Chips, August 2002.

[15]

S. Mukherjee, M. Kontz, and S. Reinhardt, "Detailed Design and Evaluation of Redundant Multithreading Alternatives," Proceedings of the 29th Annual International Symposium on Computer Architecture, May 2002, pages 99--110.

[16]

O. Mutlu, H. Kim, J. Stark, and Y. N. Patt, "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors," Proceedings of the 9th International Symposium on High Performance Computer Architecture, February 2003.

[17]

S. Naffziger, T. Grutkowski, and B. Stackhouse, "The Implementation of a 2-core Multi-Threaded Itanium® Family Processor," IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2005, pages 182--183.

[18]

C. Poirier, R. McGowen, C. Bostak, and S. Naffziger, "Power and Temperature Control on a 90nm Itanium®-Family Processor," ISSCC, Feb. 2005, pages 304--305.

[19]

Standard Performance Evaluation Corporation, SPEC*, http://www.spec.org, Warrenton, VA.

[20]

Transaction Processing Performance Council, TPC-*, http://www.tpc.org, San Francisco, CA.

[21]

D. Tullsen, S. Eggers, and H. Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism," Proceedings of the 22nd Annual International Symposium on Computer Architecture, June 1995, pages 392--403.

[22]

T. Vijaykumar, I. Pomeranz, and K. Cheng, "Transient-Fault Recovery Using Simultaneous Multithreading," Proceedings of the 29th Annual International Symposium on Computer Architecture, May 2002, pages 87--98.

[23]

"XML Processing Performance in Java and .NET", http://java.sun.com/performance/reference/whitepapers/XML_Test-1_0.pdf

Published In

ACM SIGARCH Computer Architecture News Volume 33, Issue 4

Special issue: dasCMP'05

November 2005

130 pages

ISSN:0163-5964

DOI:10.1145/1105734

Copyright © 2005 Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Bibliometrics & Citations

Article Metrics

Total Citations: 42
Total Downloads: 660
Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Mar 2025

Cited By

Olukotun K, Hammond L, Laudon J (2022). Improving Throughput. Chip Multiprocessor Architecture, pages 21--59. Online publication date: 5-Mar-2022. https://doi.org/10.1007/978-3-031-01720-9_2
Xiao C, Zhang L, Liu W, Bergmann N, Xie Y (2019). Energy-efficient crypto acceleration with HW/SW co-design for HTTPS. Future Generation Computer Systems. Online publication date: Feb-2019. https://doi.org/10.1016/j.future.2019.02.023
Xiao C, Xie Y, Zhang L, Chen D, Homayoun H, Taskin B (2018). AEAS - Towards High Energy-efficiency Design for OpenSSL Encryption Acceleration through HW/SW Co-design. Proceedings of the 2018 Great Lakes Symposium on VLSI, pages 171--176. Online publication date: 30-May-2018. https://dl.acm.org/doi/10.1145/3194554.3194584
Delimitrou C, Kozyrakis C (2016). HCloud. ACM SIGARCH Computer Architecture News 44:2, pages 473--488. Online publication date: 25-Mar-2016. https://dl.acm.org/doi/10.1145/2980024.2872365
Delimitrou C, Kozyrakis C (2016). HCloud. ACM SIGOPS Operating Systems Review 50:2, pages 473--488. Online publication date: 25-Mar-2016. https://doi.org/10.1145/2954680.2872365
Delimitrou C, Kozyrakis C (2016). HCloud. ACM SIGPLAN Notices 51:4, pages 473--488. Online publication date: 25-Mar-2016. https://dl.acm.org/doi/10.1145/2954679.2872365
Delimitrou C, Kozyrakis C, Conte T, Zhou Y (2016). HCloud. Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, pages 473--488. Online publication date: 25-Mar-2016. https://dl.acm.org/doi/10.1145/2872362.2872365
Chen Y, Mair J, Huang Z, Eyers D, Zhang H (2015). A State-Based Energy/Performance Model for Parallel Applications on Multicore Computers. Proceedings of the 2015 44th International Conference on Parallel Processing Workshops (ICPPW), pages 230--239. Online publication date: 1-Sep-2015. https://dl.acm.org/doi/10.1109/ICPPW.2015.33
Chen Z, Philip Wong H, Mitra S, Bol A, Peng L, Hills G, Thissen N (2014). Carbon nanotubes for high-performance logic. MRS Bulletin 39:08, pages 719--726. Online publication date: 14-Aug-2014. https://doi.org/10.1557/mrs.2014.164
Tantar A, Tantar E, Arnold D (2014). A survey on sustainability in ICT. Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, pages 1213--1220. Online publication date: 12-Jul-2014. https://dl.acm.org/doi/10.1145/2598394.2605695
