research-article

Simultaneous Speculative Threading: A Novel Pipeline Architecture Implemented in Sun's ROCK Processor

Authors: Shailender Chaudhry, Martin Karlsson, Marc Tremblay

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture

Pages 484 - 495

https://doi.org/10.1145/1555754.1555814

Published: 20 June 2009

Abstract

This paper presents Simultaneous Speculative Threading (SST), a technique for creating high-performance, area- and power-efficient cores for chip multiprocessors. SST hardware dynamically extracts two threads of execution from a single sequential program (one consisting of a load miss and its dependents, the other consisting of the instructions that are independent of the load miss) and executes them in parallel. SST uses an efficient checkpointing mechanism to eliminate the need for complex and power-inefficient structures such as register renaming logic, reorder buffers, memory disambiguation buffers, and large issue windows. Simulations of certain SST implementations show 18% better per-thread performance on commercial benchmarks than larger and higher-powered out-of-order cores. Sun Microsystems' ROCK processor, the first processor to use SST cores, has been implemented and is scheduled to be commercially available in 2009.
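The split described in the abstract (a load miss plus its dependents on one side, independent instructions on the other) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the ROCK hardware: the names `Instr`, `split_on_miss`, and `not_available` are hypothetical, the per-register "not available" marking is an assumed way to track dependence on a pending miss, and checkpointing, replay of the deferred instructions, and memory ordering are all left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, some sources."""
    dest: str
    srcs: tuple = ()
    misses: bool = False  # True for a load assumed to miss in the cache


def split_on_miss(program):
    """Partition a sequential program into (ahead, deferred) lists.

    An instruction is deferred if it is a missing load or if any source
    register currently depends on a pending miss; otherwise it runs ahead.
    """
    not_available = set()  # registers whose values await a pending miss
    ahead, deferred = [], []
    for ins in program:
        if ins.misses or any(s in not_available for s in ins.srcs):
            not_available.add(ins.dest)      # result depends on the miss
            deferred.append(ins)
        else:
            not_available.discard(ins.dest)  # overwritten with a known value
            ahead.append(ins)
    return ahead, deferred


program = [
    Instr("r1", ("r0",), misses=True),  # load that misses
    Instr("r2", ("r1",)),               # depends on the miss
    Instr("r3", ("r0",)),               # independent
    Instr("r4", ("r3",)),               # independent
    Instr("r5", ("r2", "r4")),          # depends on the miss through r2
    Instr("r1", ("r3",)),               # overwrites r1 with a known value
    Instr("r6", ("r1",)),               # independent again
]

ahead, deferred = split_on_miss(program)
print([i.dest for i in ahead])     # ['r3', 'r4', 'r1', 'r6']
print([i.dest for i in deferred])  # ['r1', 'r2', 'r5']
```

In this toy model the two lists stand in for the two threads of execution; a real design would also have to keep the operands that deferred instructions need and recover from mis-speculation, which the abstract attributes to the checkpointing mechanism.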

Cited By

Lakshminarasimhan, K., Naithani, A., Feliu, J., and Eeckhout, L. The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture. ACM Transactions on Architecture and Code Optimization 19, 2 (2022), 1--25. https://dl.acm.org/doi/10.1145/3499424

Oliveira, G., Gomez-Luna, J., Orosa, L., Ghose, S., Vijaykumar, N., Fernandez, I., Sadrosadati, M., and Mutlu, O. DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks. IEEE Access 9 (2021), 134457--134502. https://doi.org/10.1109/ACCESS.2021.3110993

Ham, T., Aragón, J., and Martonosi, M. Efficient Data Supply for Parallel Heterogeneous Architectures. ACM Transactions on Architecture and Code Optimization 16, 2 (2019), 1--23. https://dl.acm.org/doi/10.1145/3310332

Index Terms

1. Computer systems organization
  1. Architectures
2. General and reference
  1. Cross-computing tools and techniques
    1. Design

Published In

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture

June 2009

510 pages

ISBN:9781605585260

DOI:10.1145/1555754

General Chair: Steve Keckler, University of Texas at Austin

Program Chair: Luiz André Barroso, Google Inc.

ACM SIGARCH Computer Architecture News Volume 37, Issue 3
June 2009
495 pages
ISSN:0163-5964
DOI:10.1145/1555815
Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

ISCA '09: The 36th Annual International Symposium on Computer Architecture

June 20 - 24, 2009

Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

