ABSTRACT
This paper presents Speculative-Aware Execution (SAE), a new architecture that uses speculative-awareness to mitigate two drawbacks of speculative execution: useless work (work that consumes speculative values and so produces incorrect results, or that is done on the wrong path) and redundant work (work that reproduces results already obtained). To this end, SAE tries to partition the dynamic instruction stream into two disjoint parallel threads. The first is a speculative thread that is partially speculative-aware (the p-thread): it records its speculative state and uses it to avoid useless work that would consume speculative values, but it takes no account of its control-flow violations. The second is a fully speculative-aware thread (the f-thread): it has a full record of the p-thread's speculations, so it can steer the p-thread away from incorrect control-flow paths, and it can accurately identify the p-thread's correct work and avoid repeating it, which would otherwise be redundant. By eliminating useless and redundant work, SAE outperforms existing architectures that share a similar high-level micro-architecture while incurring only minor hardware additions and changes. Detailed experimental results confirm that SAE reduces the number of useless and redundant computations. We also report an average performance improvement of 18% for the SPEC_INT2000 benchmarks.
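The division of work between the two threads can be illustrated with a toy software model. The sketch below is not the SAE hardware design: it covers only the data-flow half of the idea (skipping work that depends on speculative values, then reusing the work that was done correctly) and leaves out control-flow steering entirely. The names `Instr`, `run_p_thread`, and `run_f_thread`, the register names, and the four-instruction program are all hypothetical and chosen purely for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction: dest = fn(*srcs)."""
    dest: str
    srcs: Tuple[str, ...]
    fn: Callable[..., int]


def run_p_thread(program: List[Instr], regs: Dict[str, int],
                 speculative: Set[str]) -> Dict[int, int]:
    """Toy p-thread: skip any instruction that would read a speculative value."""
    regs = dict(regs)
    tainted = set(speculative)
    done: Dict[int, int] = {}
    for i, ins in enumerate(program):
        if any(s in tainted for s in ins.srcs):
            # The result would itself be speculative, so the work is skipped.
            tainted.add(ins.dest)
        else:
            tainted.discard(ins.dest)
            regs[ins.dest] = ins.fn(*(regs[s] for s in ins.srcs))
            done[i] = regs[ins.dest]
    return done


def run_f_thread(program: List[Instr], regs: Dict[str, int],
                 done: Dict[int, int]) -> Tuple[Dict[str, int], List[int]]:
    """Toy f-thread: reuse the p-thread's results, execute only what it skipped."""
    regs = dict(regs)
    executed: List[int] = []
    for i, ins in enumerate(program):
        if i in done:
            regs[ins.dest] = done[i]  # correct work already done: not repeated
        else:
            regs[ins.dest] = ins.fn(*(regs[s] for s in ins.srcs))
            executed.append(i)
    return regs, executed


# Assumed scenario: r1 is not yet known to the p-thread (e.g. a pending load).
program = [
    Instr("r3", ("r0", "r2"), operator.add),  # independent of r1
    Instr("r4", ("r1", "r0"), operator.mul),  # depends on r1
    Instr("r5", ("r3", "r3"), operator.add),  # independent of r1
    Instr("r6", ("r4", "r5"), operator.add),  # depends on r1 through r4
]
done = run_p_thread(program, {"r0": 2, "r2": 3}, speculative={"r1"})
final, executed = run_f_thread(program, {"r0": 2, "r1": 10, "r2": 3}, done)
```

In this model each instruction is executed by exactly one of the two threads: the p-thread handles instructions 0 and 2, and the f-thread handles instructions 1 and 3 once the true value of `r1` is available. That is the disjoint partition the abstract describes, under the simplifying assumption that every non-speculative input the p-thread sees is correct.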
Index Terms
- Speculative-aware execution: a simple and efficient technique for utilizing multi-cores to improve single-thread performance