research-article

Speculative-aware execution: a simple and efficient technique for utilizing multi-cores to improve single-thread performance

Authors:
Rania H. Mameesh, University of Siena, Siena, Italy
Manoj Franklin, University of Maryland, College Park, MD, USA

PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, September 2010, pages 421–430. https://doi.org/10.1145/1854273.1854326

Published: 11 September 2010


ABSTRACT

In this paper, a new architecture, Speculative-Aware Execution (SAE), is presented. SAE employs speculative-awareness as a means of mitigating the two drawbacks of speculative execution: useless work (work that uses speculative values and therefore produces incorrect results, or that is done on the wrong path) and redundant work (work that reproduces results previously obtained). To achieve this, SAE attempts to partition the dynamic instruction stream into two disjoint parallel threads. The first is a speculative thread that is partially speculative-aware (p-thread): it records its speculative state and uses it to avoid useless work that depends on speculative values, but it keeps no record of its own control-flow violations. The second is a fully speculative-aware thread (f-thread) that has a full record of the p-thread's speculations. The f-thread can therefore steer the p-thread away from incorrect control-flow paths, and it can accurately identify the p-thread's correct work and skip it, since re-executing that work would be redundant. By eliminating useless and redundant work, SAE outperforms existing architectures that share a similar high-level microarchitecture while requiring only minor hardware additions and changes. Detailed experimental results confirm that SAE indeed reduces the number of useless and redundant computations. We also report an average performance improvement of 18% on the SPEC_INT2000 benchmarks.
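The division of labor between the two threads can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the SAE hardware: it models a straight-line program with no branches and no stores (so control-flow steering and the memory speculation bitmap are left out), treats a load flagged `miss` as the only source of speculative values, and runs the two threads one after the other rather than in parallel. All names (`Instr`, `p_thread`, `f_thread`, the `miss` flag) are hypothetical. The p-thread keeps one speculation bit per register, in the spirit of the register speculation bitmap named in the author tags, and skips any instruction that would read a speculative value; the f-thread replays the program in order, reusing the p-thread's results for instructions it completed correctly and executing only the rest.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical instruction format for a straight-line program (no branches, no stores).
@dataclass(frozen=True)
class Instr:
    dest: str
    op: str                  # "add", "mul", or "load" (hypothetical opcodes)
    srcs: Tuple[str, ...]
    imm: int = 0
    miss: bool = False       # assumption: the p-thread cannot obtain this load's value


def execute(ins: Instr, regs: Dict[str, int], mem: Dict[int, int]) -> int:
    vals = [regs[s] for s in ins.srcs]
    if ins.op == "add":
        return sum(vals) + ins.imm
    if ins.op == "mul":
        return vals[0] * vals[1]
    if ins.op == "load":
        return mem[vals[0] + ins.imm]
    raise ValueError(f"unknown op {ins.op}")


def p_thread(prog, regs, mem):
    """Partially speculative-aware pass: skips work that would consume speculative values."""
    regs = dict(regs)
    spec = {r: False for r in regs}   # hypothetical register speculation bitmap
    results: Dict[int, int] = {}      # instruction index -> correctly computed value
    deferred: List[int] = []          # instructions left for the f-thread
    for i, ins in enumerate(prog):
        if ins.miss or any(spec[s] for s in ins.srcs):
            spec[ins.dest] = True     # value unknown: mark it instead of doing useless work
            deferred.append(i)
        else:
            regs[ins.dest] = execute(ins, regs, mem)
            spec[ins.dest] = False    # a correct value overwrites any earlier speculative one
            results[i] = regs[ins.dest]
    return results, deferred


def f_thread(prog, regs, mem, results):
    """Fully speculative-aware pass: reuses correct p-thread results, executes the rest."""
    regs = dict(regs)
    reused = executed = 0
    for i, ins in enumerate(prog):
        if i in results:
            regs[ins.dest] = results[i]   # redundant work avoided
            reused += 1
        else:
            regs[ins.dest] = execute(ins, regs, mem)
            executed += 1
    return regs, reused, executed


def sequential(prog, regs, mem):
    """Reference: plain in-order execution of every instruction."""
    regs = dict(regs)
    for ins in prog:
        regs[ins.dest] = execute(ins, regs, mem)
    return regs


INIT_REGS = {"r0": 0, "r1": 100, "r2": 5, "r3": 0, "r4": 0, "r5": 0}
MEM = {100: 7, 104: 9}
PROGRAM = [
    Instr("r3", "load", ("r1",), 0, miss=True),  # r3 = mem[100]; misses in the p-thread
    Instr("r4", "add", ("r2",), 10),             # independent of the miss
    Instr("r5", "mul", ("r3", "r2")),            # depends on the miss, so it is deferred
    Instr("r2", "add", ("r4",), 1),              # independent; overwrites r2 after its use above
    Instr("r0", "load", ("r1",), 4),             # hit: r0 = mem[104]
]

if __name__ == "__main__":
    results, deferred = p_thread(PROGRAM, INIT_REGS, MEM)
    final, reused, executed = f_thread(PROGRAM, INIT_REGS, MEM, results)
    assert final == sequential(PROGRAM, INIT_REGS, MEM)
    # prints: [0, 2] 3 2 {'r0': 9, 'r1': 100, 'r2': 16, 'r3': 7, 'r4': 15, 'r5': 35}
    print(deferred, reused, executed, final)
```

In this toy run the f-thread executes only the two deferred instructions and reuses the other three. Because it consumes the p-thread's results by instruction index in program order, the deferred multiply still reads the old value of `r2` even though the p-thread has already overwritten it, so the final register state matches plain sequential execution.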


Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
2. Hardware
  1. Hardware validation


Published in
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques
September 2010
596 pages
ISBN:9781450301787
DOI:10.1145/1854273
General Chair: Valentina Salapura, IBM TJ Watson Research Center
Program Chairs: Michael Gschwind, IBM Systems & Technology Group; Jens Knoop, Technische Universität Wien
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
fully speculative-aware thread
memory speculation bitmap
partially speculative-aware thread
register speculation bitmap
speculative aware execution

Acceptance Rates
Overall acceptance rate: 121 of 471 submissions (26%)

Article Metrics
- 0
  Total Citations
  View Citations
- 287
  Total Downloads
- Downloads (Last 12 months)3
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
This publication has not been cited yet