On the performance potential of different types of speculative thread-level parallelism: The DL version of this paper includes corrections that were not made available in the printed proceedings

Published: 28 June 2006

Abstract

Recent research in thread-level speculation (TLS) has proposed several mechanisms for optimistic parallel execution of difficult-to-analyze serial codes. Although TLS has been shown to help achieve higher levels of parallelism, evaluation of its unique performance potential, i.e., the performance gain that can be achieved only through speculation, has not received much attention. In this paper, we evaluate this aspect by separating the speedup achievable via true TLP (thread-level parallelism) from that achievable via TLS for the SPEC CPU2000 benchmarks. Further, we dissect the performance potential of each type of speculation: control speculation, data dependence speculation, and data value speculation. To the best of our knowledge, this is the first dissection study of its kind. We assume an oracle TLS mechanism, corresponding to perfect speculation and zero threading overhead, whereby the execution time of a candidate program region (for speculative execution) can be reduced to zero. Under this assumption, our study shows that, at the loop level, the upper bounds on the arithmetic mean and geometric mean speedup achievable via TLS across SPEC CPU2000 are 39.16% (standard deviation = 31.23) and 18.18%, respectively.
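The oracle bound described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' methodology. It assumes that each benchmark is characterized by a single hypothetical number, `coverage`, the fraction of serial execution time spent in candidate regions, and that the means are taken over speedup ratios before conversion to percentages. The function names and the sample inputs are made up for illustration.

```python
import math


def oracle_speedup(coverage):
    """Speedup ratio if a region taking `coverage` of serial time costs zero.

    Assumption: T_oracle = T_serial * (1 - coverage), so ratio = 1 / (1 - coverage).
    """
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must be in [0, 1)")
    return 1.0 / (1.0 - coverage)


def summarize(coverages):
    """Return (arithmetic mean, geometric mean) of the speedups, in percent."""
    ratios = [oracle_speedup(c) for c in coverages]
    arith = sum(ratios) / len(ratios)
    # Geometric mean computed via logs to avoid overflow on long lists.
    geo = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return (arith - 1.0) * 100.0, (geo - 1.0) * 100.0


# Hypothetical coverages for three benchmarks (not data from the paper).
arith_pct, geo_pct = summarize([0.5, 0.2, 0.0])
print(f"arithmetic mean: {arith_pct:.2f}%, geometric mean: {geo_pct:.2f}%")
```

Because the geometric mean never exceeds the arithmetic mean for positive values, a few benchmarks with large candidate regions pull the arithmetic mean up much more than the geometric mean, which is consistent with the gap between the two figures reported in the abstract.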


          Published In

          ICS '06: Proceedings of the 20th annual international conference on Supercomputing
          June 2006
          385 pages
          ISBN: 1595932828
          DOI: 10.1145/1183401

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. DOALL loops
          2. control dependence
          3. data dependence
          4. performance evaluation
          5. speculative execution
          6. value dependence

          Qualifiers

          • Article

          Conference

          ICS06
          ICS06: International Conference on Supercomputing 2006
          June 28 - July 1, 2006
          Cairns, Queensland, Australia

          Acceptance Rates

          ICS '06 Paper Acceptance Rate 37 of 141 submissions, 26%;
          Overall Acceptance Rate 629 of 2,180 submissions, 29%
