research-article

Thread-level speculation with kernel support

Authors:

Clemens Hammacher,

Andreas Zeller,

Sebastian Hack

CC '16: Proceedings of the 25th International Conference on Compiler Construction

Pages 1 - 11

https://doi.org/10.1145/2892208.2892221

Published: 17 March 2016

Abstract

Runtime systems for speculative parallelization can be substantially sped up by implementing them with kernel support. We describe a novel implementation of a thread-level speculation (TLS) system that uses virtual memory to isolate speculative state and is implemented in a Linux kernel module. This design choice not only maximizes performance but also makes it possible to guarantee soundness in the presence of system calls, such as I/O. Its ability to maintain speedups even on programs with frequent mis-speculation significantly extends its usability, for instance in speculative parallelization. We demonstrate the advantage of kernel-based TLS on a number of programs from the Cilk suite, where this approach is superior to the state of the art in every case (7.28x on average). All systems described in this paper are made available as open source.
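The general idea of isolating speculative state through virtual memory can be illustrated with a small user-space sketch. This is not the kernel module described in the abstract: it assumes a POSIX system and uses `os.fork()`, whose copy-on-write address space keeps the speculative task's writes private until the speculation is validated. The names `run_speculative`, `join`, `speculate`, and `accumulate` are hypothetical and chosen only for this illustration.

```python
import os
import pickle


def run_speculative(task, state, predicted_input):
    """Start task(state, predicted_input) in a forked child process.

    The child works on a copy-on-write view of the parent's memory, so its
    writes to `state` never reach the parent. Returns (read_fd, pid).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: speculative execution on private memory
        os.close(read_fd)
        try:
            result = task(state, predicted_input)
            with os.fdopen(write_fd, "wb") as out:
                out.write(pickle.dumps((result, state)))
        finally:
            os._exit(0)  # never return into the parent's code path
    os.close(write_fd)
    return read_fd, pid


def join(read_fd, pid):
    """Collect the child's result and its private copy of the state."""
    with os.fdopen(read_fd, "rb") as inp:
        data = inp.read()
    os.waitpid(pid, 0)
    return pickle.loads(data)


def speculate(task, state, predicted_input, compute_actual_input):
    """Run `task` speculatively while the parent computes the real input."""
    read_fd, pid = run_speculative(task, state, predicted_input)
    actual = compute_actual_input()
    result, speculative_state = join(read_fd, pid)
    if actual == predicted_input:
        # Speculation was correct: commit the child's state to the parent.
        state.clear()
        state.update(speculative_state)
        return result, True
    # Mis-speculation: discard the child's state and re-execute the task.
    return task(state, actual), False


def accumulate(state, x):
    """Example task (hypothetical): add x to a running total."""
    state["total"] = state.get("total", 0) + x
    return state["total"]


if __name__ == "__main__":
    state = {"total": 10}
    print(speculate(accumulate, state, 5, lambda: 5), state)
    state = {"total": 10}
    print(speculate(accumulate, state, 5, lambda: 7), state)
```

In this sketch a mis-speculation costs a full process creation plus a sequential re-execution, and any system call made by the speculative child would take effect immediately. Those are the kinds of costs and soundness gaps that the abstract attributes to implementations without kernel support.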

Cited By

Li Y, Zhang Z, Liu B (2020). An Adaptive Thread Partitioning Approach in Speculative Multithreading. Algorithms and Architectures for Parallel Processing, pages 78-91. Online publication date: 2-Oct-2020.
https://dl.acm.org/doi/10.1007/978-3-030-60245-1_6
Wang X, Wang Y, Li L, Yang Y, Bu D, Musariri M (2020). Procedure and Loop Level Speculative Parallelism Analysis in HPEC. Algorithms and Architectures for Parallel Processing, pages 47-60. Online publication date: 2-Oct-2020.
https://dl.acm.org/doi/10.1007/978-3-030-60245-1_4
Liu B, Yang H, Yuancheng L, Yuxiang L, Dangdang N, Zhiming L (2019). An Improved Programming Model for Thread-Level Speculation. 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), pages 666-672. Online publication date: Dec-2019.
https://doi.org/10.1109/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00101

Index Terms

Thread-level speculation with kernel support
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Concurrent programming languages

Published In

CC '16: Proceedings of the 25th International Conference on Compiler Construction

March 2016

270 pages

ISBN:9781450342414

DOI:10.1145/2892208

General Chair: Ayal Zaks (Intel, Israel / Technion, Israel)

Program Chair: Manuel Hermenegildo (IMDEA SW Institute, Spain / T.U. Madrid-UPM, Spain)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CGO '16

Sponsor:

CGO '16: 14th Annual IEEE/ACM International Symposium on Code Generation and Optimization

March 17 - 18, 2016

Barcelona, Spain
