research-article

Thread-level speculation with kernel support

Published: 17 March 2016

Abstract

Runtime systems for speculative parallelization can be sped up substantially by implementing them with kernel support. We describe a novel thread-level speculation (TLS) system that uses virtual memory to isolate speculative state and is implemented in a Linux kernel module. This design choice not only maximizes performance but also makes it possible to guarantee soundness in the presence of system calls, such as I/O. Its ability to maintain speedups even on programs with frequent mis-speculation significantly extends its usability, for instance in speculative parallelization. We demonstrate the advantage of kernel-based TLS on a number of programs from the Cilk suite, where this approach outperforms the state of the art in every case (7.28x on average). All systems described in this paper are available as open source.
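
The idea of using virtual memory to isolate speculative state can be illustrated in user space with a copy-on-write private mapping. The sketch below is not the paper's kernel module and makes no claim about its design: it assumes the shared state lives in a file, and the names `run_speculatively`, `make_shared_state`, `task`, and `validate` are hypothetical. Real TLS systems decide between commit and rollback by detecting data dependence conflicts; here that decision is reduced to a caller-supplied predicate.

```python
import mmap
import os
import tempfile


def make_shared_state(initial: bytes) -> str:
    """Create a temporary file holding the shared (non-speculative) state."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(initial)
    return path


def read_state(path: str) -> bytes:
    """Read the committed state back from the file."""
    with open(path, "rb") as f:
        return f.read()


def run_speculatively(path, size, task, validate) -> bool:
    """Run task against a private copy-on-write view of the shared state.

    Writes made by task touch private pages only. If validate(view) accepts
    the result, the bytes are copied back to the file (commit). Otherwise the
    view is dropped and the shared state is left untouched (rollback).
    """
    with open(path, "r+b") as f:
        # ACCESS_COPY: writes change the mapping but never the underlying file.
        view = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_COPY)
        try:
            task(view)
            if not validate(view):
                return False          # mis-speculation: discard private pages
            f.seek(0)
            f.write(view[:size])      # commit: publish the speculative state
            f.flush()
            return True
        finally:
            view.close()


def demo():
    """Run one aborted and one committed speculation on the same state."""
    size = mmap.PAGESIZE
    path = make_shared_state(b"\x00" * size)
    try:
        def task(view):
            view[0:4] = b"spec"       # speculative write into the private view

        aborted = run_speculatively(path, size, task, lambda v: False)
        after_abort = read_state(path)[:4]
        committed = run_speculatively(path, size, task, lambda v: True)
        after_commit = read_state(path)[:4]
        return aborted, after_abort, committed, after_commit
    finally:
        os.unlink(path)
```

Calling `demo()` runs the same task twice: the rejected run leaves the file unchanged, and the accepted run publishes the four written bytes. Copying the whole view back on commit keeps the sketch short; it says nothing about how a kernel-level implementation would publish speculative pages.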

Cited By

  • (2020) An Adaptive Thread Partitioning Approach in Speculative Multithreading. Algorithms and Architectures for Parallel Processing, pages 78–91. DOI: 10.1007/978-3-030-60245-1_6. Online publication date: 2-Oct-2020
  • (2020) Procedure and Loop Level Speculative Parallelism Analysis in HPEC. Algorithms and Architectures for Parallel Processing, pages 47–60. DOI: 10.1007/978-3-030-60245-1_4. Online publication date: 2-Oct-2020
  • (2019) An Improved Programming Model for Thread-Level Speculation. 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), pages 666–672. DOI: 10.1109/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00101. Online publication date: Dec-2019

Published In

CC '16: Proceedings of the 25th International Conference on Compiler Construction
March 2016
270 pages
ISBN:9781450342414
DOI:10.1145/2892208
  • General Chair: Ayal Zaks
  • Program Chair: Manuel Hermenegildo
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

  • IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. kernel module
  2. shared memory
  3. speculative parallelization
  4. thread-level speculation
  5. virtual memory

Qualifiers

  • Research-article

Conference

CGO '16
