DOI: 10.1145/1122971.1122997
Article

POSH: a TLS compiler that exploits program structure

Published: 29 March 2006

Abstract

As multi-core architectures with Thread-Level Speculation (TLS) are becoming better understood, it is important to focus on TLS compilation. TLS compilers are interesting in that, while they do not need to fully prove the independence of concurrent tasks, they make choices of where and when to generate speculative tasks that are crucial to overall TLS performance. This paper presents POSH, a new, fully automated TLS compiler built on top of gcc. POSH is based on two design decisions. First, to partition the code into tasks, it leverages the code structures created by the programmer, namely subroutines and loops. Second, it uses a simple profiling pass to discard ineffective tasks. With the code generated by POSH, a simulated TLS chip multiprocessor with 4 superscalar cores delivers an average speedup of 1.30 for the SPECint 2000 applications. Moreover, an estimated 26% of this speedup is a result of the implicit data prefetching provided by squashed tasks.
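
The two design decisions named in the abstract (take task candidates from loops and subroutines, then drop ineffective ones using profile data) can be pictured with a small sketch. This is not POSH's implementation: the `CandidateTask` fields, the task names, and both thresholds are hypothetical, chosen only to show the shape of a profile-driven filter. The abstract does not say which metrics or cutoffs POSH actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTask:
    """A speculative-task candidate (all fields are illustrative assumptions)."""
    name: str                 # hypothetical label for the source construct
    kind: str                 # "loop" or "subroutine", the two structures named in the abstract
    avg_instructions: float   # assumed profile metric: mean dynamic task size
    squash_rate: float        # assumed profile metric: fraction of instances squashed


def select_tasks(candidates, min_instructions=30.0, max_squash_rate=0.5):
    """Keep candidates that the (assumed) profile suggests are worth spawning.

    Both thresholds are made-up defaults, not values from the paper.
    """
    kept = []
    for task in candidates:
        if task.kind not in ("loop", "subroutine"):
            raise ValueError(f"unsupported task kind: {task.kind!r}")
        too_small = task.avg_instructions < min_instructions  # spawn overhead likely dominates
        too_risky = task.squash_rate > max_squash_rate         # most of the work would be discarded
        if not (too_small or too_risky):
            kept.append(task)
    return kept


# Hypothetical profile results for four candidates.
candidates = [
    CandidateTask("loop@compress_block", "loop", 400.0, 0.10),
    CandidateTask("call@tiny_helper", "subroutine", 12.0, 0.05),
    CandidateTask("loop@linked_list_walk", "loop", 250.0, 0.90),
    CandidateTask("call@parse_record", "subroutine", 900.0, 0.30),
]

selected = select_tasks(candidates)
print([t.name for t in selected])
```

In this toy run the tiny helper is dropped for being too small and the list walk for being squashed too often. A pure squash-rate cutoff like this one ignores the prefetching benefit that the abstract attributes to squashed tasks, so a real selection heuristic would need to weigh that effect as well.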



Published In

PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
March 2006
258 pages
ISBN: 1595931899
DOI: 10.1145/1122971
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. TLS compiler
  2. multi-core architecture
  3. prefetching
  4. profiling
  5. thread-level speculation

Qualifiers

  • Article

Conference

PPoPP06

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
