DOI: 10.1145/1669112.1669131
research-article

BulkCompiler: high-performance sequential consistency through cooperative compiler and hardware support

Published: 12 December 2009

Abstract

A platform that supported Sequential Consistency (SC) for all codes, not only the well-synchronized ones, would simplify the task of programmers. Recently, several hardware architectures have been proposed that support high-performance SC by committing groups of instructions at a time. However, for a platform to support SC, it is not enough for the hardware to do so; the compiler has to support SC as well.
This paper presents the hardware-compiler interface, and the main compiler ideas for BulkCompiler, a simple compiler layer that works with the group-committing hardware to provide a whole-system high-performance SC platform. We introduce ISA primitives and software algorithms for BulkCompiler to drive instruction-group formation, and to transform code to exploit the groups. Our simulation results show that BulkCompiler not only enables a whole-system SC environment, but also one that actually outperforms a conventional platform that uses the more relaxed Java Memory Model by an average of 37%. The speedups come from code optimization inside software-assembled instruction groups.
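To illustrate why the compiler, and not only the hardware, has to preserve SC, the following minimal sketch enumerates every sequentially consistent interleaving of the classic store-buffering litmus test. It is not taken from the paper: the function `sc_outcomes`, the operation encoding, and the variable names are assumptions made for illustration. The sketch shows that the outcome `(0, 0)` is impossible for the source program under SC, but becomes possible once a hypothetical compiler pass hoists a load above an earlier store, even if the hardware then executes the compiled code with perfect SC.

```python
from itertools import combinations


def sc_outcomes(thread0, thread1):
    """Return the set of final (r0, r1) values over all SC interleavings.

    Each thread is a list of operations in program order, encoded as
    ("store", variable, value) or ("load", variable, register).
    This encoding is an illustrative assumption, not the paper's ISA.
    """
    n0, n1 = len(thread0), len(thread1)
    outcomes = set()
    # Pick which positions of the global order belong to thread 0.
    # Program order inside each thread is always preserved.
    for slots in combinations(range(n0 + n1), n0):
        slot_set = set(slots)
        memory = {"x": 0, "y": 0}
        regs = {"r0": 0, "r1": 0}
        i0 = i1 = 0
        for step in range(n0 + n1):
            if step in slot_set:
                op = thread0[i0]
                i0 += 1
            else:
                op = thread1[i1]
                i1 += 1
            if op[0] == "store":
                memory[op[1]] = op[2]
            else:
                regs[op[2]] = memory[op[1]]
        outcomes.add((regs["r0"], regs["r1"]))
    return outcomes


# Source program: store-buffering litmus test, with x = y = 0 initially.
t0 = [("store", "x", 1), ("load", "y", "r0")]
t1 = [("store", "y", 1), ("load", "x", "r1")]

# Hypothetical compiler output for thread 0: the load is hoisted above the
# store, which is legal for single-threaded code but not under SC.
t0_reordered = [("load", "y", "r0"), ("store", "x", 1)]

original = sc_outcomes(t0, t1)
after_reordering = sc_outcomes(t0_reordered, t1)

print("SC outcomes of source program:   ", sorted(original))
print("SC outcomes after the reordering:", sorted(after_reordering))
```

The enumeration is exhaustive only because the litmus test is tiny; it is meant to make the hardware and compiler split concrete, not to model how BulkCompiler forms or optimizes instruction groups.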

Published In

MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
December 2009
601 pages
ISBN: 9781605587981
DOI: 10.1145/1669112

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. atomic region
  2. chunk-based architecture
  3. compiler optimization
  4. sequential consistency

Qualifiers

  • Research-article

Conference

Micro-42

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 13 Feb 2025

Cited By

  • (2021) Safe-by-default Concurrency for Modern Programming Languages. ACM Transactions on Programming Languages and Systems 43:3 (1-50). DOI: 10.1145/3462206. Online publication date: 3-Sep-2021
  • (2019) Dependence-aware, unbounded sound predictive race detection. Proceedings of the ACM on Programming Languages 3:OOPSLA (1-30). DOI: 10.1145/3360605. Online publication date: 10-Oct-2019
  • (2019) Accelerating sequential consistency for Java with speculative compilation. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (16-30). DOI: 10.1145/3314221.3314611. Online publication date: 8-Jun-2019
  • (2019) NoMap: Speeding-Up JavaScript Using Hardware Transactional Memory. 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) (412-425). DOI: 10.1109/HPCA.2019.00054. Online publication date: Feb-2019
  • (2018) High-coverage, unbounded sound predictive race detection. ACM SIGPLAN Notices 53:4 (374-389). DOI: 10.1145/3296979.3192385. Online publication date: 11-Jun-2018
  • (2018) Unconventional Parallelization of Nondeterministic Applications. ACM SIGPLAN Notices 53:2 (432-447). DOI: 10.1145/3296957.3173181. Online publication date: 19-Mar-2018
  • (2018) High-coverage, unbounded sound predictive race detection. Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (374-389). DOI: 10.1145/3192366.3192385. Online publication date: 11-Jun-2018
  • (2018) Unconventional Parallelization of Nondeterministic Applications. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (432-447). DOI: 10.1145/3173162.3173181. Online publication date: 19-Mar-2018
  • (2017) Legato: end-to-end bounded region serializability using commodity hardware transactional memory. Proceedings of the 2017 International Symposium on Code Generation and Optimization (1-13). DOI: 10.5555/3049832.3049834. Online publication date: 4-Feb-2017
  • (2017) Avoiding consistency exceptions under strong memory models. ACM SIGPLAN Notices 52:9 (115-127). DOI: 10.1145/3156685.3092271. Online publication date: 18-Jun-2017