DeNovoND: efficient hardware support for disciplined non-determinism

Authors:
Hyojin Sung, Rakesh Komuravelli, Sarita V. Adve

University of Illinois at Urbana-Champaign, Urbana, IL, USA

ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
March 2013, Pages 13–26
https://doi.org/10.1145/2451116.2451119

Published: 16 March 2013

ABSTRACT

Recent work has shown that disciplined shared-memory programming models that provide deterministic-by-default semantics can simplify both parallel software and hardware. Specifically, the DeNovo hardware system has shown that the software guarantees of such models (e.g., data-race-freedom and explicit side-effects) can enable simpler, higher performance, and more energy-efficient hardware than the current state-of-the-art for deterministic programs. Many applications, however, contain non-deterministic parts; e.g., using lock synchronization. For commercial hardware to exploit the benefits of DeNovo, it is therefore necessary to extend DeNovo to support non-deterministic applications.

This paper proposes DeNovoND, a system that supports lock-based, disciplined non-determinism, with the simplicity, performance, and energy benefits of DeNovo. We use a combination of distributed queue-based locks and access signatures to implement simple memory consistency semantics for safe non-determinism, with a coherence protocol that does not require transient states, invalidation traffic, or directories, and does not incur false sharing. The resulting system is simpler, shows comparable or better execution time, and has 33% less network traffic on average (translating directly into energy savings) relative to a state-of-the-art invalidation-based protocol for 8 applications designed for lock synchronization.
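
The abstract names the ingredients (queue-based locks and access signatures) but not how they are combined. The following is a toy, sequential sketch of one plausible reading, not the paper's protocol: writes made inside a critical section are recorded in a Bloom-filter-style signature, the signature travels with the lock to the next core in the queue, and that core discards any cached address the signature may contain instead of receiving invalidation messages. All class and method names (`WriteSignature`, `Core`, `QueueLock`) are hypothetical, and the write-through memory is a simplifying assumption.

```python
import hashlib
from collections import deque


class WriteSignature:
    """Bloom-filter-style set of addresses: false positives possible, false negatives not."""

    def __init__(self, bits=256, hashes=2):
        self.bits = bits
        self.hashes = hashes
        self.field = 0  # bit vector held in a Python int

    def _positions(self, addr):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits

    def add(self, addr):
        for p in self._positions(addr):
            self.field |= 1 << p

    def may_contain(self, addr):
        return all((self.field >> p) & 1 for p in self._positions(addr))

    def merge(self, other):
        self.field |= other.field


class Core:
    """A core with a private cache and a record of its critical-section writes."""

    def __init__(self, name):
        self.name = name
        self.cache = {}  # addr -> locally cached value
        self.written = WriteSignature()

    def read(self, memory, addr):
        if addr not in self.cache:  # miss: fetch from shared memory
            self.cache[addr] = memory[addr]
        return self.cache[addr]

    def write(self, memory, addr, value):
        memory[addr] = value  # assumption: write-through, to keep the sketch short
        self.cache[addr] = value
        self.written.add(addr)


class QueueLock:
    """FIFO lock; the head of the queue is the holder. The signature travels with the lock."""

    def __init__(self):
        self.queue = deque()
        self.signature = WriteSignature()

    def request(self, core):
        self.queue.append(core)
        if self.queue[0] is core:  # lock was free: granted immediately
            self._grant(core)

    def _grant(self, core):
        # The acquirer self-invalidates possibly stale lines; no invalidation
        # messages are sent to it by the writers.
        for addr in [a for a in core.cache if self.signature.may_contain(a)]:
            del core.cache[addr]

    def release(self, core):
        assert self.queue[0] is core, "only the holder may release"
        self.signature.merge(core.written)  # publish this critical section's writes
        core.written = WriteSignature()
        self.queue.popleft()
        if self.queue:  # hand the lock directly to the next waiter
            self._grant(self.queue[0])


memory = {"counter": 0}
a, b = Core("A"), Core("B")
lock = QueueLock()

b.read(memory, "counter")  # B caches the old value
lock.request(a)            # A holds the lock
lock.request(b)            # B queues behind A
a.write(memory, "counter", a.read(memory, "counter") + 1)
lock.release(a)            # lock and signature pass to B; B drops its stale copy
result = b.read(memory, "counter")
print(result)
```

In this sketch a signature false positive only causes an unnecessary refetch, never a stale read, which is why a compact, lossy structure is acceptable for this purpose.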

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures

Published in

ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
March 2013, 574 pages
ISBN: 9781450318709
DOI: 10.1145/2451116
General Chair: Vivek Sarkar, Rice University, USA
Program Chair: Rastislav Bodik, University of California, Berkeley, USA

ACM SIGARCH Computer Architecture News, Volume 41, Issue 1
ASPLOS '13, March 2013, 540 pages
ISSN: 0163-5964
DOI: 10.1145/2490301

ACM SIGPLAN Notices, Volume 48, Issue 4
ASPLOS '13, April 2013, 540 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2499368

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cache coherence
disciplined parallelism
memory consistency
non-determinism
shared memory
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 535 of 2,713 submissions, 20%
