DOI: 10.1145/2451116.2451119
research-article

DeNovoND: efficient hardware support for disciplined non-determinism

Published: 16 March 2013

ABSTRACT

Recent work has shown that disciplined shared-memory programming models that provide deterministic-by-default semantics can simplify both parallel software and hardware. Specifically, the DeNovo hardware system has shown that the software guarantees of such models (e.g., data-race-freedom and explicit side-effects) can enable simpler, higher-performance, and more energy-efficient hardware than the current state of the art for deterministic programs. Many applications, however, contain non-deterministic parts, e.g., code that uses lock synchronization. For commercial hardware to exploit DeNovo's benefits, DeNovo must therefore be extended to support non-deterministic applications.

This paper proposes DeNovoND, a system that supports lock-based, disciplined non-determinism with the simplicity, performance, and energy benefits of DeNovo. We use a combination of distributed queue-based locks and access signatures to implement simple memory consistency semantics for safe non-determinism, with a coherence protocol that requires no transient states, invalidation traffic, or directories, and does not incur false sharing. Relative to a state-of-the-art invalidation-based protocol, across 8 applications designed for lock synchronization, the resulting system is simpler, achieves comparable or better execution time, and generates 33% less network traffic on average (which translates directly into energy savings).
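The abstract names two mechanisms, distributed queue-based locks and access signatures, without giving their details. The Python sketch below is a deliberately simplified software model built on our own assumptions, not the paper's hardware design. A hypothetical `Signature` class is a Bloom filter over written addresses; a hypothetical `QueueLock` grants the lock to waiters in FIFO order and carries the union of every releaser's write signature; and an acquiring `Core` self-invalidates any cached address that may appear in that signature. Writes go straight to a shared `memory` dict, which stands in for whatever write propagation the real protocol uses, and the signature size and hash choice are arbitrary.

```python
import hashlib
from collections import deque


class Signature:
    """Bloom-filter signature of written addresses (sizes are hypothetical)."""

    def __init__(self, bits=256, hashes=2):
        self.bits, self.hashes, self.vector = bits, hashes, 0

    def _positions(self, addr):
        # Derive `hashes` bit positions for an address from SHA-256.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:4], "little") % self.bits

    def insert(self, addr):
        for p in self._positions(addr):
            self.vector |= 1 << p

    def may_contain(self, addr):
        # No false negatives; false positives only cause extra invalidations.
        return all((self.vector >> p) & 1 for p in self._positions(addr))

    def merge(self, other):
        self.vector |= other.vector


class Core:
    def __init__(self, name, memory):
        self.name, self.memory = name, memory
        self.cache = {}            # addr -> cached value, possibly stale
        self.writes = Signature()  # addresses written since this core's last release

    def read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value  # simplification: write-through to shared memory
        self.writes.insert(addr)

    def self_invalidate(self, sig):
        # Drop every cached line that may have been written under the lock.
        stale = [addr for addr in self.cache if sig.may_contain(addr)]
        for addr in stale:
            del self.cache[addr]


class QueueLock:
    """FIFO lock that hands its accumulated write signature to the next holder."""

    def __init__(self):
        self.holder, self.waiters, self.signature = None, deque(), Signature()

    def acquire(self, core):
        if self.holder is None:
            self._grant(core)
            return True
        self.waiters.append(core)  # in hardware the waiter would wait locally
        return False

    def release(self, core):
        assert self.holder is core, "only the holder may release"
        self.signature.merge(core.writes)
        core.writes = Signature()
        self.holder = None
        if self.waiters:
            self._grant(self.waiters.popleft())

    def _grant(self, core):
        self.holder = core
        core.self_invalidate(self.signature)


def demo():
    memory = {"x": 0}
    a, b = Core("a", memory), Core("b", memory)
    lock = QueueLock()
    b.read("x")       # b caches x = 0 before any critical section
    lock.acquire(a)
    a.write("x", 1)   # critical section on core a
    lock.release(a)
    lock.acquire(b)   # b self-invalidates its stale copy of x
    return b.read("x")


print(demo())  # 1
```

Because a Bloom filter has no false negatives, every stale copy of data written under the lock is dropped on acquire; false positives only cost extra misses. This sketch never clears the lock's accumulated signature, so its false-positive rate grows over time; how a real design bounds that growth is not described in the abstract.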


    • Published in

      ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
      March 2013
      574 pages
      ISBN: 9781450318709
      DOI: 10.1145/2451116
      • ACM SIGARCH Computer Architecture News, Volume 41, Issue 1
        ASPLOS '13
        March 2013
        540 pages
        ISSN: 0163-5964
        DOI: 10.1145/2490301
      • ACM SIGPLAN Notices, Volume 48, Issue 4
        ASPLOS '13
        April 2013
        540 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/2499368

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 535 of 2,713 submissions, 20%
