Address-aware fences

Research article
DOI: 10.1145/2464996.2465015
Published: 10 June 2013

ABSTRACT

Many modern multicore architectures support shared memory for ease of programming and relaxed memory models for high performance. Under a relaxed memory model, memory accesses can be reordered dynamically, and the reordering can be observed by other processors. Fence instructions are therefore provided to enforce the memory orderings that are critical to the correctness of a program. However, fence instructions are costly because they cause the processor to stall. Prior work has observed that most executions of fence instructions are unnecessary. In this paper we propose the address-aware fence, a hardware solution that reduces the overhead of fence instructions without resorting to speculation. An address-aware fence enforces only the memory orderings that are necessary to maintain the effect that a traditional fence strives to enforce. This is achieved by dynamically checking a condition that determines when an execution of a fence must take effect and delay the memory accesses following the fence. When a fence instruction is encountered, the necessary memory addresses are first collected to form a watchlist; then, only the memory accesses to addresses contained in the watchlist are delayed. Memory accesses whose addresses are not in the watchlist are allowed to complete without waiting for the completion of pending memory accesses from before the fence. Our experiments, conducted on a group of concurrent lock-free algorithms and SPLASH-2 benchmarks, show that the address-aware fence eliminates nearly all the overhead due to fences and achieves an average improvement of 12.2% on programs with traditional fences.
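
The difference between a traditional fence and the watchlist idea described above can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not the paper's hardware design: the abstract does not say how the watchlist addresses are selected, so the watchlist is simply passed in as an input, and the names `Access` and `finish_times`, the symbolic addresses, and the latencies are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    addr: str     # symbolic memory address (hypothetical)
    latency: int  # cycles the access needs once it is allowed to start


def finish_times(pre_fence, post_fence, watchlist=None):
    """Return the cycle at which each post-fence access completes.

    watchlist=None models a traditional fence: every post-fence access
    waits until all pending pre-fence accesses have completed.
    Otherwise only accesses whose address is in the watchlist wait.
    How the watchlist is built is NOT modeled here (assumption).
    """
    # Assumption: all pre-fence accesses start at cycle 0 and overlap,
    # so they have all completed after the longest latency.
    drain = max((a.latency for a in pre_fence), default=0)
    times = []
    for a in post_fence:
        must_wait = watchlist is None or a.addr in watchlist
        start = drain if must_wait else 0
        times.append(start + a.latency)
    return times


pre = [Access("x", 100)]                 # e.g. a long-latency store
post = [Access("y", 2), Access("z", 2)]  # accesses after the fence

print(finish_times(pre, post))                   # [102, 102]
print(finish_times(pre, post, watchlist={"y"}))  # [102, 2]
```

In this toy model the access to `z` is not on the watchlist, so it completes at cycle 2 instead of stalling behind the pending access to `x`; the access to `y` is still delayed, as it would be under a traditional fence.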


    • Published in

      ICS '13: Proceedings of the 27th international ACM conference on International conference on supercomputing
      June 2013
      512 pages
      ISBN:9781450321303
      DOI:10.1145/2464996

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      ICS '13 Paper Acceptance Rate: 43 of 202 submissions, 21%
      Overall Acceptance Rate: 584 of 2,055 submissions, 28%
