DOI: 10.1145/1854273.1854294
research-article

SPACE: sharing pattern-based directory coherence for multicore scalability

Published: 11 September 2010

ABSTRACT

An important challenge in multicore processors is the maintenance of cache coherence in a scalable manner. Directory-based protocols save bandwidth and achieve scalability by associating information about sharer cores with every cache block. As the number of cores and cache sizes increase, the directory itself adds significant area and energy overheads.

In this paper, we propose SPACE, a directory design based on recognizing and representing the subset of sharing patterns present in an application. SPACE takes advantage of the observation that many memory locations in an application are accessed by the same set of processors, resulting in a few sharing patterns that occur frequently. The sharing pattern of a cache block is the bit vector representing the processors that share the block. SPACE decouples the sharing pattern from each cache block and holds them in a separate directory table. Multiple cache lines that have the same sharing pattern point to a common entry in the directory table. In addition, when the table capacity is exceeded, patterns that are similar to each other are dynamically collated into a single entry.
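
The table-and-pointer organization described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the class name `PatternDirectory`, its methods, the per-entry reference counts, and the merge policy (fold a new pattern into the entry at the smallest Hamming distance by OR-ing the bit vectors) are all hypothetical choices made for the example.

```python
class PatternDirectory:
    """Toy model of a sharing-pattern directory table (illustrative only)."""

    def __init__(self, capacity):
        # Each table entry holds one sharer bit vector (an int), or None if free.
        self.patterns = [None] * capacity
        # Number of cache lines currently pointing at each entry (assumed bookkeeping).
        self.refcount = [0] * capacity
        # Per-line pointer: cache-line address -> table entry index.
        self.pointer = {}

    def sharers(self, addr):
        """Return the sharer bit vector for a line; it may include false sharers."""
        entry = self.pointer.get(addr)
        return 0 if entry is None else self.patterns[entry]

    def add_sharer(self, addr, core):
        """Record that `core` now shares the line at `addr`."""
        wanted = self.sharers(addr) | (1 << core)
        self._release(addr)
        entry = self._find_or_allocate(wanted)
        self.pointer[addr] = entry
        self.refcount[entry] += 1

    def _release(self, addr):
        # Drop the line's old pointer and free the entry if no line uses it.
        entry = self.pointer.pop(addr, None)
        if entry is not None:
            self.refcount[entry] -= 1
            if self.refcount[entry] == 0:
                self.patterns[entry] = None

    def _find_or_allocate(self, wanted):
        # Case 1: the pattern already exists, so the line shares that entry.
        if wanted in self.patterns:
            return self.patterns.index(wanted)
        # Case 2: a free entry is available.
        if None in self.patterns:
            entry = self.patterns.index(None)
            self.patterns[entry] = wanted
            return entry
        # Case 3: table full. Assumed policy: merge into the most similar
        # entry (smallest Hamming distance). OR-ing keeps every true sharer
        # and can only add false sharers.
        entry = min(
            range(len(self.patterns)),
            key=lambda i: bin(self.patterns[i] ^ wanted).count("1"),
        )
        self.patterns[entry] |= wanted
        return entry


d = PatternDirectory(capacity=2)
for addr in (0x100, 0x140):      # two lines shared by cores 0 and 1
    d.add_sharer(addr, 0)
    d.add_sharer(addr, 1)
d.add_sharer(0x180, 2)           # second pattern fills the table
d.add_sharer(0x1C0, 3)           # table full: merged into the closest entry
```

In this model the two lines shared by cores 0 and 1 end up pointing at one table entry, and the last insertion is merged with the single-sharer entry for core 2, so both of those lines conservatively report cores 2 and 3 as sharers.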

Our results show that overall, SPACE is within 2% of the performance of a conventional directory. When compared to coarse vector directories, dynamically collating similar patterns eliminates more false sharers. Our experimentation also reveals that a small directory table (256-512 entries) can handle the access patterns in many applications, with the SPACE directory table size being O(P) and requiring a pointer per cache line whose size is O(log₂ P). Specifically, SPACE requires approximately 44% of the area of a conventional directory at 16 processors and 25% at 32 processors.
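
The scaling argument can be made concrete with a simplified bit-count model. This is a sketch under assumptions that are not from the paper: a conventional directory is taken to store one P-bit vector per cache line, and SPACE is taken to store one table-index pointer per line plus a table of P-bit entries. The function names are hypothetical. The model ignores tags, state bits, and layout, so it is not expected to reproduce the area figures reported in the abstract.

```python
def conventional_bits(lines, cores):
    """Full bit vector per cache line: one bit per core."""
    return lines * cores


def space_bits(lines, cores, table_entries):
    """One table pointer per cache line plus the shared pattern table."""
    # Bits needed to index any of `table_entries` entries.
    pointer_bits = (table_entries - 1).bit_length()
    return lines * pointer_bits + table_entries * cores


def storage_ratio(lines, cores, table_entries):
    """SPACE storage as a fraction of conventional storage in this model."""
    return space_bits(lines, cores, table_entries) / conventional_bits(lines, cores)
```

Because the pointer width depends only on the table size, the per-line cost in this model stays fixed as the core count grows while the conventional per-line cost grows linearly, so the ratio shrinks with more cores.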

      • Published in

        PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques
        September 2010
        596 pages
        ISBN: 9781450301787
        DOI: 10.1145/1854273

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 121 of 471 submissions, 26%
