Poster. DOI: 10.1145/2333660.2333715

BiN: a buffer-in-NUCA scheme for accelerator-rich CMPs

Published: 30 July 2012

ABSTRACT

As the number of on-chip accelerators grows rapidly to improve power efficiency, the total buffer capacity required by the accelerators increases drastically. Existing solutions allow the accelerators to share a common pool of buffers and/or allocate buffers in the cache. In this paper we propose a Buffer-in-NUCA (BiN) scheme with the following contributions: (1) a dynamic, interval-based global buffer allocation method that assigns shared buffer space to the accelerators that can best utilize the additional space, and (2) a flexible, low-overhead paged buffer allocation method that limits the impact of buffer fragmentation in a shared buffer, especially when buffers are allocated in a non-uniform cache architecture (NUCA) with distributed cache banks. Experimental results show that, compared to two representative schemes from prior work, BiN improves performance by 32% and 35% and reduces energy by 12% and 29%, respectively.
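The abstract does not detail either allocation method. The following Python sketch illustrates one plausible reading of each idea under stated assumptions. It should not be read as BiN's actual design.

- `allocate_interval` hands out shared pages at the start of each interval by greedy marginal utility. This is a generic technique in the spirit of utility-based cache partitioning, swapped in for illustration.
- `PagedBufferPool` maps each buffer onto fixed-size, non-contiguous pages. Pages come from the nearest cache banks first, so a request never needs one contiguous region.

All names, the utility curves, `PAGE_SIZE`, and the `distance` cost function are hypothetical.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # hypothetical page size in bytes; not taken from the paper


def allocate_interval(utility, total_pages):
    """Greedy marginal-utility allocation for one interval (hypothetical sketch).

    utility[a][k] is an assumed estimate of accelerator a's benefit from
    holding k extra pages, with utility[a][0] == 0.
    Returns a dict mapping each accelerator to its number of extra pages.
    """
    alloc = {a: 0 for a in utility}
    for _ in range(total_pages):
        best, best_gain = None, 0.0
        for a, curve in utility.items():
            k = alloc[a]
            if k + 1 < len(curve):
                gain = curve[k + 1] - curve[k]  # benefit of one more page
                if gain > best_gain:
                    best, best_gain = a, gain
        if best is None:  # no accelerator benefits from another page
            break
        alloc[best] += 1
    return alloc


@dataclass
class PagedBufferPool:
    """Hypothetical paged buffer pool spread over distributed cache banks."""
    free: dict  # bank id -> list of free page indices in that bank
    tables: dict = field(default_factory=dict)  # buffer id -> [(bank, page)]

    def alloc(self, buf_id, size_bytes, distance):
        """Map a buffer onto non-contiguous pages, nearest banks first.

        distance(bank) is an assumed cost, e.g. NoC hops from the requesting
        accelerator. Returns the page table, or None if too few pages are free.
        """
        npages = -(-size_bytes // PAGE_SIZE)  # ceiling division
        if sum(len(p) for p in self.free.values()) < npages:
            return None
        table = []
        for bank in sorted(self.free, key=distance):
            while self.free[bank] and len(table) < npages:
                table.append((bank, self.free[bank].pop()))
        self.tables[buf_id] = table
        return table

    def release(self, buf_id):
        """Return every page of a buffer to its bank's free list."""
        for bank, page in self.tables.pop(buf_id):
            self.free[bank].append(page)

    def translate(self, buf_id, offset):
        """Translate a buffer-relative byte offset to (bank, page, page offset)."""
        bank, page = self.tables[buf_id][offset // PAGE_SIZE]
        return bank, page, offset % PAGE_SIZE


if __name__ == "__main__":
    curves = {"acc0": [0, 5, 8, 9], "acc1": [0, 4, 7, 10]}
    print(allocate_interval(curves, 3))  # {'acc0': 2, 'acc1': 1}

    pool = PagedBufferPool(free={0: [0, 1], 1: [0, 1, 2]})
    pool.alloc("A", 10000, lambda b: {0: 2, 1: 1}[b])  # bank 1 is closer
    print(pool.translate("A", 5000))  # (1, 1, 904)
    pool.release("A")
```

Greedy marginal-utility allocation is optimal only when every utility curve is concave. Non-concave curves would call for a lookahead search instead. With paging, a buffer wastes only the unused tail of its last page, so free space scattered across banks remains usable.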

Published in

ACM Conferences
      ISLPED '12: Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
      July 2012
      438 pages
      ISBN:9781450312493
      DOI:10.1145/2333660

      Copyright © 2012 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall acceptance rate: 398 of 1,159 submissions (34%)
