ABSTRACT
As the number of on-chip accelerators grows rapidly to improve power efficiency, the buffer size required by the accelerators increases drastically. Existing solutions allow the accelerators to share a common pool of buffers and/or allocate buffers in the cache. In this paper we propose a Buffer-in-NUCA (BiN) scheme with the following contributions: (1) a dynamic, interval-based global buffer allocation method that assigns shared buffer space to the accelerators that can best utilize it, and (2) a flexible, low-overhead paged buffer allocation method that limits the impact of buffer fragmentation in a shared buffer, especially when buffers are allocated in a non-uniform cache architecture (NUCA) with distributed cache banks. Experimental results show that, compared to two representative schemes from prior work, BiN improves performance by 32% and 35% and reduces energy by 12% and 29%, respectively.
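To make the second contribution concrete, the sketch below illustrates the general idea behind paged buffer allocation in a banked NUCA: a logically contiguous accelerator buffer is backed by fixed-size pages drawn from whichever distributed cache banks have free space, so free space never fragments into unusably small contiguous runs. This is a minimal illustration of the paging concept only, not the authors' implementation; the page size, the `NucaBank`/`PagedBufferAllocator` names, and the round-robin bank-selection policy are all illustrative assumptions.

```python
# Illustrative sketch (not the paper's implementation) of paged buffer
# allocation over distributed NUCA banks. PAGE_SIZE and all class names
# are assumptions made for this example.

PAGE_SIZE = 2048  # bytes per buffer page (assumed)

class NucaBank:
    """One distributed cache bank whose capacity is carved into pages."""
    def __init__(self, bank_id, num_pages):
        self.bank_id = bank_id
        self.free_pages = list(range(num_pages))

class PagedBufferAllocator:
    def __init__(self, banks):
        self.banks = banks

    def allocate(self, nbytes):
        """Return a page table [(bank_id, page), ...] covering nbytes,
        or None if the banks together lack enough free pages."""
        need = -(-nbytes // PAGE_SIZE)  # ceiling division
        if need > sum(len(b.free_pages) for b in self.banks):
            return None
        table = []
        # Round-robin over banks: because the buffer is accessed through a
        # page table, its pages need not be contiguous or even co-located,
        # which is what limits fragmentation in the shared space.
        i = 0
        while len(table) < need:
            bank = self.banks[i % len(self.banks)]
            if bank.free_pages:
                table.append((bank.bank_id, bank.free_pages.pop()))
            i += 1
        return table

    def free(self, table):
        """Return a buffer's pages to their home banks."""
        for bank_id, page in table:
            self.banks[bank_id].free_pages.append(page)

# Example: 4 banks of 4 pages each; a 9000-byte buffer needs
# ceil(9000 / 2048) = 5 pages, scattered across the banks.
banks = [NucaBank(b, num_pages=4) for b in range(4)]
alloc = PagedBufferAllocator(banks)
buf = alloc.allocate(9000)
```

Even after many allocations and frees leave the free pages scattered, any request that fits in the total free page count still succeeds, since no contiguous run is ever required.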
BiN: a buffer-in-NUCA scheme for accelerator-rich CMPs