research-article

Locality aware memory assignment and tiling

Authors:

Hamed Tabkhi

DAC '18: Proceedings of the 55th Annual Design Automation Conference

Article No.: 130, Pages 1 - 6

https://doi.org/10.1145/3195970.3196070

Published: 24 June 2018

Abstract

With the trend toward specialization, an efficient memory-path design is vital to capitalize on customization in the data path. A monolithic memory hierarchy is often highly inefficient for irregular applications, which have traditionally been targeted at CPUs. New approaches and tools are required to offer application-specific memory customization that combines the benefits of cache and scratchpad memory.

This paper introduces a novel approach for automated application-specific on-chip memory assignment and tiling. The approach offers two major tools: (1) static memory access analysis and (2) variable-level memory assignment. Static memory access analysis operates at the LLVM abstraction level. It extracts target-independent pointer behaviors, measures access strides, and analyzes the prefetchability of variables. Variable-level memory assignment creates a memory allocation graph for memory assignment (cache vs. scratchpad) based on each variable's size and estimated locality. It also explores opportunities for tiling memory accesses. For the exploration and results, this paper uses the MachSuite benchmarks (with both regular and irregular memory access behaviors) and the gem5-Aladdin tool for performance and power evaluation. The proposed approach optimizes the memory hierarchy by automatically combining the benefits of cache and (tiled) scratchpad memory at variable-level granularity for each application. The results demonstrate an average improvement of more than 45% in the power-stall product over a monolithic cache or scratchpad design.
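To make the idea of stride-based, variable-level assignment more concrete, the following is a minimal illustrative sketch and not the paper's implementation. It assumes each variable comes with a list of byte addresses in access order (the paper instead derives strides statically from LLVM IR), and the names `Variable`, `dominant_stride_fraction`, `assign`, and the `regular_threshold` parameter are all hypothetical. The decision rule (regular and small goes to scratchpad, regular and large goes to tiled scratchpad, irregular goes to cache) is an assumed heuristic chosen only to illustrate the cache vs. scratchpad trade-off described in the abstract.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Variable:
    name: str
    size_bytes: int
    addresses: list  # hypothetical access sequence (byte addresses, in order)


def dominant_stride_fraction(addresses):
    """Fraction of consecutive accesses that use the most common stride."""
    strides = [b - a for a, b in zip(addresses, addresses[1:])]
    if not strides:
        return 0.0
    _, count = Counter(strides).most_common(1)[0]
    return count / len(strides)


def assign(var, spm_capacity, regular_threshold=0.9):
    """Pick a memory for one variable (assumed heuristic, for illustration)."""
    regular = dominant_stride_fraction(var.addresses) >= regular_threshold
    if not regular:
        return "cache"             # irregular accesses: rely on the cache
    if var.size_bytes <= spm_capacity:
        return "scratchpad"        # regular and fits entirely on chip
    return "tiled-scratchpad"      # regular but too large: bring it in tiles


if __name__ == "__main__":
    streamed = Variable("a", 4096, list(range(0, 4096, 4)))
    large = Variable("b", 1 << 20, list(range(0, 4096, 4)))
    indexed = Variable("idx", 4096, [0, 400, 8, 1200, 64, 36, 900, 4])
    for v in (streamed, large, indexed):
        print(v.name, assign(v, spm_capacity=8192))
```

In this sketch a constant-stride variable counts as prefetchable, so it can be staged in a scratchpad ahead of use, while a variable with scattered strides is left to the cache. A real flow would also need to weigh reuse and total on-chip capacity across all variables, which this example does not model.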

Cited By

Kandemir M, Tang X, Kotra J, Karakoy M, Mitra T, Young E, Xiong J (2022). Fine-Granular Computation and Data Layout Reorganization for Improving Locality. Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design (1-9). doi:10.1145/3508352.3549386. Online publication date: 30-Oct-2022
https://dl.acm.org/doi/10.1145/3508352.3549386
Purkayastha A, Rogers S, Shiddibhavi S, Tabkhi H (2020). LLVM-based automation of memory decoupling for OpenCL applications on FPGAs. Microprocessors & Microsystems 72:C. doi:10.1016/j.micpro.2019.102909. Online publication date: 1-Feb-2020
https://dl.acm.org/doi/10.1016/j.micpro.2019.102909
Rogers S, Slycord J, Raheja R, Tabkhi H (2019). Scalable LLVM-Based Accelerator Modeling in gem5. IEEE Computer Architecture Letters 18:1 (18-21). doi:10.1109/LCA.2019.2893932. Online publication date: 1-Jan-2019
https://dl.acm.org/doi/10.1109/LCA.2019.2893932

Published In

DAC '18: Proceedings of the 55th Annual Design Automation Conference

June 2018

1089 pages

ISBN:9781450357005

DOI:10.1145/3195970

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

EDAC: Electronic Design Automation Consortium
SIGDA: ACM Special Interest Group on Design Automation
IEEE Council on Electronic Design Automation (CEDA)

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

DAC '18

Sponsor:

EDAC
SIGDA

DAC '18: The 55th Annual Design Automation Conference 2018

June 24 - 29, 2018

San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics

Total Citations: 3
Total Downloads: 215
Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 3

Reflects downloads up to 22 Feb 2025
