DOI: 10.1145/3195970.3196070
research-article

Locality aware memory assignment and tiling

Published: 24 June 2018

ABSTRACT

With the trend toward specialization, an efficient memory-path design is vital to capitalize on customization in the data path. A monolithic memory hierarchy is often highly inefficient for irregular applications, which have traditionally targeted CPUs. New approaches and tools are required to offer application-specific memory customization that combines the benefits of cache and scratchpad memory.

This paper introduces a novel approach for automated application-specific on-chip memory assignment and tiling. The approach offers two major tools: (1) static memory access analysis and (2) variable-level memory assignment. Static memory access analysis operates at the LLVM abstraction level. It extracts target-independent pointer behaviors, measures access strides, and analyzes the prefetchability of variables. Variable-level memory assignment creates a memory allocation graph that assigns each variable to cache or scratchpad based on its size and estimated locality. It also explores opportunities for tiling memory accesses. For exploration and evaluation, this paper uses the MachSuite benchmarks (with both regular and irregular memory access behaviors) and the gem5-Aladdin tool for performance and power evaluation. The proposed approach optimizes the memory hierarchy by automatically combining the benefits of cache and (tiled) scratchpad at variable-level granularity for each application. The results demonstrate more than 45% improvement in our power-stall product, on average, over a monolithic cache or scratchpad design.
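To make the idea of stride-based, variable-level assignment concrete, the following is a minimal Python sketch and not the paper's implementation. It assumes a hypothetical per-variable address trace rather than LLVM IR, and the `Variable` class, the `dominant_stride` and `assign` helpers, the 4 KiB scratchpad budget, and the 0.9 regularity threshold are all illustrative assumptions. The policy itself (regular accesses go to scratchpad, irregular accesses go to cache, large regular variables are tiled) is likewise a simplification chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Variable:
    name: str
    size_bytes: int
    addresses: list  # byte offsets touched, in program order (hypothetical trace)


def dominant_stride(addresses):
    """Return (stride, fraction) for the most common delta between consecutive accesses."""
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    if not deltas:
        return 0, 0.0
    stride, count = Counter(deltas).most_common(1)[0]
    return stride, count / len(deltas)


def assign(var, spm_budget_bytes=4096, regular_threshold=0.9):
    """Toy assignment policy; both thresholds are assumptions, not values from the paper."""
    _, regularity = dominant_stride(var.addresses)
    if regularity < regular_threshold:
        # Irregular accesses are hard to prefetch, so rely on the cache.
        return "cache"
    if var.size_bytes <= spm_budget_bytes:
        # Regular and small enough to fit entirely in the scratchpad.
        return "scratchpad"
    # Regular but too large: stream it through the scratchpad in tiles.
    return "tiled-scratchpad"


if __name__ == "__main__":
    streamed = Variable("vec", 1024, list(range(0, 1024, 4)))
    gathered = Variable("idx", 1024, [0, 64, 8, 400, 12, 96])
    for v in (streamed, gathered):
        print(v.name, dominant_stride(v.addresses), assign(v))
```

A real analysis would derive strides statically from pointer arithmetic in the IR instead of from a dynamic trace; the trace is used here only to keep the sketch self-contained.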


  • Published in

    DAC '18: Proceedings of the 55th Annual Design Automation Conference
    June 2018
    1089 pages
    ISBN:9781450357005
    DOI:10.1145/3195970

    Copyright © 2018 ACM


    Publisher: Association for Computing Machinery, New York, NY, United States



