ABSTRACT
Software-managed scratchpad memories (SPMs) improve performance and power efficiency in embedded processors by reducing required hardware resources. Performance depends strongly on the scheme used to map code and data onto the SPM, but generating optimal mappings can be extremely difficult. Here we address instruction mapping on SPMs and present a performance model and an algorithm, "Code Overlay Generator" (COG), for producing high-performance dynamic SPM code mappings. Our heuristic does not require profiling information and is suitable for generating mappings for large programs, for which previously proposed Integer Linear Programming (ILP) techniques are infeasible.
We compare our algorithm with a published heuristic and with the code overlay mapping algorithm provided with spu-gcc, IBM's compiler for the Cell Broadband Engine (CBE) Synergistic Processing Unit (SPU). We find an average performance advantage of 34% over the published heuristic and 87% over spu-gcc. We additionally show that our performance model enables improved tools for offline evaluation of code overlay performance and for mapping selection.