ABSTRACT
Software Managed Multicore (SMM) architectures have been proposed as a solution for scaling the memory architecture. In an SMM architecture, there are no caches, and each core has only a local scratchpad memory. If the code and data of the task executing on an SMM core cannot all fit in the local memory, then data must be managed explicitly in the program through DMA instructions. While all code and data need to be managed, an efficient technique for managing stack data is of utmost importance, since an average of 64% of all accesses may be to stack variables [16]. In this paper, we formulate the problem of stack data management optimization on an SMM core. We then develop both an ILP formulation and a heuristic, SSDM (Smart Stack Data Management), to determine where to insert stack data management calls in the program. Experimental results demonstrate that SSDM can reduce the management overhead by 13X over the state-of-the-art stack data management technique [10].
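To make the notion of "stack data management calls" concrete, the following is a minimal toy sketch, not the SSDM algorithm or the scheme of [10]. It assumes a fixed-size local stack region, hypothetical `check_in`/`check_out` calls placed around every function call, and an oldest-frame-first spill policy; DMA is only counted, not performed.

```python
class ScratchpadStack:
    """Toy model of a fixed-size local stack region backed by main memory.

    All names and the spill policy are illustrative assumptions.
    """

    def __init__(self, capacity):
        self.capacity = capacity   # bytes of local memory reserved for stack frames
        self.local = []            # sizes of resident frames, oldest first
        self.evicted = []          # sizes of frames spilled to main memory, oldest first
        self.dma_transfers = 0     # number of simulated DMA operations

    def used(self):
        return sum(self.local)

    def check_in(self, frame_size):
        """Management call before a function call: make room, then push the frame."""
        if frame_size > self.capacity:
            raise ValueError("frame larger than the local stack region")
        while self.used() + frame_size > self.capacity:
            self.evicted.append(self.local.pop(0))  # spill the oldest resident frame
            self.dma_transfers += 1
        self.local.append(frame_size)

    def check_out(self):
        """Management call after the callee returns: pop its frame and make
        sure the caller's frame is resident again."""
        self.local.pop()
        if not self.local and self.evicted:
            self.local.append(self.evicted.pop())   # fetch the caller's frame back
            self.dma_transfers += 1


# Call chain main -> f -> g with frame sizes 40, 50 and 30 bytes
# on a 100-byte local stack region.
stack = ScratchpadStack(capacity=100)
stack.check_in(40)   # enter main
stack.check_in(50)   # enter f: 90 bytes used, still fits
stack.check_in(30)   # enter g: main's frame is spilled to make room
stack.check_out()    # leave g
stack.check_out()    # leave f: main's frame is fetched back
stack.check_out()    # leave main
print(stack.dma_transfers)
```

In this toy model every call pays for a management call even when no spill is needed, which is the kind of overhead that choosing the insertion points more carefully aims to avoid.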
- Intel Core i7 Processor Extreme Edition and Intel Core i7 Processor Datasheet, Volume 1. In White paper. Intel.
- Raw Performance: SiSoftware Sandra 2010 Pro (GFLOPS).
- The SCC Programmer's Guide. Technical report.
- A. V. Aho, M. S. Lam, R. Sethi, J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison Wesley, 1986.
- F. Angiolini et al. A Post-Compiler Approach to Scratchpad Mapping of Code. In Proc. of CASES, pages 259--267, 2004.
- K. Bai and A. Shrivastava. A Software-Only Scheme for Managing Heap Data on Limited Local Memory (LLM) Multi-core Processors. ACM TECS, 2013.
- K. Bai, D. Lu, and A. Shrivastava. Vector Class on Limited Local Memory (LLM) Multi-core Processors. In Proc. of CASES, 2011.
- K. Bai and A. Shrivastava. Heap Data Management for Limited Local Memory (LLM) Multi-core Processors. In Proc. of CODES+ISSS, 2010.
- K. Bai and A. Shrivastava. Automatic and Efficient Heap Data Management for Limited Local Memory Multicore Architectures. In Proc. of DATE, 2013.
- K. Bai, A. Shrivastava, and S. Kudchadker. Stack Data Management for Limited Local Memory (LLM) Multi-core Processors. In Proc. of ASAP, pages 231--234, 2011.
- R. Banakar et al. Scratchpad Memory: Design Alternative for Cache on-chip Memory in Embedded Systems. In Proc. of CODES+ISSS, pages 73--78, 2002.
- A. Dominguez, S. Udayakumaran, and R. Barua. Heap Data Allocation to Scratch-pad Memory in Embedded Systems. J. Embedded Comput., 1(4):521--540, 2005.
- B. Egger et al. A Dynamic Code Placement Technique for Scratchpad Memory Using Postpass Optimization. In Proc. of CASES, pages 223--233, 2006.
- B. Flachs et al. The Microarchitecture of the Synergistic Processor for A Cell Processor. IEEE Journal of Solid-State Circuits, 41(1):63--70, 2006.
- L. Gauthier and T. Ishihara. Implementation of Stack Data Placement and Run Time Management Using a Scratch-Pad Memory for Energy Consumption Reduction of Embedded Applications. IEICE, 94--A(12):2597--2608, 2011.
- M. R. Guthaus et al. Mibench: A Free, Commercially Representative Embedded Benchmark Suite. Proc. of Workload Characterization, pages 3--14, 2001.
- A. Janapsatya et al. A Novel Instruction Scratchpad Memory Optimization Method Based on Concomitance Metric. In Proc. of ASP-DAC, pages 612--617, 2006.
- S. C. Jung, A. Shrivastava, and K. Bai. Dynamic Code Mapping for Limited Local Memory Systems. In Proc. of ASAP, pages 13--20, 2010.
- M. Kandemir and A. Choudhary. Compiler-directed Scratch pad Memory Hierarchy Design and Management. In Proc. of DAC, pages 628--633, 2002.
- M. Kandemir et al. Dynamic Management of Scratch-pad Memory Space. In Proc. of DAC, pages 690--695, 2001.
- M. Kistler et al. Cell Multiprocessor Communication Network: Built for Speed. IEEE Micro, 26(3):10--23, May 2006.
- L. Li, L. Gao, and J. Xue. Memory Coloring: A Compiler Approach for Scratchpad Memory Management. In Proc. of PACT, pages 329--338, 2005.
- M. Mamidipaka and N. Dutt. On-chip Stack Based Memory Organization for Low Power Embedded Architectures. In Proc. of DATE, pages 1082--1087, 2003.
- R. McIlroy et al. Efficient Dynamic Heap Allocation of Scratch-pad Memory. In ISMM, pages 31--40, 2008.
- N. Nguyen, A. Dominguez, and R. Barua. Memory Allocation for Embedded Systems with A Compile-time-unknown Scratch-pad Size. In Proc. of CASES, pages 115--125, 2005.
- P. Panda et al. On-chip vs. Off-chip Memory: the Data Partitioning Problem in Embedded Processor-based Systems. In ACM TODAES, pages 682--704, 2000.
- S. Park et al. A Novel Technique to Use Scratch-pad Memory for Stack Management. In Proc. of DATE, pages 1478--1483, 2007.
- F. Poletti et al. An Integrated Hardware/Software Approach for Run-time Scratchpad Management. In Proc. of DAC, pages 238--243, 2004.
- A. Shrivastava et al. A Software-only Solution to Use Scratch Pads for Stack Data. IEEE TCAD, 28(11):1719--1728, 2009.
- S. Udayakumaran, A. Dominguez, and R. Barua. Dynamic Allocation for Scratch-pad Memory Using Compile-time Decisions. ACM TECS, 5(2):472--511, 2006.
Index Terms
- SSDM: smart stack data management for software managed multicores (SMMs)