research-article

Management and optimization for nonvolatile memory-based hybrid scratchpad memory on multicore embedded processors

Published: 10 March 2014

Abstract

The recent emergence of various Non-Volatile Memories (NVMs), with attractive characteristics such as low leakage power and high density, offers a new way to address the memory power consumption problem. In this article, we target embedded chip multiprocessors (CMPs) and propose a novel Hybrid Scratch Pad Memory (HSPM) architecture, consisting of SRAM and NVM, that exploits the ultra-low leakage power and high density of NVM together with the fast access of SRAM. For this architecture we propose a novel data allocation algorithm and an algorithm that determines the NVM/SRAM ratio. Experimental results show that, on average, the data allocation algorithm reduces memory access time by 33.51% and dynamic energy consumption by 16.81% on the HSPM architecture compared with a greedy algorithm. The NVM/SRAM size determination algorithm further reduces memory access time by 14.7% and energy consumption by 20.1% on average.
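The abstract does not spell out either algorithm, so the following is only a rough sketch of the problem they address, under stated assumptions. It models one core's hybrid SPM with per-access latencies, compares a simple greedy baseline (one plausible reading of "a greedy algorithm") against a hypothetical write-aware placement heuristic, and sweeps the SRAM/NVM split under a fixed area budget. The latency figures, the NVM density factor, the heuristic itself, and all names (`greedy_allocate`, `write_aware_allocate`, `best_split`) are assumptions for illustration, not the authors' method. The sketch ignores the multicore dimension (remote SPM accesses) and models access time only; dynamic energy could be added the same way with per-access energy figures.

```python
from dataclasses import dataclass

# Illustrative latencies in cycles. These numbers are assumptions, not values
# from the paper; NVM writes are modeled as much slower than NVM reads.
@dataclass(frozen=True)
class MemType:
    name: str
    read_lat: int
    write_lat: int

SRAM = MemType("SRAM", read_lat=1, write_lat=1)
NVM = MemType("NVM", read_lat=2, write_lat=10)
DRAM = MemType("DRAM", read_lat=50, write_lat=50)  # off-chip fallback

@dataclass(frozen=True)
class Item:
    name: str
    size: int  # bytes
    reads: int
    writes: int

def access_time(it, mem):
    return it.reads * mem.read_lat + it.writes * mem.write_lat

def total_time(items, placement):
    return sum(access_time(it, placement[it.name]) for it in items)

def greedy_allocate(items, sram_cap, nvm_cap):
    """Baseline: hottest items (accesses per byte) first; SRAM, then NVM, then DRAM."""
    free = {SRAM: sram_cap, NVM: nvm_cap}
    placement = {}
    for it in sorted(items, key=lambda i: (i.reads + i.writes) / i.size, reverse=True):
        for mem in (SRAM, NVM):
            if free[mem] >= it.size:
                free[mem] -= it.size
                placement[it.name] = mem
                break
        else:  # no on-chip memory had room
            placement[it.name] = DRAM
    return placement

def write_aware_allocate(items, sram_cap, nvm_cap):
    """Hypothetical heuristic: SRAM goes to items that lose most per byte in NVM
    (the write-heavy ones); NVM then goes to items that gain most per byte over DRAM."""
    placement = {}
    def sram_gain(it):
        return (access_time(it, NVM) - access_time(it, SRAM)) / it.size
    for it in sorted(items, key=sram_gain, reverse=True):
        if it.size <= sram_cap:
            sram_cap -= it.size
            placement[it.name] = SRAM
    def nvm_gain(it):
        return (access_time(it, DRAM) - access_time(it, NVM)) / it.size
    rest = [it for it in items if it.name not in placement]
    for it in sorted(rest, key=nvm_gain, reverse=True):
        if it.size <= nvm_cap:
            nvm_cap -= it.size
            placement[it.name] = NVM
        else:
            placement[it.name] = DRAM
    return placement

def best_split(items, area_budget, nvm_density=4, step=64):
    """Sweep SRAM capacity; leftover area holds NVM at `nvm_density` times SRAM's
    bytes per unit area (an assumed figure). Returns (time, sram_bytes, nvm_bytes)."""
    best = None
    for sram_cap in range(0, area_budget + 1, step):
        nvm_cap = (area_budget - sram_cap) * nvm_density
        cost = total_time(items, write_aware_allocate(items, sram_cap, nvm_cap))
        if best is None or cost < best[0]:
            best = (cost, sram_cap, nvm_cap)
    return best

items = [
    Item("coeffs", size=64, reads=1200, writes=0),  # read-only table
    Item("accum", size=64, reads=500, writes=500),  # write-heavy
    Item("buf", size=128, reads=200, writes=200),
]
print(total_time(items, greedy_allocate(items, 64, 128)))       # 27200
print(total_time(items, write_aware_allocate(items, 64, 128)))  # 23400
print(best_split(items, area_budget=128))                       # (5800, 64, 256)
```

In this toy example the greedy baseline gives the scarce SRAM to the read-only table and pushes the write-heavy accumulator into NVM, where its writes are expensive; the write-aware heuristic reverses that choice. The sweep then finds that, under the assumed 4x density, trading half of the SRAM area for NVM lets every item stay on chip.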

    Published In

    ACM Transactions on Embedded Computing Systems, Volume 13, Issue 4
    Regular Papers
    November 2014
    647 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/2592905
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 March 2014
    Accepted: 01 October 2012
    Revised: 01 April 2012
    Received: 01 December 2011
    Published in TECS Volume 13, Issue 4

    Author Tags

    1. Data allocation
    2. MRAM
    3. NVM
    4. PCM
    5. SPM
    6. energy
    7. multicore processors
    8. on-chip memory

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 16 Feb 2025

    Cited By

    • (2023) COMPAD. Journal of Systems Architecture: the EUROMICRO Journal 145:C. DOI: 10.1016/j.sysarc.2023.103022. Online publication date: 1-Dec-2023.
    • (2022) MASTER: Reclamation of Hybrid Scratchpad Memory to Maximize Energy Saving in Multi-Core Edge Systems. IEEE Transactions on Sustainable Computing 7:4 (749-760). DOI: 10.1109/TSUSC.2021.3049447. Online publication date: 1-Oct-2022.
    • (2022) Optimizing data placement and size configuration for morphable NVM based SPM in embedded multicore systems. Future Generation Computer Systems 135 (270-282). DOI: 10.1016/j.future.2022.05.005. Online publication date: Oct-2022.
    • (2021) High-Performance Predictable NVM-Based Instruction Memory for Real-Time Embedded Systems. IEEE Transactions on Emerging Topics in Computing 9:1 (441-455). DOI: 10.1109/TETC.2018.2858020. Online publication date: 1-Jan-2021.
    • (2019) Power-mode-aware Memory Subsystem Optimization for Low-power System-on-Chip Design. ACM Transactions on Embedded Computing Systems 18:5 (1-25). DOI: 10.1145/3356583. Online publication date: 9-Oct-2019.
    • (2018) Writing-aware data variable allocation on hybrid SRAM+NVM SPM. Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (1-2). DOI: 10.5555/3283552.3283563. Online publication date: 30-Sep-2018.
    • (2018) Energy Optimization for Data Allocation With Hybrid SRAM+NVM SPM. IEEE Transactions on Circuits and Systems I: Regular Papers 65:1 (307-318). DOI: 10.1109/TCSI.2017.2720678. Online publication date: Jan-2018.
    • (2018) Fast write operations in non-volatile memories using latency masking. 2018 Real-Time and Embedded Systems and Technologies (RTEST) (1-7). DOI: 10.1109/RTEST.2018.8397072. Online publication date: May-2018.
    • (2018) Work-in-Progress: Writing-Aware Data Variable Allocation on Hybrid SRAM+NVM SPM. 2018 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES) (1-2). DOI: 10.1109/CASES.2018.8516763. Online publication date: Sep-2018.
    • (2018) TTEC: Data Allocation Optimization for Morphable Scratchpad Memory in Embedded Systems. IEEE Access 6 (54701-54712). DOI: 10.1109/ACCESS.2018.2872762. Online publication date: 2018.
