research-article

Write activity reduction on non-volatile main memories for embedded chip multiprocessors

Published: 08 April 2013 Publication History

Abstract

Recent advances in circuit and semiconductor technologies have pushed Non-Volatile Memory (NVM) technologies into a new era. These technologies exhibit appealing properties such as low power consumption, non-volatility, shock resistance, and high density. However, two challenges must be addressed before NVM can serve as main memory in embedded computer systems. First, compared with DRAM, NVMs endure only a limited number of write/erase cycles. Second, writes to NVM are more expensive than writes to DRAM in terms of both energy consumption and access latency. Both challenges can be mitigated by reducing write activity on the NVM.
In this paper, we target embedded Chip Multiprocessors (CMPs) with Scratch Pad Memory (SPM) and non-volatile main memory. We introduce scheduling, data migration, and recomputation techniques to reduce the number of writes to NVM. Experimental results show that the proposed methods reduce the number of writes by 58.46% on average, which means that the NVM can last 2.8 times as long as before. For Phase Change Memory (PCM), the lifetime is extended from 2.5 years to about 7 years on average and to 15 years at most. In addition, the finish time of the tested programs is reduced by 38.07% on average, and energy consumption is reduced by 51.23% on average.
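The relationship between write reduction and lifetime quoted above can be checked with a short calculation. The sketch below assumes, as a simplification not stated in the abstract, that lifetime is inversely proportional to the number of writes. Under that assumption the average reduction of 58.46% alone corresponds to a factor of about 2.4, so the reported 2.8 is presumably an average of per-benchmark lifetime factors rather than a factor derived from the average reduction. The per-benchmark values in the code are hypothetical and serve only to show that the two kinds of average can differ.

```python
def lifetime_factor(write_reduction: float) -> float:
    """Lifetime multiplier, assuming lifetime is inversely proportional to writes."""
    if not 0.0 <= write_reduction < 1.0:
        raise ValueError("write_reduction must be in [0, 1)")
    return 1.0 / (1.0 - write_reduction)


# Figures quoted in the abstract.
AVG_WRITE_REDUCTION = 0.5846
BASELINE_PCM_YEARS = 2.5
REPORTED_FACTOR = 2.8

# Factor implied by the average reduction alone (roughly 2.4).
factor_from_mean = lifetime_factor(AVG_WRITE_REDUCTION)

# Baseline lifetime scaled by the reported factor (about 7 years).
extended_years = BASELINE_PCM_YEARS * REPORTED_FACTOR

# HYPOTHETICAL per-benchmark reductions, not taken from the paper.
hypothetical_reductions = [0.30, 0.87]
mean_reduction = sum(hypothetical_reductions) / len(hypothetical_reductions)
mean_of_factors = sum(
    lifetime_factor(r) for r in hypothetical_reductions
) / len(hypothetical_reductions)

if __name__ == "__main__":
    print(f"factor from mean reduction: {factor_from_mean:.2f}")
    print(f"extended PCM lifetime:      {extended_years:.1f} years")
    print(f"hypothetical mean of factors vs factor of mean: "
          f"{mean_of_factors:.2f} vs {lifetime_factor(mean_reduction):.2f}")
```

Because 1/(1 - r) is convex in r, the mean of per-benchmark factors is never smaller than the factor computed from the mean reduction, which is consistent with 2.8 being larger than 2.4.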

    Published In

    ACM Transactions on Embedded Computing Systems  Volume 12, Issue 3
    March 2013
    463 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/2442116

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 April 2013
    Accepted: 01 October 2011
    Revised: 01 September 2011
    Received: 01 April 2011
    Published in TECS Volume 12, Issue 3

    Author Tags

    1. CMP
    2. Magnetic RAM (MRAM)
    3. Non-volatile memory (NVM)
    4. SPM
    5. data migration
    6. data recomputation
    7. flash memory
    8. phase change memory (PCM)
    9. scheduling

    Qualifiers

    • Research-article
    • Research
    • Refereed
