research-article

Constructing Large and Fast On-Chip Cache for Mobile Processors with Multilevel Cell STT-MRAM Technology

Authors:

Youtao Zhang

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 20, Issue 4

Article No.: 54, Pages 1 - 24

https://doi.org/10.1145/2764903

Published: 28 September 2015

Abstract

Modern mobile processors that integrate an increasing number of cores on a single chip demand large-capacity, on-chip last-level caches (LLCs) to achieve scalable performance improvements. However, adopting traditional memory technologies such as SRAM and embedded DRAM (eDRAM) introduces leakage and scalability problems. Spin-transfer torque magnetic RAM (STT-MRAM) is a novel nonvolatile memory technology that has emerged as a promising alternative for constructing on-chip caches in high-end mobile processors. STT-MRAM has many advantages, such as short read latency, zero leakage from the memory cell, and better scalability than eDRAM and SRAM. Multilevel cell (MLC) STT-MRAM further enlarges capacity and reduces per-bit cost by storing more bits in one cell.

However, MLC STT-MRAM has a long write latency, which limits the effectiveness of MLC STT-MRAM-based LLCs. In this article, we address this limitation with three novel designs: line pairing (LP), line swapping (LS), and a dynamic LP/LS enabler (DLE). LP forms fast cache lines by reorganizing MLC soft bits, which are faster to write. LS dynamically stores frequently written data in these fast cache lines. DLE enables LP and LS only when they improve overall cache performance. Our experimental results show that the proposed designs improve system performance by 9% to 15% and reduce energy consumption by 14% to 21% for various types of mobile processors.
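The line-swapping idea can be illustrated with a toy model. The sketch below is not the authors' design: the latency values, the swap threshold, the `ToySet` class, and the `total_write_cycles` helper are all hypothetical, and the cost of the swap itself is ignored. It only shows why moving a frequently written line into a fast (soft-bit) slot can lower total write latency.

```python
from dataclasses import dataclass

# Hypothetical write latencies in cycles; the abstract gives no numbers.
FAST_WRITE_CYCLES = 10   # line built from soft bits (assumed value)
SLOW_WRITE_CYCLES = 30   # line built from hard bits (assumed value)
SWAP_THRESHOLD = 4       # slow-line writes before promotion (assumed value)


@dataclass
class ToySet:
    """One cache set with a single fast slot and a single slow slot."""
    fast_tag: str
    slow_tag: str
    slow_writes: int = 0  # writes to the slow line since the last swap
    swaps: int = 0

    def write(self, tag: str, swapping: bool = True) -> int:
        """Write to `tag` and return the latency charged for this write."""
        if tag == self.fast_tag:
            return FAST_WRITE_CYCLES
        if tag != self.slow_tag:
            raise KeyError(tag)
        self.slow_writes += 1
        if swapping and self.slow_writes >= SWAP_THRESHOLD:
            # Promote the frequently written line into the fast slot.
            # The current write is still charged the slow latency.
            self.fast_tag, self.slow_tag = self.slow_tag, self.fast_tag
            self.slow_writes = 0
            self.swaps += 1
        return SLOW_WRITE_CYCLES


def total_write_cycles(trace, swapping):
    """Replay a write trace on a fresh set and sum the write latencies."""
    cache_set = ToySet(fast_tag="A", slow_tag="B")
    return sum(cache_set.write(tag, swapping) for tag in trace)


if __name__ == "__main__":
    hot_trace = ["B"] * 10  # line B is written repeatedly
    print(total_write_cycles(hot_trace, swapping=False))
    print(total_write_cycles(hot_trace, swapping=True))
```

In this toy trace the repeatedly written line is promoted after its fourth write, so the remaining writes pay the fast latency. A trace that alternates between the two lines gains nothing from swapping, which is the kind of case in which a mechanism like DLE would leave the optimization disabled.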

Cited By

Qoutb A, Friedman E (2023). Double magnetic tunnel junction two bit memory and nonvolatile logic for in situ computing. Microelectronics Journal 131 (105635). Online publication date: Jan-2023.
https://doi.org/10.1016/j.mejo.2022.105635

Choi J, Gong Y, Chung S (2021). A System-Level Exploration of Binary Neural Network Accelerators with Monolithic 3D Based Compute-in-Memory SRAM. Electronics 10:5 (623). Online publication date: 8-Mar-2021.
https://doi.org/10.3390/electronics10050623

Shreya S, Kaushik B (2020). Modeling of Voltage-Controlled Spin–Orbit Torque MRAM for Multilevel Switching Application. IEEE Transactions on Electron Devices 67:1 (90-98). Online publication date: Jan-2020.
https://doi.org/10.1109/TED.2019.2951684

Index Terms

Constructing Large and Fast On-Chip Cache for Mobile Processors with Multilevel Cell STT-MRAM Technology
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory

Published In

ACM Transactions on Design Automation of Electronic Systems Volume 20, Issue 4

Special Issue on Reliable, Resilient, and Robust Design of Circuits and Systems

September 2015

475 pages

ISSN:1084-4309

EISSN:1557-7309

DOI:10.1145/2830627

Editor:
Naehyuck Chang
Korea Advanced Institute of Science and Technology, Korea

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 28 September 2015

Revised: 01 April 2015

Received: 01 January 2015

Published in TODAES Volume 20, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation
