Constructing Large and Fast On-Chip Cache for Mobile Processors with Multilevel Cell STT-MRAM Technology

Published: 28 September 2015

Abstract

Modern mobile processors, which integrate an increasing number of cores on a single chip, demand large-capacity on-chip last-level caches (LLCs) to achieve scalable performance improvements. However, traditional memory technologies such as SRAM and embedded DRAM (eDRAM) suffer from leakage and scalability problems. Spin-transfer torque magnetic RAM (STT-MRAM) is a novel nonvolatile memory technology that has emerged as a promising alternative for constructing on-chip caches in high-end mobile processors. STT-MRAM offers short read latency, zero leakage in the memory cell, and better scalability than both eDRAM and SRAM. Multilevel cell (MLC) STT-MRAM further increases capacity and reduces per-bit cost by storing multiple bits in each cell.
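To make the multilevel-cell idea concrete, the following Python sketch models a 2-bit MLC cell as a pair of bits: a "hard" bit that needs a large switching current and a "soft" bit that switches with a smaller one. It assumes a simplified two-step write: changing the hard bit also drags the soft bit to the same value, so a second, soft-bit step may be needed afterward. The class name `MLCCell` and the latency constants are hypothetical, chosen only for illustration; they are not taken from the article.

```python
from dataclasses import dataclass

# Hypothetical relative latencies in arbitrary units (not figures from the article).
SOFT_STEP = 1  # small current: sets only the soft bit
HARD_STEP = 2  # large current: sets the hard bit and, in this model, the soft bit too


@dataclass
class MLCCell:
    """A 2-bit MLC STT-MRAM cell modeled as a (hard, soft) bit pair."""
    hard: int = 0
    soft: int = 0

    def write(self, hard: int, soft: int) -> int:
        """Apply a simplified two-step write and return the latency spent."""
        cost = 0
        if hard != self.hard:
            # Step 1: the large current switches the hard bit and, as a side
            # effect under this model, forces the soft bit to the same value.
            self.hard = hard
            self.soft = hard
            cost += HARD_STEP
        if soft != self.soft:
            # Step 2: a small current sets the soft bit without disturbing the hard bit.
            self.soft = soft
            cost += SOFT_STEP
        return cost


cell = MLCCell()
print(cell.write(0, 1))  # 1: only the soft bit changes
print(cell.write(1, 0))  # 3: hard step drags soft to 1, then a soft step sets it to 0
```

Under this model, a write that changes only the soft bit costs one short step, while any write that flips the hard bit costs at least one long step. That asymmetry is what makes MLC writes slow on average.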
However, MLC STT-MRAM suffers from long write latency, which limits the effectiveness of MLC STT-MRAM-based LLCs. In this article, we address this limitation with three novel designs: line pairing (LP), line swapping (LS), and a dynamic LP/LS enabler (DLE). LP forms fast cache lines by reorganizing the MLC soft bits, which are faster to write. LS dynamically places frequently written data in these fast cache lines. DLE enables LP and LS only when they improve overall cache performance. Our experimental results show that the proposed designs improve system performance by 9–15% and reduce energy consumption by 14–21% across various types of mobile processors.
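The idea behind LP and LS can be sketched as follows, under stated assumptions. Two cache lines share one group of MLC cells: with LP, one line is held entirely in soft bits (fast to write) and its partner entirely in hard bits (slow, assumed here to need a hard-bit step followed by restoring the partner's soft bits). A per-line write counter triggers a swap once the hard-bit line becomes write-hot. The `LinePair` class, the latencies, and the swap threshold are hypothetical; swap overhead and the DLE enabling policy are not modeled, and the article's actual mechanisms may differ.

```python
from dataclasses import dataclass, field

# Hypothetical write latencies in arbitrary units (not the article's figures).
FAST_WRITE = 1  # line held in soft bits: one soft-bit step
SLOW_WRITE = 3  # line held in hard bits: hard-bit step, then restore the partner's soft bits
SWAP_THRESHOLD = 4  # hypothetical: swap once the slow line leads by this many writes


@dataclass
class LinePair:
    """Two cache lines sharing one group of MLC cells under line pairing (LP).

    Slot 0 is the fast line (all soft bits); slot 1 is the slow line (all hard bits).
    """
    tags: list = field(default_factory=lambda: [None, None])
    writes: list = field(default_factory=lambda: [0, 0])

    def write(self, tag) -> int:
        """Write a resident line; return its latency and apply line swapping (LS)."""
        slot = self.tags.index(tag)  # assumes a hit; raises ValueError otherwise
        self.writes[slot] += 1
        cost = FAST_WRITE if slot == 0 else SLOW_WRITE
        # LS: if the slow line is written much more often than the fast line,
        # exchange the two so the write-hot data moves into the soft bits.
        # The cost of the swap itself is ignored in this sketch.
        if self.writes[1] - self.writes[0] >= SWAP_THRESHOLD:
            self.tags.reverse()
            self.writes.reverse()
        return cost


pair = LinePair(tags=["A", "B"])
print([pair.write("B") for _ in range(5)])  # [3, 3, 3, 3, 1]
print(pair.tags)  # ['B', 'A']
```

Here line "B" starts in the slow hard-bit slot; after it accumulates enough writes it is swapped into the soft-bit slot, and its next write takes the fast path. A threshold-based counter is only one plausible swap trigger; a real design would also need to account for the cost of moving data between the two slots.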


Cited By

  • (2023) Double magnetic tunnel junction two bit memory and nonvolatile logic for in situ computing. Microelectronics Journal 131, 105635. DOI: 10.1016/j.mejo.2022.105635. Online publication date: Jan-2023.
  • (2021) A System-Level Exploration of Binary Neural Network Accelerators with Monolithic 3D Based Compute-in-Memory SRAM. Electronics 10, 5, 623. DOI: 10.3390/electronics10050623. Online publication date: 8-Mar-2021.
  • (2020) Modeling of Voltage-Controlled Spin–Orbit Torque MRAM for Multilevel Switching Application. IEEE Transactions on Electron Devices 67, 1, 90–98. DOI: 10.1109/TED.2019.2951684. Online publication date: Jan-2020.

    Published In

    ACM Transactions on Design Automation of Electronic Systems, Volume 20, Issue 4
    Special Issue on Reliable, Resilient, and Robust Design of Circuits and Systems
    September 2015
    475 pages
    ISSN:1084-4309
    EISSN:1557-7309
    DOI:10.1145/2830627
    Editor: Naehyuck Chang
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 September 2015
    Revised: 01 April 2015
    Received: 01 January 2015
    Published in TODAES Volume 20, Issue 4

    Author Tags

    1. Spin-transfer torque
    2. magnetic random access memory
    3. multilevel cell

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Science Foundation
