research-article

Shielding STT-RAM Based Register Files on GPUs against Read Disturbance

Published: 01 November 2016

Abstract

To address the high energy consumption of SRAM-based register files on GPUs, the emerging Spin-Transfer Torque RAM (STT-RAM) technology has been intensively studied as a more energy-efficient building block for GPU register files, thanks to its low leakage power, high density, and good scalability. However, STT-RAM suffers from read disturbance: as technology scales, the gap between the read current and the write (switching) current shrinks, so a read can accidentally flip the cell it senses. Read disturbance therefore leads to high error rates for read operations, which SEC-DED ECC cannot effectively protect against in the large-capacity register files of GPUs.
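The reliability argument above can be made concrete with the thermal-activation model commonly used for STT-RAM read disturbance, P = 1 - exp(-(t/tau0) * exp(-delta * (1 - I_read/I_c0))), combined with a binomial estimate of how often a SEC-DED codeword collects two or more flipped bits (one more than it can correct). The following is a minimal sketch under stated assumptions: the model choice, the parameter values (1 ns read pulse, delta = 40, tau0 = 1 ns), and the (39, 32) code width for a 32-bit register are illustrative and are not taken from the paper.

```python
import math


def read_disturb_prob(t_read_ns, i_read, i_c0, delta, tau0_ns=1.0):
    """Per-read probability that the read current flips one MTJ.

    Thermal-activation model: P = 1 - exp(-(t/tau0) * exp(-delta * (1 - I/Ic0))).
    Only the ratio i_read / i_c0 matters, so any consistent current unit works.
    """
    barrier = delta * (1.0 - i_read / i_c0)
    rate = (t_read_ns / tau0_ns) * math.exp(-barrier)
    return -math.expm1(-rate)  # 1 - exp(-rate), accurate for tiny rates


def secded_word_failure(p_bit, n_bits=39):
    """Probability that an n-bit SEC-DED codeword holds two or more flipped bits.

    SEC-DED corrects a single error, so 2+ errors escape correction.
    Default 39 bits: a (39, 32) code for a 32-bit register (an assumption).
    Summing the k >= 2 binomial terms avoids cancellation for tiny p_bit.
    """
    return sum(
        math.comb(n_bits, k) * p_bit**k * (1.0 - p_bit) ** (n_bits - k)
        for k in range(2, n_bits + 1)
    )


# Hypothetical parameters: as cells shrink, I_read approaches I_c0.
for ratio in (0.3, 0.5, 0.7):
    p = read_disturb_prob(t_read_ns=1.0, i_read=ratio, i_c0=1.0, delta=40.0)
    print(f"I_read/I_c0={ratio:.1f}  P_bit={p:.2e}  "
          f"P_word_uncorrectable={secded_word_failure(p):.2e}")
```

Because the word failure probability grows roughly with the square of the per-bit probability, a per-bit rate that rises by orders of magnitude with scaling quickly overwhelms single-error correction across a register file that is read billions of times.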
Prior schemes to mitigate read disturbance (e.g., read-restore) usually incur either non-trivial performance loss or excessive energy overhead, and are thus unsuitable for GPU register files, whose design aims at both high performance and energy efficiency. To combat read disturbance, we propose Red-Shield, a novel software-hardware co-designed solution consisting of three optimizations that overcome the limitations of existing solutions. First, we identify dead reads (the last read of a register value before it is overwritten or no longer needed) at compile time and augment the corresponding instructions so that unnecessary restores are avoided. Second, we employ a small read buffer to serve register reads with high access locality, further reducing restores. Third, we propose an adaptive restore mechanism that selects a suitable restore scheme according to whether the corresponding register bank is busy. Experimental results show that the proposed design effectively mitigates the performance loss and energy overhead caused by restore operations while maintaining the reliability of reads.
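The three optimizations can be illustrated with a small trace-driven sketch that classifies every operand read. This is not Red-Shield's implementation; everything concrete here is a hypothetical stand-in: the instruction format `(dst, [srcs])`, the bank mapping `reg % num_banks`, the rule that a bank is "busy" when it serves more than one array read in the same instruction, the LRU buffer policy, and the two restore options (immediate vs. deferred). Dead reads are found with a standard backward liveness analysis over straight-line code; a real compiler pass would iterate the same dataflow equations over a control-flow graph.

```python
from collections import Counter, OrderedDict


def find_dead_reads(instrs, live_out=frozenset()):
    """Return {(i, reg)} for reads whose stored value is never needed again.

    After instruction i, the old value of reg matters only if reg is
    live-out of i and i does not overwrite it.
    """
    live = set(live_out)
    dead = set()
    for i in range(len(instrs) - 1, -1, -1):
        dst, srcs = instrs[i]
        if dst is not None:
            live.discard(dst)          # live_out(i) minus def(i)
        for reg in srcs:
            if reg not in live:
                dead.add((i, reg))     # no later read of this value
        live.update(srcs)              # live_in(i) = (live_out - def) | use
    return dead


class ReadBuffer:
    """Tiny LRU buffer holding recently read registers (hypothetical policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.regs = OrderedDict()

    def hit(self, reg):
        if reg in self.regs:
            self.regs.move_to_end(reg)
            return True
        return False

    def fill(self, reg):
        self.regs[reg] = None
        self.regs.move_to_end(reg)
        if len(self.regs) > self.capacity:
            self.regs.popitem(last=False)   # evict least recently used

    def invalidate(self, reg):
        self.regs.pop(reg, None)            # a write makes the copy stale


def simulate(instrs, buffer_size=2, num_banks=4, live_out=frozenset()):
    """Count how each distinct operand read is handled."""
    dead = find_dead_reads(instrs, live_out)
    buf = ReadBuffer(buffer_size)
    stats = Counter()
    for i, (dst, srcs) in enumerate(instrs):
        array_reads = []
        for reg in dict.fromkeys(srcs):     # read each distinct operand once
            if buf.hit(reg):
                stats["buffer_hit"] += 1    # served without touching STT-RAM
            else:
                array_reads.append(reg)
        bank_load = Counter(reg % num_banks for reg in array_reads)
        for reg in array_reads:
            if (i, reg) in dead:
                stats["dead_read_no_restore"] += 1
                continue
            if bank_load[reg % num_banks] > 1:
                stats["deferred_restore"] += 1   # bank busy: restore later
            else:
                stats["immediate_restore"] += 1  # bank idle: restore now
            buf.fill(reg)
        if dst is not None:
            buf.invalidate(dst)
    return stats


# Hypothetical kernel fragment; r0 and r4 map to the same bank.
prog = [
    (2, [0, 4]),      # r2 = r0 op r4
    (3, [2, 0]),      # r3 = r2 op r0
    (5, [3, 4]),      # r5 = r3 op r4
    (None, [5, 2]),   # store r5, r2
]
print(dict(simulate(prog)))
```

In this trace, 3 of the 8 operand reads are dead and 2 are served by the buffer, so only 3 array reads need a restore (2 of them deferred because of the bank conflict), compared with 8 under a restore-after-every-read baseline.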

    Published In

    ACM Journal on Emerging Technologies in Computing Systems, Volume 13, Issue 2
    Special Issue on Nanoelectronic Circuit and System Design Methods for the Mobile Computing Era and Regular Papers
    April 2017
    377 pages
    ISSN:1550-4832
    EISSN:1550-4840
    DOI:10.1145/3014160
    Editor: Yuan Xie
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 November 2016
    Accepted: 01 September 2016
    Revised: 01 July 2016
    Received: 01 April 2016
    Published in JETC Volume 13, Issue 2

    Author Tags

    1. GPU
    2. STT-RAM
    3. read disturbance
    4. register file

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China
    • Research Fund for the Doctoral Program of Higher Education of China
    • National High Technology Research and Development 863 Program of China

    Cited By

    • (2023) A Survey of Memory-Centric Energy Efficient Computer Architecture. IEEE Transactions on Parallel and Distributed Systems 34:10 (2657-2670). DOI: 10.1109/TPDS.2023.3297595. Online publication date: Oct-2023.
    • (2021) Exploring Applications of STT-RAM in GPU Architectures. IEEE Transactions on Circuits and Systems I: Regular Papers 68:1 (238-249). DOI: 10.1109/TCSI.2020.3031895. Online publication date: Jan-2021.
    • (2021) CacheTree: Reducing Integrity Verification Overhead of Secure Nonvolatile Memories. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 40:7 (1340-1353). DOI: 10.1109/TCAD.2020.3015925. Online publication date: Jul-2021.
    • (2021) 3RSeT: Read Disturbance Rate Reduction in STT-MRAM Caches by Selective Tag Comparison. IEEE Transactions on Computers (1-1). DOI: 10.1109/TC.2021.3082004. Online publication date: 2021.
    • (2020) Multichannel Attention Refinement for Video Question Answering. ACM Transactions on Multimedia Computing, Communications, and Applications 16:1s (1-23). DOI: 10.1145/3366710. Online publication date: 12-Mar-2020.
    • (2020) Hi-End: Hierarchical, Endurance-Aware STT-MRAM-Based Register File for Energy-Efficient GPUs. IEEE Access 8 (127768-127780). DOI: 10.1109/ACCESS.2020.3008719. Online publication date: 2020.
    • (2018) HPGraph. Scientific Programming 2018. DOI: 10.1155/2018/9340697. Online publication date: 11-Dec-2018.
    • (2018) High performance graph analytics with productivity on hybrid CPU-GPU platforms. Proceedings of the 2nd International Conference on High Performance Compilation, Computing and Communications (17-21). DOI: 10.1145/3195612.3195614. Online publication date: 15-Mar-2018.
