DOI: 10.1145/3400302.3415637
research-article

DRAMA: an approximate DRAM architecture for high-performance and energy-efficient deep training system

Published: 17 December 2020

Abstract

As DRAM density grows, the refresh overhead becomes more significant. This is especially problematic in systems that require large DRAM capacity, such as those used to train deep neural networks (DNNs). To address this problem, we present DRAMA, a novel architecture that exploits the approximate nature of DNNs. Non-critical bits are not refreshed, while critical bits are refreshed normally. The refresh time of the critical bits is hidden by employing per-bank refreshes, significantly improving training system performance. Furthermore, the potential race hazard of the per-bank refresh technique is prevented by a novel command scheduler in the DRAM controller. Our experiments on several recent DNNs show that DRAMA improves training system performance by 10.4% and reduces DRAM energy by 23.77% compared to a conventional architecture.
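The critical/non-critical split described above can be illustrated with a small sketch. This is not from the paper; the function name `drop_noncritical_bits` and the choice of keeping 8 high mantissa bits are illustrative assumptions. Zeroing the low mantissa bits of a float32 weight emulates the worst case in which unrefreshed, non-critical bits are lost entirely, while the sign, exponent, and high mantissa bits (the "critical" bits) survive:

```python
import struct

def drop_noncritical_bits(x: float, keep_mantissa_bits: int = 8) -> float:
    """Zero the low (23 - keep_mantissa_bits) mantissa bits of a float32.

    Emulates completely losing the unrefreshed, non-critical bits; the
    sign, exponent, and high mantissa bits are preserved intact.
    """
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Mask off the low mantissa bits (float32 has a 23-bit mantissa).
    mask = (0xFFFFFFFF << (23 - keep_mantissa_bits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

# A weight loses its low 15 mantissa bits, yet the relative error stays
# below 2**-8, bounded by the retained mantissa precision.
w = 0.123456789
approx = drop_noncritical_bits(w)
assert abs(approx - w) / w < 2 ** -8
```

Because the relative error is bounded by the retained precision, dropping refreshes on the low mantissa bits of weights and activations plausibly has little effect on training accuracy, which is the intuition the architecture builds on.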


Cited By

  • (2023) Using Approximate DRAM for Enabling Energy-Efficient, High-Performance Deep Neural Network Inference. In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, 275-314. DOI: 10.1007/978-3-031-19568-6_10. Online publication date: 1-Oct-2023.
  • (2021) GRAM. ACM Transactions on Architecture and Code Optimization 18, 2, 1-24. DOI: 10.1145/3441830. Online publication date: 9-Feb-2021.
    Published In

    ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design
    November 2020, 1396 pages
    ISBN: 9781450380263
    DOI: 10.1145/3400302
    General Chair: Yuan Xie
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • IEEE CAS
    • IEEE CEDA
    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. DRAM refresh
    2. approximate computing
    3. memory architecture
    4. neural networks


    Funding Sources

    • Ministry of Science and ICT
    • Ministry of Trade, Industry and Energy

    Conference

    ICCAD '20

    Acceptance Rates

    Overall Acceptance Rate 457 of 1,762 submissions, 26%
