DOI: 10.1145/3400302.3415637
research-article

DRAMA: an approximate DRAM architecture for high-performance and energy-efficient deep training system

Published: 17 December 2020

Abstract

As DRAM density grows, the refresh overhead becomes more significant. This is especially problematic in systems that require large DRAM capacity, such as those used to train deep neural networks (DNNs). To address this problem, we present DRAMA, a novel architecture that exploits the approximate nature of DNNs. Non-critical bits are not refreshed, while critical bits are refreshed normally. The refresh time of the critical bits is hidden by employing per-bank refreshes, significantly improving training system performance. Furthermore, the potential race hazard of the per-bank refresh technique is prevented by a novel command scheduler in the DRAM controller. Our experiments on several recent DNNs show that DRAMA improves training system performance by 10.4% and reduces DRAM energy by 23.77% compared to a conventional architecture.
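The critical/non-critical split described above can be illustrated with a small sketch. This is not from the paper; the function name `drop_noncritical_bits` and the choice of keeping 8 high mantissa bits are illustrative assumptions. Zeroing the low mantissa bits of a float32 weight emulates the worst case in which unrefreshed, non-critical bits are lost entirely, while the sign, exponent, and high mantissa bits (the "critical" bits) survive:

```python
import struct

def drop_noncritical_bits(x: float, keep_mantissa_bits: int = 8) -> float:
    """Zero the low (23 - keep_mantissa_bits) mantissa bits of a float32.

    Emulates completely losing the unrefreshed, non-critical bits; the
    sign, exponent, and high mantissa bits are preserved intact.
    """
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Mask off the low mantissa bits (float32 has a 23-bit mantissa).
    mask = (0xFFFFFFFF << (23 - keep_mantissa_bits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

# A weight loses its low 15 mantissa bits, yet the relative error stays
# below 2**-8, bounded by the retained mantissa precision.
w = 0.123456789
approx = drop_noncritical_bits(w)
assert abs(approx - w) / w < 2 ** -8
```

Because the relative error is bounded by the retained precision, dropping refreshes on the low mantissa bits of weights and activations plausibly has little effect on training accuracy, which is the intuition the architecture builds on.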


Cited By

  • (2023) Using Approximate DRAM for Enabling Energy-Efficient, High-Performance Deep Neural Network Inference. In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, 275-314. DOI: 10.1007/978-3-031-19568-6_10. Online publication date: 1-Oct-2023.
  • (2021) GRAM. ACM Transactions on Architecture and Code Optimization 18, 2, 1-24. DOI: 10.1145/3441830. Online publication date: 9-Feb-2021.
    Published In

    ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design
    November 2020, 1396 pages
    ISBN: 9781450380263
    DOI: 10.1145/3400302
    General Chair: Yuan Xie
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • IEEE CAS
    • IEEE CEDA
    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. DRAM refresh
    2. approximate computing
    3. memory architecture
    4. neural networks


    Funding Sources

    • Ministry of Science and ICT
    • Ministry of Trade, Industry and Energy

    Conference

    ICCAD '20

    Acceptance Rates

    Overall Acceptance Rate 457 of 1,762 submissions, 26%
