Abstract
The movement of large quantities of data during the training of a deep neural network presents immense challenges for machine learning workloads, especially those based on future functional memories deployed to store network models. As the size of network models begins to vastly outstrip traditional silicon computing resources, functional memories based on flash, resistive switches, magnetic tunnel junctions, and other technologies can store these ultra-large models. However, new approaches are then needed to minimize hardware overhead, especially for the movement and calculation of gradient information that cannot be efficiently contained in these new memory resources. To address this, we introduce streaming batch principal component analysis (SBPCA) as an update algorithm. SBPCA uses stochastic power iterations to generate a rank-k approximation of the network gradient. We demonstrate that the low-rank updates produced by SBPCA can effectively train convolutional neural networks on a variety of common datasets, with performance comparable to standard mini-batch gradient descent. Our approximation is expressed in an expanded vector form that can be applied efficiently to the rows and columns of crossbars for array-level updates. These results promise improvements in the design of application-specific integrated circuits built around large vector-matrix multiplier memories.
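To make the update scheme concrete, here is a minimal sketch, assuming a fully connected layer whose mini-batch gradient is a sum of per-example outer products G = sum_i delta_i x_i^T; the function and variable names (sbpca_update, deltas, xs, apply_low_rank_update) are illustrative, not the authors' implementation. The key point of power iteration in this setting is that all intermediates stay (B x k)- or (n x k)-sized, so the full m x n gradient matrix is never materialized.

```python
import numpy as np

def sbpca_update(deltas, xs, V, n_iters=3):
    """Refine a rank-k approximation of the batch gradient G = deltas.T @ xs
    by power iteration, without ever materializing G.

    deltas : (B, m) per-example output-side error vectors
    xs     : (B, n) per-example input activations
    V      : (n, k) orthonormal subspace estimate carried over between batches
    Returns U (m, k) and the refined V (n, k), so that G is approximated by U @ V.T.
    """
    for _ in range(n_iters):
        GV = deltas.T @ (xs @ V)      # G @ V assembled from rank-1 pieces: (m, k)
        GtGV = xs.T @ (deltas @ GV)   # G.T @ (G @ V): (n, k)
        V, _ = np.linalg.qr(GtGV)     # re-orthonormalize: one power-iteration step
    U = deltas.T @ (xs @ V)           # expanded column vectors of the update
    return U, V

def apply_low_rank_update(W, U, V, lr):
    """Apply the rank-k update as k rank-1 outer products, the array-level
    row/column programming pattern described in the abstract."""
    for j in range(U.shape[1]):
        W -= lr * np.outer(U[:, j], V[:, j])
    return W
```

Because V carries over from batch to batch, each mini-batch needs only a few power iterations to track the gradient's dominant subspace; this warm start is the streaming aspect. Each column pair (U[:, j], V[:, j]) is one of the expanded vectors that can drive a crossbar's rows and columns in parallel.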