
Low-Rank Gradient Descent for Memory-Efficient Training of Deep In-Memory Arrays

Published: 18 May 2023

Abstract

Moving large quantities of data during the training of a deep neural network presents immense challenges for machine learning workloads, especially those built on future functional memories deployed to store network models. As model sizes begin to vastly outstrip traditional silicon computing resources, functional memories based on flash, resistive switches, magnetic tunnel junctions, and other technologies can store these ultra-large models. New approaches are then needed, however, to minimize hardware overhead, particularly for the movement and calculation of gradient information that cannot be efficiently held in these new memory resources. To address this, we introduce streaming batch principal component analysis (SBPCA) as an update algorithm. SBPCA uses stochastic power iterations to generate a stochastic rank-k approximation of the network gradient. We demonstrate that the low-rank updates produced by SBPCA can effectively train convolutional neural networks on a variety of common datasets, with performance comparable to that of standard mini-batch gradient descent. Our approximation is expressed in an expanded vector form that can be applied efficiently to the rows and columns of crossbars as array-level updates. These results promise improvements in the design of application-specific integrated circuits built around large vector-matrix multiplier memories.
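The abstract describes SBPCA only at a high level, so the following is a minimal illustrative sketch (in Python/NumPy) of the general recipe it outlines: a streaming stochastic power iteration that tracks a rank-k factorization of a layer's batch gradient from its per-sample outer products, followed by writing that factorization back to a crossbar-like weight array as k row-by-column outer-product updates. The function names, signatures, and hyperparameters below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def streaming_rank_k_gradient(samples, n_in, n_out, k, n_power_iters=3, rng=None):
    """Estimate a rank-k factorization of the batch gradient
    G = sum_i outer(x_i, delta_i) without forming the full n_in x n_out matrix.

    samples: iterable of (x, delta) pairs, where x is a layer input (n_in,)
             and delta is the backpropagated error for that sample (n_out,).
    Returns (U, V) with U (n_in, k) orthonormal and V (n_out, k) such that
    G is approximately U @ V.T.
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = list(samples)  # one pass over the stream per half power iteration
    Q, _ = np.linalg.qr(rng.standard_normal((n_out, k)))  # random right subspace
    U = np.zeros((n_in, k))
    V = np.zeros((n_out, k))
    for _ in range(n_power_iters):
        # Accumulate G @ Q one outer-product sample at a time:
        # outer(x, delta) @ Q == outer(x, delta @ Q), so only k-vectors are kept.
        S = np.zeros((n_in, k))
        for x, delta in samples:
            S += np.outer(x, delta @ Q)
        U, _ = np.linalg.qr(S)  # refined left (row) subspace
        # Accumulate G.T @ U the same way to refine the right (column) subspace.
        V = np.zeros((n_out, k))
        for x, delta in samples:
            V += np.outer(delta, x @ U)
        Q, _ = np.linalg.qr(V)
    # At this point G ~= U @ V.T, with the k singular-value scales folded into V.
    return U, V


def apply_low_rank_update(W, U, V, lr):
    """Write the rank-k update to a weight array as k row-by-column
    outer-product updates, the kind of parallel array-level write that a
    crossbar of memory cells supports natively."""
    for j in range(U.shape[1]):
        W -= lr * np.outer(U[:, j], V[:, j])
    return W


# Toy usage with random data (shapes only; not the paper's experiments).
rng = np.random.default_rng(0)
batch = [(rng.standard_normal(128), rng.standard_normal(10)) for _ in range(64)]
U, V = streaming_rank_k_gradient(batch, n_in=128, n_out=10, k=4, rng=rng)
W = rng.standard_normal((128, 10))
W = apply_low_rank_update(W, U, V, lr=0.01)
```

Because only the k tracked vectors ever need to be stored or moved, the full gradient matrix is never materialized, which is the reduction in gradient movement and storage that the abstract refers to.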

• Published in

  ACM Journal on Emerging Technologies in Computing Systems, Volume 19, Issue 2
  April 2023
  214 pages
  ISSN: 1550-4832
  EISSN: 1550-4840
  DOI: 10.1145/3587888
  • Editor: Ramesh Karri

      Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 18 May 2023
      • Online AM: 13 January 2023
      • Accepted: 30 November 2022
      • Revised: 27 September 2022
      • Received: 17 August 2021
