
STADIA: Photonic Stochastic Gradient Descent for Neural Network Accelerators

Published: 09 September 2023

Abstract

Deep Neural Networks (DNNs) have demonstrated great success in many fields such as image recognition and text analysis. However, the ever-increasing sizes of both DNN models and training datasets make deep learning extremely computation- and memory-intensive. Recently, photonic computing has emerged as a promising technology for accelerating DNNs. While the design of photonic accelerators for DNN inference and for the forward propagation phase of DNN training has been widely investigated, architectural acceleration of the equally important backpropagation phase of DNN training has not been well studied. In this paper, we propose a novel silicon photonic backpropagation accelerator for high-performance DNN training. Specifically, we design a general-purpose photonic gradient descent unit, named STADIA, that implements the multiplication, accumulation, and subtraction operations required for computing gradients using mature optical devices, including the Mach-Zehnder Interferometer (MZI) and the Microring Resonator (MRR); this design significantly reduces training latency and improves the energy efficiency of backpropagation. To demonstrate efficient parallel computing, we propose a STADIA-based backpropagation acceleration architecture and design a dataflow based on wavelength-division multiplexing (WDM). We analyze the precision of STADIA by quantifying the limitations imposed by optical losses and noise. Furthermore, we evaluate STADIA at different element sizes by analyzing the power, area, and time delay of photonic accelerators running DNN models such as AlexNet, VGG19, and ResNet. Simulation results show that the proposed STADIA architecture achieves improvements of 9.7× in time efficiency and 147.2× in energy efficiency over the most advanced optical-memristor-based backpropagation accelerator.
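To make the operations named in the abstract concrete, the sketch below expresses in plain Python/NumPy the multiply-accumulate-subtract pattern of a stochastic gradient descent weight update for one fully connected layer. This is only a functional reference for the computation the abstract says STADIA maps onto MZI and MRR devices, not the authors' implementation; the function name, signature, and learning rate are illustrative assumptions.

```python
import numpy as np

def sgd_layer_update(W, x, delta, eta=0.01):
    """One SGD step for a fully connected layer y = W @ x.

    W     : (m, n) weight matrix
    x     : (n,)   layer input from the forward pass
    delta : (m,)   error signal backpropagated to this layer
    eta   : learning rate

    The gradient dL/dW = outer(delta, x) is a multiply-accumulate
    pattern, and W - eta * grad is a subtraction -- the three
    operation types the abstract attributes to the photonic
    gradient descent unit.
    """
    grad = np.outer(delta, x)   # elementwise multiplications (accumulated over a batch)
    return W - eta * grad       # scale by the learning rate, then subtract

# Toy usage: update a 3x4 layer for a single training sample.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
x = rng.standard_normal(4)
delta = rng.standard_normal(3)
W_new = sgd_layer_update(W, x, delta)
```

For a mini-batch, the per-sample outer products are summed before the subtraction, which is where the accumulation operation dominates; the WDM dataflow described in the abstract is aimed at parallelizing exactly these multiply-accumulate steps across wavelengths.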



• Published in

  ACM Transactions on Embedded Computing Systems, Volume 22, Issue 5s
  Special Issue ESWEEK 2023
  October 2023, 1394 pages
  ISSN: 1539-9087
  EISSN: 1558-3465
  DOI: 10.1145/3614235
  Editor: Tulika Mitra

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 September 2023
        • Accepted: 1 July 2023
        • Revised: 2 June 2023
        • Received: 23 March 2023
Published in TECS Volume 22, Issue 5s

