Abstract
Deep Neural Networks (DNNs) have demonstrated great success in many fields such as image recognition and text analysis. However, the ever-increasing sizes of both DNN models and training datasets make deep learning extremely computation- and memory-intensive. Recently, photonic computing has emerged as a promising technology for accelerating DNNs. While the design of photonic accelerators for DNN inference and the forward propagation of DNN training has been widely investigated, architectural acceleration for the equally important backpropagation phase of DNN training has not been well studied. In this paper, we propose a novel silicon photonic backpropagation accelerator for high-performance DNN training. Specifically, we design a general-purpose photonic gradient descent unit, named STADIA, that implements the multiplication, accumulation, and subtraction operations required for computing gradients using mature optical devices, including the Mach-Zehnder Interferometer (MZI) and the Microring Resonator (MRR), which significantly reduces training latency and improves the energy efficiency of backpropagation. To enable efficient parallel computing, we propose a STADIA-based backpropagation acceleration architecture and design a dataflow based on wavelength-division multiplexing (WDM). We analyze the precision of STADIA by quantifying the limitations imposed by optical losses and noise. Furthermore, we evaluate STADIA with different element sizes by analyzing the power, area, and time delay of photonic accelerators for DNN models such as AlexNet, VGG19, and ResNet. Simulation results show that the proposed STADIA architecture achieves improvements of 9.7× in time efficiency and 147.2× in energy efficiency over the most advanced optical-memristor-based backpropagation accelerator.
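The multiply-accumulate-subtract pattern of the stochastic gradient descent update that the abstract describes mapping onto optical devices can be sketched in plain Python. This is a conceptual sketch only, not the paper's implementation: the layer sizes, batch size, and learning rate below are illustrative assumptions.

```python
import numpy as np

# Conceptual sketch of one SGD step for a single fully connected layer,
# showing the three operations the abstract maps to photonic hardware:
# multiplication (error x activation outer products), accumulation
# (summing those products over the batch), and subtraction (weight update).

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))    # weights: 4 outputs, 3 inputs (illustrative)
x = rng.standard_normal((8, 3))    # batch of 8 input activations
err = rng.standard_normal((8, 4))  # upstream error signals (deltas) per sample
lr = 0.01                          # learning rate (illustrative)

# Multiplication + accumulation: the gradient is the sum over the batch
# of outer products err[b] (x) x[b], computed here as one matrix product.
grad = err.T @ x                   # shape (4, 3)

# Subtraction: gradient-descent weight update.
W_new = W - lr * grad
```

In a photonic realization, the matrix product would be carried out by MZI/MRR meshes and the per-wavelength partial products accumulated via WDM, but the arithmetic being performed is exactly this multiply-accumulate-subtract sequence.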
STADIA: Photonic Stochastic Gradient Descent for Neural Network Accelerators