ABSTRACT
In chiplet-based heterogeneous architectures, electrical network-on-package (NoP) designs are typically over-provisioned with routers and channels to provide sufficient bandwidth during periods of high network load. Observing that the network spends significant periods at low utilization or idle, prior work has proposed modified network-on-chip (NoC) architectures that enable in-network compute, especially for compute-intensive operations (e.g., linear algebra). However, electrical package-level interconnects face fundamental energy and bandwidth scaling limits in future chiplet architectures.
This paper proposes Flumen, a dual-purpose photonic interconnect that provides communication at the package level while also doubling as an accelerator, performing parallel linear computation when network load is low. The proposed architecture exploits the inherent parallelism of light to create energy-efficient interconnects that support en route computation with minimal changes to the network. By dynamically adjusting the topology, Flumen can reallocate the communication and compute sections of the architecture to adapt to workload fluctuations. On linear algebra applications, Flumen achieves on average a 2.5× reduction in energy, a 3.6× speedup, and a 9.3× reduction in energy-delay product compared to an electrical mesh network used exclusively for communication.
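The three reported averages are linked by the definition of energy-delay product (EDP), energy multiplied by execution time. The relation below is a rough consistency check rather than a result from the paper; the symbols E and T (energy and execution time of each design) are notation introduced here for illustration.

```latex
\[
\frac{\mathrm{EDP}_{\text{mesh}}}{\mathrm{EDP}_{\text{Flumen}}}
  = \frac{E_{\text{mesh}}\, T_{\text{mesh}}}{E_{\text{Flumen}}\, T_{\text{Flumen}}}
  = \underbrace{\frac{E_{\text{mesh}}}{E_{\text{Flumen}}}}_{\text{energy reduction}}
    \times
    \underbrace{\frac{T_{\text{mesh}}}{T_{\text{Flumen}}}}_{\text{speedup}},
\qquad
2.5 \times 3.6 = 9.0 .
\]
```

The product of the two averaged factors (9.0) is close to, but not equal to, the reported 9.3×. One plausible explanation, stated here as an assumption, is that the EDP figure is averaged over per-application ratios, and an average of products generally differs from the product of averages.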