Abstract
As the quest for higher throughput and lower energy cost continues in heterogeneous multicores, there is strong demand for energy-efficient, high-performance Network-on-Chip (NoC) architectures. Heterogeneous architectures that exploit both the serial execution strengths of the CPU and the thread-level parallelism of the GPU are gaining traction in industry. A critical issue with such architectures is finding an optimal way to share resources such as the last-level cache and the NoC without hindering the performance of either the CPU or the GPU cores. Photonic interconnects are a disruptive technology with the potential to increase bandwidth, reduce latency, and improve energy efficiency over traditional metallic interconnects.
In this article, we propose a CPU-GPU heterogeneous architecture called Shared Heterogeneous Architecture with Reconfigurable Photonic Network-on-Chip (SHARP) that clusters CPU and GPU cores around the same router and dynamically allocates bandwidth between the CPU and GPU cores based on application demands. SHARP is designed as a Single-Writer Multiple-Reader (SWMR) crossbar with reservation-assist that connects the CPU and GPU cores and reallocates bandwidth at runtime using buffer utilization information. Because network traffic exhibits temporal and spatial fluctuations due to application behavior, this runtime reallocation lets SHARP adapt to application demands. SHARP demonstrates a 34% performance (throughput) improvement over a baseline electrical CMESH while consuming 25% less energy per bit. Simulation results also show a 6.9% to 14.9% performance improvement over variants of the proposed SHARP architecture without dynamic bandwidth allocation.
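The idea of reallocating bandwidth from buffer utilization can be illustrated with a minimal sketch. The abstract does not give the actual allocation policy, so everything here is an assumption made for illustration: the function name `allocate_wavelengths`, the proportional split, the 64-wavelength total, and the minimum share that keeps either core type from being starved.

```python
def allocate_wavelengths(cpu_util, gpu_util, total=64, min_share=8):
    """Split `total` wavelengths between CPU and GPU traffic.

    cpu_util, gpu_util: buffer utilization in [0, 1], sampled at runtime.
    The proportional policy, `total`, and `min_share` are illustrative
    assumptions, not the policy used by SHARP.
    """
    if not (0.0 <= cpu_util <= 1.0 and 0.0 <= gpu_util <= 1.0):
        raise ValueError("utilization must be in [0, 1]")
    if 2 * min_share > total:
        raise ValueError("min_share is too large for total")

    demand = cpu_util + gpu_util
    if demand == 0:
        # No buffered traffic on either side: fall back to an even split.
        cpu = total // 2
    else:
        # Give each side a share proportional to its buffer utilization.
        cpu = round(total * cpu_util / demand)

    # Clamp so that neither core type drops below the guaranteed minimum.
    cpu = max(min_share, min(total - min_share, cpu))
    return cpu, total - cpu


if __name__ == "__main__":
    # GPU buffers are three times as full as CPU buffers in this sample.
    print(allocate_wavelengths(0.25, 0.75))
```

In this sketch, calling the function once per reconfiguration window with fresh utilization samples would shift bandwidth toward whichever core type is currently congested, while the clamp preserves a floor for the other.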