SHARP: Shared Heterogeneous Architecture with Reconfigurable Photonic Network-on-Chip

Published: 11 July 2018

Abstract

As the quest for higher throughput and lower energy cost continues in heterogeneous multicores, there is strong demand for energy-efficient, high-performance Network-on-Chip (NoC) architectures. Heterogeneous architectures that exploit both the serial execution strengths of the CPU and the thread-level parallelism of the GPU are gaining traction in industry. A critical issue with such architectures is finding an effective way to share resources such as the last-level cache and the NoC without hindering the performance of either the CPU or the GPU cores. Photonic interconnects are a disruptive technology with the potential to increase bandwidth, reduce latency, and improve energy efficiency over traditional metallic interconnects.

In this article, we propose a CPU-GPU heterogeneous architecture called Shared Heterogeneous Architecture with Reconfigurable Photonic Network-on-Chip (SHARP) that clusters CPU and GPU cores around the same router and dynamically allocates bandwidth between them based on application demands. SHARP connects the CPU and GPU cores through a reservation-assisted Single-Writer Multiple-Reader (SWMR) crossbar and uses buffer utilization information to reallocate bandwidth at runtime. Because network traffic fluctuates temporally and spatially with application behavior, this runtime reallocation lets SHARP adapt to application demands. SHARP demonstrates a 34% throughput improvement over a baseline electrical CMESH while consuming 25% less energy per bit. Simulation results also show a 6.9% to 14.9% performance improvement over variants of the proposed SHARP architecture that lack dynamic bandwidth allocation.
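As a rough illustration of allocating bandwidth from buffer utilization, the sketch below splits a fixed pool of wavelengths between the CPU and GPU cores of one router in proportion to their buffer occupancy. The abstract does not specify the policy, so the proportional rule, the function name `allocate_wavelengths`, the pool size of 64, and the guaranteed minimum share are all assumptions made for illustration and are not taken from the article.

```python
def allocate_wavelengths(cpu_util, gpu_util, total=64, min_share=8):
    """Split `total` wavelengths between CPU and GPU cores.

    Hypothetical policy (not from the article): each side receives a share
    proportional to its buffer utilization (a fraction in [0, 1]), and each
    side always keeps at least `min_share` wavelengths so it cannot starve.
    Returns a (cpu_wavelengths, gpu_wavelengths) tuple.
    """
    if not (0.0 <= cpu_util <= 1.0 and 0.0 <= gpu_util <= 1.0):
        raise ValueError("utilization values must lie in [0, 1]")
    if total < 2 * min_share:
        raise ValueError("total must cover the minimum share of both sides")

    demand = cpu_util + gpu_util
    if demand == 0:
        # No traffic on either side: fall back to an even split.
        cpu = total // 2
    else:
        cpu = round(total * cpu_util / demand)

    # Clamp so that both sides keep their guaranteed minimum.
    cpu = max(min_share, min(total - min_share, cpu))
    return cpu, total - cpu


if __name__ == "__main__":
    # GPU buffers three times as full as CPU buffers.
    print(allocate_wavelengths(0.25, 0.75))
```

In a scheme like this, the function would be re-evaluated periodically at runtime so that the split follows the temporal fluctuations in traffic described above.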


    • Published in

      ACM Journal on Emerging Technologies in Computing Systems, Volume 14, Issue 2
      Special Issue on Frontiers of Hardware and Algorithms for On-chip Learning, Special Issue on Silicon Photonics and Regular Papers
      April 2018
      322 pages
      ISSN:1550-4832
      EISSN:1550-4840
      DOI:10.1145/3227199
      • Editor: Yuan Xie

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 11 July 2018
      • Accepted: 1 February 2018
      • Revised: 1 January 2018
      • Received: 1 June 2017

      Qualifiers

      • research-article
      • Research
      • Refereed
