DOI: 10.1145/2786572.2786596
research-article

Asymmetric NoC Architectures for GPU Systems

Published: 28 September 2015

ABSTRACT

While both Chip Multiprocessors (CMPs) and Graphics Processing Units (GPUs) are many-core systems, they exhibit different memory access patterns. CMPs execute threads in parallel, and these threads communicate and synchronize through the memory hierarchy without any coalescing of their accesses. GPUs, on the other hand, execute a large number of independent thread blocks whose memory accesses are frequent and coalesced, resulting in a completely different access pattern.
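To make the coalescing distinction concrete, the following Python sketch counts how many cache-line requests one group of GPU work-items generates for a single load instruction. The 64-byte line size and the 64-wide group are assumptions chosen for illustration (64 matches the wavefront width of AMD GCN), and `coalesce` is a hypothetical helper, not part of any simulator or GPU API.

```python
LINE_SIZE = 64  # bytes per cache line (assumed for illustration)


def coalesce(addresses, line_size=LINE_SIZE):
    """Merge per-work-item byte addresses into the distinct cache-line
    requests they touch, returned as sorted line base addresses."""
    lines = sorted({addr // line_size for addr in addresses})
    return [line * line_size for line in lines]


# 64 work-items, each loading one 4-byte float.
unit_stride = [4 * tid for tid in range(64)]        # 256 contiguous bytes
large_stride = [4 * 64 * tid for tid in range(64)]  # one float every 256 bytes

print(len(coalesce(unit_stride)))   # 4
print(len(coalesce(large_stride)))  # 64
```

Contiguous accesses collapse into a handful of line requests, while a large stride defeats coalescing and produces one request per work-item, which is why the resulting NoC traffic depends so strongly on the access pattern.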

NoC designs for GPUs have not been extensively explored. In this paper, we first evaluate several NoC designs for GPUs to determine the most power- and performance-efficient NoCs. To further improve NoC energy efficiency, we explore an asymmetric NoC design tailored to a GPU's memory access pattern, providing one network for L1-to-L2 communication and a second for L2-to-L1 traffic. Our analysis shows that an asymmetric multi-network Cmesh (concentrated mesh) provides the most energy-efficient communication fabric for our target GPU system.
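One way to see why the two directions can be provisioned differently is to count flits. The sketch below assumes a read-dominated workload in which L1-to-L2 read requests carry only an address while L2-to-L1 replies carry a full cache line. The header, address, and line sizes, the 90/10 read/write mix, write-through L1 caches, and the channel widths are all hypothetical values chosen for illustration; they are not the packet formats or configuration evaluated in the paper.

```python
import math
from dataclasses import dataclass

# Hypothetical packet field sizes, in bytes.
HEADER_BYTES = 8   # routing/control information carried by every packet
ADDR_BYTES = 8     # memory address
LINE_BYTES = 64    # cache-line payload


@dataclass
class Traffic:
    reads: int   # L1 read misses sent to L2
    writes: int  # L1 stores forwarded to L2 (write-through assumed)


def flits(packet_bytes: int, width_bytes: int) -> int:
    """Number of flits needed to carry a packet over a channel of this width."""
    return math.ceil(packet_bytes / width_bytes)


def flits_per_direction(t: Traffic, req_width: int, rep_width: int):
    """Return (L1-to-L2 flits, L2-to-L1 flits) for a traffic mix."""
    read_req = HEADER_BYTES + ADDR_BYTES                # address only
    write_req = HEADER_BYTES + ADDR_BYTES + LINE_BYTES  # address plus data
    read_rep = HEADER_BYTES + LINE_BYTES                # returned cache line
    write_ack = HEADER_BYTES                            # acknowledgement only
    l1_to_l2 = (t.reads * flits(read_req, req_width)
                + t.writes * flits(write_req, req_width))
    l2_to_l1 = (t.reads * flits(read_rep, rep_width)
                + t.writes * flits(write_ack, rep_width))
    return l1_to_l2, l2_to_l1


mix = Traffic(reads=900, writes=100)
print(flits_per_direction(mix, 16, 16))  # (1400, 4600): symmetric split
print(flits_per_direction(mix, 8, 24))   # (2800, 2800): asymmetric split
```

With the same total width (32 bytes) split evenly, the reply direction carries over three times as many flits as the request direction; shifting width toward the L2-to-L1 network balances the two under these assumed numbers.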

        Published in

        ACM Conferences
        NOCS '15: Proceedings of the 9th International Symposium on Networks-on-Chip
        September 2015
        233 pages
        ISBN: 9781450333962
        DOI: 10.1145/2786572

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 14 of 44 submissions, 32%
