research-article

Asymmetric NoC Architectures for GPU Systems

Authors:
Amir Kavyan Ziabari, Department of Electrical and Computer Engineering, Northeastern University
José L. Abellán, Computer Science Dept., Universidad Católica San Antonio de Murcia
Yenai Ma, Department of Electrical and Computer Engineering, Boston University
Ajay Joshi, Department of Electrical and Computer Engineering, Boston University
David Kaeli, Department of Electrical and Computer Engineering, Northeastern University

NOCS '15: Proceedings of the 9th International Symposium on Networks-on-Chip
September 2015, Article No.: 25, Pages 1–8
https://doi.org/10.1145/2786572.2786596

Published: 28 September 2015

ABSTRACT

While both Chip MultiProcessors (CMPs) and Graphics Processing Units (GPUs) are many-core systems, they exhibit different memory access patterns. CMPs execute threads in parallel, and those threads communicate and synchronize through the memory hierarchy (without any coalescing). GPUs, on the other hand, execute a large number of independent thread blocks, and their accesses to memory are frequent and coalesced, resulting in a completely different access pattern.

NoC designs for GPUs have not been extensively explored. In this paper, we first evaluate several NoC designs for GPUs to determine the most power/performance efficient NoCs. To improve NoC energy efficiency, we explore an asymmetric NoC design tailored for a GPU's memory access pattern, providing one network for L1-to-L2 communication and a second for L2-to-L1 traffic. Our analysis shows that an asymmetric multi-network Cmesh provides the most energy-efficient communication fabric for our target GPU system.
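
As a rough illustration of why separate L1-to-L2 and L2-to-L1 networks might be sized differently, the sketch below tallies the bytes that flow in each direction for a read-dominated mix of coalesced accesses and splits a fixed link-width budget in proportion. The header size, cache-line size, read/write mix, total width, and the function names `account` and `split_link_width` are all assumptions made for this example; the abstract does not state them, and this is not the authors' evaluation method.

```python
from dataclasses import dataclass
from typing import Tuple

# All sizes below are illustrative assumptions, not values from the paper.
HEADER_BYTES = 8        # assumed per-packet header (address, type, routing)
CACHE_LINE_BYTES = 64   # assumed cache-line payload


@dataclass
class Traffic:
    request_bytes: int = 0  # bytes sent on the L1-to-L2 network
    reply_bytes: int = 0    # bytes sent on the L2-to-L1 network


def account(reads: int, writes: int) -> Traffic:
    """Tally bytes per direction for coalesced, line-sized accesses."""
    t = Traffic()
    # A read sends a header-only request and receives a header + line reply.
    t.request_bytes += reads * HEADER_BYTES
    t.reply_bytes += reads * (HEADER_BYTES + CACHE_LINE_BYTES)
    # A write sends header + line and receives a header-only acknowledgement.
    t.request_bytes += writes * (HEADER_BYTES + CACHE_LINE_BYTES)
    t.reply_bytes += writes * HEADER_BYTES
    return t


def split_link_width(total_bits: int, t: Traffic) -> Tuple[int, int]:
    """Divide a fixed link-width budget in proportion to per-direction bytes."""
    total = t.request_bytes + t.reply_bytes
    request_bits = round(total_bits * t.request_bytes / total)
    return request_bits, total_bits - request_bits


if __name__ == "__main__":
    traffic = account(reads=900, writes=100)  # hypothetical 90% read mix
    print(traffic, split_link_width(256, traffic))
```

Under these assumed numbers the reply direction carries several times more bytes than the request direction, which is the kind of imbalance an asymmetric design could exploit. A write-heavy workload would shift the balance the other way.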

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
2. Hardware
  1. Integrated circuits
    1. Interconnect

Published in
NOCS '15: Proceedings of the 9th International Symposium on Networks-on-Chip
September 2015
233 pages
ISBN: 9781450333962
DOI: 10.1145/2786572
General Chairs: Andre Ivanov (University of British Columbia, Canada), Diana Marculescu (Carnegie Mellon University, USA)
Program Chairs: Partha Pratim Pande (Washington State University, USA), José Flich (Universitat Politècnica de València, Spain)
Publications Chair: Karthik Pattabiraman (University of British Columbia, Canada)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 14 of 44 submissions, 32%