ABSTRACT
Given the shift from monolithic architectures to integrated systems of commodity components, scalable high performance computing architectures often suffer from unwanted latencies when operations leave an individual device domain. Transferring control and/or data across loosely coupled commodity devices requires a degree of cooperation in the form of complex system software. The end result is a total system architecture that operates inefficiently.
This work presents initial research into creating microarchitecture extensions to the RISC-V instruction set that provide tightly coupled support for common high performance computing operations. The xBGAS microarchitecture extension gives applications the ability to access globally shared memory blocks directly from rudimentary instructions. The end result is a highly efficient microarchitecture for scalable shared memory programming environments.
Index Terms
- xBGAS: Toward a RISC-V ISA Extension for Global, Scalable Shared Memory