ABSTRACT
Accurate simulation is essential for the proper design and evaluation of any computing platform. With the current move toward heterogeneous CPU-GPU computing, researchers need a simulation framework that can model both kinds of computing devices and their interaction. In this paper, we present Multi2Sim, an open-source, modular, and fully configurable toolset that enables ISA-level simulation of an x86 CPU and an AMD Evergreen GPU. Focusing on a model of the AMD Radeon 5870 GPU, we address program emulation correctness, as well as architectural simulation accuracy, using AMD's OpenCL benchmark suite. We demonstrate the simulation capabilities with a preliminary architectural exploration study and workload characterization examples. The project source code, benchmark packages, and a detailed user's guide are publicly available at www.multi2sim.org.
Index Terms
- Multi2Sim: a simulation framework for CPU-GPU computing