research-article

Multi2Sim: a simulation framework for CPU-GPU computing

Authors:
Rafael Ubal (Northeastern University, Boston, MA, USA)
Byunghyun Jang (University of Mississippi, University, MS, USA)
Perhaad Mistry (Northeastern University, Boston, MA, USA)
Dana Schaa (Northeastern University, Boston, MA, USA)
David Kaeli (Northeastern University, Boston, MA, USA)

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, September 2012, Pages 335–344. https://doi.org/10.1145/2370816.2370865

Published: 19 September 2012

ABSTRACT

Accurate simulation is essential for the proper design and evaluation of any computing platform. With the current shift toward CPU-GPU heterogeneous computing, researchers need a simulation framework that can model both kinds of computing devices and their interaction. In this paper, we present Multi2Sim, an open-source, modular, and fully configurable toolset that enables ISA-level simulation of an x86 CPU and an AMD Evergreen GPU. Focusing on a model of the AMD Radeon 5870 GPU, we address program emulation correctness, as well as architectural simulation accuracy, using AMD's OpenCL benchmark suite. Simulation capabilities are demonstrated with a preliminary architectural exploration study and workload characterization examples. The project source code, benchmark packages, and a detailed user's guide are publicly available at www.multi2sim.org.

Index Terms

Software and its engineering

Published in
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
September 2012
512 pages
ISBN:9781450311823
DOI:10.1145/2370816
General Chairs: Pen-Chung Yew (University of Minnesota), Sangyeun Cho (University of Pittsburgh)
Program Chairs: Luiz DeRose (Cray, Inc.), David J. Lilja (University of Minnesota)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CPU-GPU
heterogeneous computing
multi2sim
simulation
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 121 of 471 submissions, 26%
