ABSTRACT
Heterogeneous System Architecture (HSA) is an open industry standard designed to support a wide variety of data-parallel and task-parallel programming models. Most HSA hardware and software components are still in development, so heterogeneous simulation environments are valuable to developers building HSA software stacks. This paper presents the design of HSAemu, a full system emulator for the HSA platform, and describes how the HSA features are implemented in the emulator. HSAemu provides an infrastructure for heterogeneous simulation environments by supporting the required HSA features, including hUMA, hQ, and HSAIL. On top of this infrastructure, HSAemu provides two simulation models: FastSim, for high-speed functional emulation, and DeepSim, for slower cycle-accurate simulation. In our preliminary experiments, HSAemu was used to test a complete HSA software stack and to profile system performance. Our case studies show that HSAemu is useful as a hardware/software co-design tool for heterogeneous systems.
Index Terms
- HSAemu: a full system emulator for HSA platforms