ABSTRACT
Evaluating the performance of parallel and heterogeneous programs and architectures can be challenging. An emulator or simulator can be used to aid the programmer. To provide guidance and feedback to the programmer, the simulator needs to present traces, reports, and debugging information in a coherent and unambiguous format. Although these outputs contain a wealth of detailed information about the logical and physical transactions that occur during execution, they are usually extremely large and hard to analyze. What is needed is an interface into the simulator that can help programmers and architects sift through this large volume of data. In this contribution, we describe the M2S-Visual trace-driven visualization tool, a complementary addition to the Multi2Sim (M2S) heterogeneous system simulator. M2S-Visual provides a graphical representation of parallel program execution on the simulator. M2S is an established simulator, designed with an emphasis on simulating the execution of parallel applications on graphics processing units, and provides a number of instrumentation capabilities that enable research in architecture exploration and application characterization. This visualization framework aims to complement (and potentially replace) text-based statistical profiling, enabling the user to better learn and understand each software transaction executed on the simulated hardware. While M2S supports emulation of both OpenCL and CUDA programs, our visualization framework presently supports only OpenCL execution. Similarly, M2S supports execution on both CPUs (x86, ARM, and MIPS) and GPUs (AMD Evergreen and Southern Islands, and NVIDIA Fermi and Kepler), but detailed visualization is presently supported only for a multicore x86 CPU and for AMD Evergreen and Southern Islands GPUs. Besides supporting OpenCL programming and debugging, an additional goal is to deliver a reliable tool for teaching the details of parallel program execution on heterogeneous systems.
Given the industry's move to many-core architectures, this toolset is timely and addresses a growing gap in our educational infrastructure. The tool is also designed to support the research community by helping analyze performance bottlenecks in OpenCL programs. We also added the option to produce visualization graphs that provide deeper insight into application performance and hardware resource utilization.
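The abstract does not describe M2S-Visual's trace format or how its graphs are computed. Purely as an illustration of the trace-driven idea, the Python sketch below assumes a hypothetical, simplified text trace (not the actual Multi2Sim trace syntax) in which each line records a cycle, a compute unit, and a work-group start or end event. Replaying those events in order yields per-cycle occupancy for each compute unit, which is the kind of resource-utilization data a trace-driven visualizer could plot. The names `parse_trace`, `occupancy`, and `ascii_timeline` are hypothetical.

```python
from collections import defaultdict

# Hypothetical, simplified trace format (NOT the actual Multi2Sim trace syntax):
#   <cycle> <compute_unit> <start|end> <work_group_id>
SAMPLE_TRACE = """\
0 cu0 start wg0
0 cu1 start wg1
2 cu0 start wg2
3 cu1 end wg1
5 cu0 end wg0
6 cu0 end wg2
"""


def parse_trace(text):
    """Yield (cycle, unit, kind, wg) tuples from the hypothetical trace text."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        cycle, unit, kind, wg = line.split()
        yield int(cycle), unit, kind, wg


def occupancy(events, last_cycle):
    """Replay events and return {unit: [active work-groups per cycle]}.

    A work-group counts as active from its 'start' cycle up to, but not
    including, its 'end' cycle.
    """
    # Net change in active work-groups, per unit and per cycle.
    deltas = defaultdict(lambda: defaultdict(int))
    for cycle, unit, kind, _wg in events:
        deltas[unit][cycle] += 1 if kind == "start" else -1
    result = {}
    for unit, per_cycle in deltas.items():
        active, series = 0, []
        for c in range(last_cycle + 1):
            active += per_cycle.get(c, 0)  # running sum of deltas
            series.append(active)
        result[unit] = series
    return result


def ascii_timeline(occ):
    """Render one row per compute unit; each column is one cycle ('.' = idle)."""
    rows = []
    for unit in sorted(occ):
        cells = "".join(str(n) if n else "." for n in occ[unit])
        rows.append(f"{unit}: {cells}")
    return "\n".join(rows)


if __name__ == "__main__":
    events = list(parse_trace(SAMPLE_TRACE))
    last = max(c for c, *_ in events)
    print(ascii_timeline(occupancy(events, last)))
```

Storing per-cycle deltas and taking a running sum keeps the replay linear in the number of cycles and events. A real tool would likely render a graphical timeline rather than text.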