DOI:10.1145/2795122.2795125
research-article

Visualization of OpenCL application execution on CPU-GPU systems

Published: 13 June 2015

ABSTRACT

Evaluating the performance of parallel and heterogeneous programs and architectures can be challenging. An emulator or simulator can be used to aid the programmer. To provide guidance and feedback to the programmer, the simulator needs to present traces, reports, and debugging information in a coherent and unambiguous format. Although these outputs contain detailed information about the logical and physical transactions that occur during execution, they are usually extremely large and hard to analyze. What is needed is an interface into the simulator that can help programmers and architects sift through this mass of data. In this contribution, we describe the M2S-Visual trace-driven visualization tool, a complementary addition to the Multi2sim (M2S) heterogeneous system simulator. M2S-Visual provides a graphical representation of parallel program execution on the simulator. M2S is an established simulator, designed with an emphasis on simulating the execution of parallel applications on graphics processing units, and provides a number of instrumentation capabilities that enable research in architecture exploration and application characterization. This visualization framework, added to Multi2sim, aims to complement (and potentially replace) text-based statistical profiling, enabling the user to better learn and understand each software transaction executed on the simulated hardware. While M2S supports emulation of both OpenCL and CUDA programs, our visualization framework presently supports only OpenCL execution. M2S supports execution on both CPUs (X86, ARM and MIPS) and GPUs (AMD Evergreen and Southern Islands, and NVIDIA Fermi and Kepler), but presently supports detailed visualization only on a multicore X86 CPU and on AMD Evergreen and Southern Islands GPUs. Besides supporting OpenCL programming and debugging, an additional goal is to deliver a reliable product for teaching the details of parallel program execution on heterogeneous systems.
Given the industry's move to many-core architectures, this toolset is timely and addresses a growing gap in our educational infrastructure. The tool is also designed to support the research community by providing analysis of performance bottlenecks in OpenCL programs. We also incorporated the option to produce visualization graphs that provide deeper insight into application performance and hardware resource utilization.


          • Published in

            WCAE '15: Proceedings of the Workshop on Computer Architecture Education
            June 2015
            64 pages
            ISBN:9781450337175
            DOI:10.1145/2795122

            Copyright © 2015 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            WCAE '15 Paper Acceptance Rate: 9 of 10 submissions, 90%. Overall Acceptance Rate: 9 of 10 submissions, 90%.
