ABSTRACT
A detailed and accurate characterization of an application's parallelism is essential for predicting its wall-time on different platforms, both when the application runs in isolation and when several consolidated applications execute on the same platform. However, prevailing profilers are often sampling-based and do not provide exact information on the parallelism of the profiled application. In this paper we present a novel profiler that logs all thread scheduling activities within the operating system kernel. These logs enable us to accurately characterize an application's parallelism on a given platform by computing the number of threads that are active at each moment. We also present a simple mathematical model that estimates the wall-time of a program on a k2-core machine from profiles collected on a k1-core machine of the same architecture and clock speed. We use our profiler to assess the parallelism of several CPU-bound DaCapo benchmarks and evaluate the accuracy of our prediction model.
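As a rough illustration of the two ideas above, the following Python sketch derives a parallelism profile from a scheduling log and then rescales it to a different core count. It is not the paper's profiler or prediction model: the log format (a list of `(timestamp, delta)` pairs), the function names, and the scaling rule are all assumptions made for this example. In particular, it assumes that "active" means runnable, that work in each interval divides evenly across cores, and that contention for shared resources can be ignored.

```python
from collections import defaultdict


def parallelism_profile(events):
    """Build a histogram of time spent at each level of parallelism.

    `events` is a hypothetical log format: (timestamp, delta) pairs, where
    delta is +1 when a thread becomes active and -1 when it blocks or exits.
    Returns {number_of_active_threads: total_time_at_that_level}.
    """
    profile = defaultdict(float)
    active = 0
    prev_time = None
    for timestamp, delta in sorted(events):
        # Credit the elapsed interval to the thread count that held during it.
        if prev_time is not None and timestamp > prev_time:
            profile[active] += timestamp - prev_time
        active += delta
        prev_time = timestamp
    return dict(profile)


def predict_wall_time(profile, k1, k2):
    """Estimate wall-time on k2 cores from a profile measured on k1 cores.

    Assumed scaling rule (illustrative only): an interval with n active
    threads consumed duration * min(n, k1) core-time on the k1-core machine,
    and would take that core-time divided by min(n, k2) on the k2-core one.
    """
    total = 0.0
    for active, duration in profile.items():
        if active == 0:
            # No thread was active; assume this time does not scale.
            total += duration
            continue
        core_time = duration * min(active, k1)
        total += core_time / min(active, k2)
    return total


# One thread is active from t=0 to t=8; three more are active from t=2 to t=6.
events = [(0, +1), (2, +1), (2, +1), (2, +1),
          (6, -1), (6, -1), (6, -1), (8, -1)]
profile = parallelism_profile(events)

# Treat the log as measured on a 2-core machine and rescale it.
for cores in (1, 2, 4):
    print(cores, predict_wall_time(profile, k1=2, k2=cores))
```

Keeping the profile as a histogram over thread counts, rather than as a raw timeline, makes the rescaling step a single pass over a handful of entries, at the cost of discarding the order in which the parallel phases occurred.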
Index Terms
- Parallelism profiling and wall-time prediction for multi-threaded applications