Abstract
A sophisticated approach to executing irregular applications in parallel on shared memory machines is to decompose them into fine-grained tasks. These tasks can be executed using a task pool, which handles task scheduling independently of the application. In this paper we present a transparent way to profile irregular applications that use task pools, without modifying the application's source code. We show that it is possible to identify critical tasks that prevent scalability and to locate bottlenecks inside the application. We also show that the profiling information can be used to obtain a coarse estimate of the execution time for a given number of processors.
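To make the idea concrete, the following is a minimal single-threaded sketch, not the authors' implementation. It assumes that the timing code lives entirely inside the pool, so the task functions themselves stay unchanged. The names `ProfilingTaskPool`, `submit`, `run`, and `coarse_estimate` are hypothetical. The estimate uses the standard work/critical-path lower bound and further assumes that a child task starts only after its parent has finished; the paper's actual estimation method may differ.

```python
import time
from collections import deque


class ProfilingTaskPool:
    """Hypothetical sketch: a sequential task pool that times every task it runs."""

    def __init__(self):
        self._queue = deque()
        self._next_id = 0
        self.records = []  # one (task_id, parent_id, seconds) tuple per finished task

    def submit(self, func, *args, parent=None):
        """Enqueue a task; `parent` is the id of the task that created it, if any."""
        task_id = self._next_id
        self._next_id += 1
        self._queue.append((task_id, parent, func, args))
        return task_id

    def run(self):
        """Execute tasks in FIFO order until the pool is empty, timing each one."""
        while self._queue:
            task_id, parent, func, args = self._queue.popleft()
            start = time.perf_counter()
            func(self, task_id, *args)  # a task may submit child tasks to the pool
            elapsed = time.perf_counter() - start
            self.records.append((task_id, parent, elapsed))


def coarse_estimate(records, processors):
    """Lower bound on runtime: max(total work / processors, longest parent-child chain)."""
    total = sum(seconds for _, _, seconds in records)
    chain = {}  # task_id -> accumulated time along the chain ending at that task
    # Ids are assigned at creation time, so a parent always has a smaller id than its child.
    for task_id, parent, seconds in sorted(records, key=lambda r: r[0]):
        chain[task_id] = chain.get(parent, 0.0) + seconds
    critical = max(chain.values(), default=0.0)
    return max(total / processors, critical)


# Usage with hand-written durations (in seconds) so the numbers are reproducible:
# task 0 creates tasks 1 and 2, and task 2 creates task 3.
sample = [(0, None, 4.0), (1, 0, 2.0), (2, 0, 6.0), (3, 2, 1.0)]
print(coarse_estimate(sample, 1))  # 13.0, the total work
print(coarse_estimate(sample, 2))  # 11.0, limited by the chain 0 -> 2 -> 3
```

In this sample, the chain 0 → 2 → 3 bounds the runtime at 11 seconds regardless of how many processors are added, which is the kind of scalability-limiting task sequence that profiling is meant to expose.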
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hoffmann, R., Rauber, T. (2007). Profiling of Task-Based Applications on Shared Memory Machines: Scalability and Bottlenecks. In: Kermarrec, AM., Bougé, L., Priol, T. (eds) Euro-Par 2007 Parallel Processing. Euro-Par 2007. Lecture Notes in Computer Science, vol 4641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74466-5_14
DOI: https://doi.org/10.1007/978-3-540-74466-5_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74465-8
Online ISBN: 978-3-540-74466-5
eBook Packages: Computer Science (R0)