Abstract
Modern HPC systems, such as Cray’s XE and IBM’s Blue Gene lines, feature sophisticated network architectures, often in the form of high-dimensional tori. To fully exploit the performance of these systems, an application’s communication structure must be carefully mapped onto the underlying network topology. This mapping must account for both latency (i.e., the physical distance between nodes) and bandwidth (i.e., the number of concurrently used links), which often leads to non-intuitive mappings. To address this complex problem, we are developing a tool set that helps users understand the communication behavior of their codes, map that behavior onto the network architecture, and create better-performing topology-aware node mappings. In this paper, we present initial steps towards this goal: a measurement environment that captures both communication patterns and network metrics within the same run, a methodology for comparing these measurements, and a visualization tool that helps users understand how their application’s characteristics affect network behavior.
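The two mapping criteria named above (distance and concurrent link use) can be made concrete with a small cost model. The Python sketch below is not the authors' tool set; it is an illustration under stated assumptions: nodes of a torus are addressed by integer coordinates, messages follow a hypothetical dimension-ordered, shortest-direction routing policy, and traffic is given as bytes per (source rank, destination rank) pair. It reports hop-bytes (bytes weighted by hop distance, a proxy for latency cost) and the maximum number of bytes on any single directed link (a proxy for bandwidth contention).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Coord = Tuple[int, ...]
Link = Tuple[Coord, Coord]  # directed link between two neighboring nodes


def ring_step(a: int, b: int, size: int) -> int:
    """Signed shortest step count from a to b on a ring of `size` nodes.
    Ties (exactly half-way around) are broken in the positive direction."""
    forward = (b - a) % size
    backward = forward - size  # zero or negative
    return forward if forward <= -backward else backward


def hop_distance(src: Coord, dst: Coord, dims: Coord) -> int:
    """Minimal hop count between two torus nodes, using wraparound in every dimension."""
    return sum(abs(ring_step(s, d, n)) for s, d, n in zip(src, dst, dims))


def route(src: Coord, dst: Coord, dims: Coord) -> List[Link]:
    """Links traversed under a hypothetical dimension-ordered policy:
    correct dimension 0 first, then dimension 1, and so on, each in the shorter direction."""
    links: List[Link] = []
    cur = list(src)
    for dim, size in enumerate(dims):
        step = ring_step(cur[dim], dst[dim], size)
        direction = 1 if step > 0 else -1
        for _ in range(abs(step)):
            nxt = list(cur)
            nxt[dim] = (cur[dim] + direction) % size
            links.append((tuple(cur), tuple(nxt)))
            cur = nxt
    return links


def evaluate_mapping(
    mapping: Dict[int, Coord],
    traffic: Dict[Tuple[int, int], int],
    dims: Coord,
) -> Tuple[int, int]:
    """Return (hop-bytes, max bytes on any single link) for a rank-to-node mapping."""
    hop_bytes = 0
    link_load: Dict[Link, int] = defaultdict(int)
    for (src_rank, dst_rank), nbytes in traffic.items():
        src, dst = mapping[src_rank], mapping[dst_rank]
        hop_bytes += nbytes * hop_distance(src, dst, dims)
        for link in route(src, dst, dims):
            link_load[link] += nbytes
    max_load = max(link_load.values(), default=0)
    return hop_bytes, max_load


if __name__ == "__main__":
    dims = (4, 4)  # hypothetical 4x4 2D torus
    # Four ranks exchanging 100 bytes in a ring pattern: 0->1->2->3->0.
    ring_traffic = {(0, 1): 100, (1, 2): 100, (2, 3): 100, (3, 0): 100}
    row_mapping = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0)}
    shuffled_mapping = {0: (0, 0), 1: (2, 0), 2: (1, 0), 3: (3, 0)}
    print(evaluate_mapping(row_mapping, ring_traffic, dims))       # (400, 100)
    print(evaluate_mapping(shuffled_mapping, ring_traffic, dims))  # (600, 200)
```

Both example mappings place the four ranks on the same row of nodes, yet swapping two ranks raises hop-bytes by half and doubles the load on the busiest link. Small changes of this kind are what make good mappings hard to find by intuition alone.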
Acknowledgements
This work was partially performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-508551).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schulz, M. et al. (2012). Creating a Tool Set for Optimizing Topology-Aware Node Mappings. In: Brunst, H., Müller, M., Nagel, W., Resch, M. (eds) Tools for High Performance Computing 2011. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31476-6_1
DOI: https://doi.org/10.1007/978-3-642-31476-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31475-9
Online ISBN: 978-3-642-31476-6
eBook Packages: Computer Science, Computer Science (R0)