A Case Study of Communication Optimizations on 3D Mesh Interconnects

Bhatelé, Abhinav; Bohm, Eric; Kalé, Laxmikant V.

doi:10.1007/978-3-642-03869-3_94


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5704)

Included in the following conference series:

European Conference on Parallel Processing


Abstract

Optimal network performance is critical to efficient parallel scaling of communication-bound applications on large machines. With wormhole routing, no-load latencies do not increase significantly with the number of hops traveled. Yet we and others have recently shown that, in the presence of contention, message latencies can grow substantially. Hence, on large machines, task mapping strategies should take the topology of the machine into account. In this paper, we present topology-aware mapping as a technique to optimize communication on 3-dimensional mesh interconnects and thereby improve performance.
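The argument above can be made concrete with a small model. The sketch below is an illustration, not code from the paper: it computes the minimal hop count between two nodes of a 3D mesh (optionally with torus wraparound), a simplified contention-free wormhole latency in which each hop adds only a small router delay, and the hop-bytes total (message bytes times hops, summed over all messages). Hop-bytes is a common proxy for how much link bandwidth a placement consumes, and therefore for how much contention it invites. The function names and the link parameters (100 bytes/µs, 0.1 µs per hop) are hypothetical.

```python
from typing import Iterable, Tuple

Coord = Tuple[int, int, int]
Message = Tuple[Coord, Coord, int]  # (source node, destination node, bytes)


def hops(a: Coord, b: Coord, dims: Coord, torus: bool = False) -> int:
    """Minimal hop count between nodes a and b of a 3D mesh (or torus).

    With minimal dimension-ordered routing on a mesh this is the
    Manhattan distance; wraparound links on a torus can shorten each leg.
    """
    total = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi)
        if torus:
            d = min(d, n - d)
        total += d
    return total


def no_load_latency_us(nbytes: int, nhops: int,
                       bw_bytes_per_us: float, per_hop_us: float) -> float:
    """Simplified wormhole model: serialization time plus a small per-hop delay.

    Contention is deliberately ignored, which is exactly the case in which
    the hop count barely matters.
    """
    return nbytes / bw_bytes_per_us + nhops * per_hop_us


def hop_bytes(messages: Iterable[Message], dims: Coord, torus: bool = False) -> int:
    """Sum of bytes x hops over all messages, a proxy for total link load."""
    return sum(n * hops(src, dst, dims, torus) for src, dst, n in messages)


if __name__ == "__main__":
    dims = (8, 8, 8)
    print(hops((0, 0, 0), (7, 7, 7), dims))              # 21 on a mesh
    print(hops((0, 0, 0), (7, 7, 7), dims, torus=True))  # 3 on a torus
    # Hypothetical link parameters: 21 hops add only about 2% to a 10 KB message.
    print(no_load_latency_us(10_000, 1, 100.0, 0.1))
    print(no_load_latency_us(10_000, 21, 100.0, 0.1))
    msgs = [((0, 0, 0), (1, 0, 0), 1024), ((0, 0, 0), (4, 4, 4), 1024)]
    print(hop_bytes(msgs, dims))  # 1024*1 + 1024*12 = 13312
```

Because the no-load latency term grows only slightly with distance, the benefit of a compact placement in this model appears through reduced link load (hop-bytes) rather than through per-message latency.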

Our methodology is facilitated by the idea of object-based decomposition used in Charm++, which separates the process of decomposition from the mapping of computation to processors and allows a more flexible mapping based on communication patterns between objects. Exploiting this and the topology of the allocated job partition, we present mapping strategies for a production code, OpenAtom, to improve overall performance and scaling. OpenAtom presents complex communication scenarios involving interactions among multiple groups of objects, which makes the mapping task challenging. Results are presented for OpenAtom on up to 16,384 processors of Blue Gene/L, 8,192 processors of Blue Gene/P and 2,048 processors of Cray XT3.
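A toy example can illustrate why separating decomposition from mapping helps. In the sketch below (hypothetical; it is neither OpenAtom's communication pattern nor the paper's mapping strategy), the work is decomposed once into a chain of 64 objects, each sending 1 KB to its successor. Two placements onto a 4×4×4 mesh partition are then compared by hop-bytes without changing the decomposition: a topology-oblivious x-fastest numbering of processors, and a snake (boustrophedon) walk in which consecutive processors are always neighbors, standing in for a topology-aware mapping. All function names are illustrative.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]
Edge = Tuple[int, int, int]  # (source object, destination object, bytes)


def mesh_hops(a: Coord, b: Coord) -> int:
    """Minimal hops between two nodes of a 3D mesh (no wraparound links)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def rank_order(dims: Coord) -> List[Coord]:
    """Processors numbered x-fastest, ignoring who talks to whom."""
    X, Y, Z = dims
    return [(x, y, z) for z in range(Z) for y in range(Y) for x in range(X)]


def snake_order(dims: Coord) -> List[Coord]:
    """Boustrophedon walk: consecutive entries are always one hop apart."""
    X, Y, Z = dims
    order: List[Coord] = []
    row = 0  # global row counter so the x direction alternates across planes too
    for z in range(Z):
        ys = range(Y) if z % 2 == 0 else range(Y - 1, -1, -1)
        for y in ys:
            xs = range(X) if row % 2 == 0 else range(X - 1, -1, -1)
            order.extend((x, y, z) for x in xs)
            row += 1
    return order


def hop_bytes(edges: List[Edge], placement: Dict[int, Coord]) -> int:
    """Bytes x hops summed over object-to-object messages under a placement."""
    return sum(n * mesh_hops(placement[s], placement[d]) for s, d, n in edges)


if __name__ == "__main__":
    dims = (4, 4, 4)
    # Decomposition: fixed once, independent of any mapping.
    chain = [(i, i + 1, 1024) for i in range(63)]
    # Mapping: object i goes to the i-th processor of each ordering.
    print(hop_bytes(chain, dict(enumerate(rank_order(dims)))))   # 119808
    print(hop_bytes(chain, dict(enumerate(snake_order(dims)))))  # 64512
```

Only the placement dictionary changes between the two runs; the object graph `chain` stays the same. That is the kind of flexibility an object-based model such as Charm++ leaves open to a mapping strategy.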



Author information

Authors and Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Abhinav Bhatelé, Eric Bohm & Laxmikant V. Kalé


Editor information

Editors and Affiliations

Department of Software Technology, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
Henk Sips, Dick Epema & Hai-Xiang Lin


About this paper

Cite this paper

Bhatelé, A., Bohm, E., Kalé, L.V. (2009). A Case Study of Communication Optimizations on 3D Mesh Interconnects. In: Sips, H., Epema, D., Lin, HX. (eds) Euro-Par 2009 Parallel Processing. Euro-Par 2009. Lecture Notes in Computer Science, vol 5704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03869-3_94


DOI: https://doi.org/10.1007/978-3-642-03869-3_94
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03868-6
Online ISBN: 978-3-642-03869-3
eBook Packages: Computer Science, Computer Science (R0)
