DOI: 10.1145/2616498.2616541

Performance of Applications using Dual-Rail InfiniBand 3D Torus network on the Gordon Supercomputer

Published: 13 July 2014

Abstract

Multi-rail InfiniBand networks provide options to improve bandwidth, increase reliability, and lower latency for multi-core nodes. The Gordon supercomputer at SDSC, with its dual-rail InfiniBand 3-D torus network, is used to evaluate the performance impact of using multiple rails. The study was performed using the OSU micro-benchmarks, the P3FFT application kernel, and scientific applications LAMMPS and AMBER. The micro-benchmarks confirmed the bandwidth and latency performance benefits. At the application level, performance improvements depended on the communication level and profile.
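As a hedged illustration of the kind of setup the study evaluates (the paper's exact MPI stack and settings are not given here), MVAPICH2 exposes multi-rail control through environment variables such as `MV2_NUM_HCAS` and `MV2_RAIL_SHARING_POLICY`; a dual-rail run of the OSU bandwidth micro-benchmark might be launched like this:

```shell
# Hypothetical dual-rail launch sketch (MVAPICH2 variable names;
# not taken from the paper -- actual Gordon settings may differ).
export MV2_NUM_HCAS=2                       # use both HCAs, i.e. both rails
export MV2_RAIL_SHARING_POLICY=ROUND_ROBIN  # alternate messages across the rails
mpirun_rsh -np 2 -hostfile hosts ./osu_bw   # OSU point-to-point bandwidth test
```

Comparing the reported bandwidth against a single-rail run (`MV2_NUM_HCAS=1`) is one way to reproduce the kind of micro-benchmark comparison described in the abstract.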


Cited By

  • Performance drop at executing communication-intensive parallel algorithms. The Journal of Supercomputing 76(9): 6834-6859, Sep 2020. DOI: 10.1007/s11227-019-03142-8


Published In

XSEDE '14: Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment
July 2014
445 pages
ISBN:9781450328937
DOI:10.1145/2616498
  • General Chair: Scott Lathrop
  • Program Chair: Jay Alameda

In-Cooperation

  • National Science Foundation (NSF)
  • Drexel University
  • Indiana University

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. 3-D Torus network
  2. Application performance
  3. Benchmarks
  4. Dual-rail

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

XSEDE '14

Acceptance Rates

XSEDE '14 Paper Acceptance Rate 80 of 120 submissions, 67%;
Overall Acceptance Rate 129 of 190 submissions, 68%

