Towards autotuning by alternating communication methods

Published: 08 October 2012

Abstract

Interconnects in emerging high-performance computing systems feature hardware support for one-sided, asynchronous communication and global address space programming models in order to improve parallel efficiency and productivity by allowing communication-computation overlap and out-of-order delivery. In practice, though, complex interactions between the software stack and the communication hardware make it challenging to obtain optimal performance for a full application expressed in a one-sided programming paradigm. Here, we present a proof-of-concept study of an autotuning framework that instantiates hybrid kernels from refactored codes using the available communication libraries or languages on a Cray XE6 and an SGI Altix UV 1000. We validate our approach by improving the performance of bandwidth- and latency-bound kernels of interest in quantum physics and astrophysics by up to 35% and 80%, respectively.
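The abstract describes a framework that instantiates a kernel in several communication variants and selects the fastest one empirically. A minimal sketch of such a selection loop (not taken from the paper; the variant names and placeholder kernels are hypothetical stand-ins for real MPI/SHMEM/DMAPP implementations) might look like this:

```python
import time

def autotune(candidates, payload, trials=5):
    """Time each candidate kernel on the same payload and return the name
    of the fastest one -- a stand-in for choosing among communication
    methods (e.g. two-sided MPI vs. one-sided SHMEM) at install time."""
    best_name, best_time = None, float("inf")
    for name, kernel in candidates.items():
        start = time.perf_counter()
        for _ in range(trials):
            kernel(payload)
        elapsed = (time.perf_counter() - start) / trials
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

# Hypothetical stand-ins for two refactored communication variants.
def variant_two_sided(data):
    return sum(data)   # placeholder work

def variant_one_sided(data):
    return sum(data)   # placeholder work

chosen = autotune({"two_sided": variant_two_sided,
                   "one_sided": variant_one_sided},
                  list(range(1000)))
print(chosen)
```

In a real instantiation, each candidate would be a compiled kernel built against a different communication library, and the timing harness would run on the target interconnect rather than in-process.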


Cited By

  • (2011) Towards autotuning by alternating communication methods. In Proceedings of the Second International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems, pages 9--10. DOI: 10.1145/2088457.2088464. Online publication date: 13-Nov-2011.


Published In

ACM SIGMETRICS Performance Evaluation Review, Volume 40, Issue 2
September 2012, 129 pages
ISSN: 0163-5999
DOI: 10.1145/2381056

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. PGAS
  2. autotuning
  3. one-sided communication

Qualifiers

  • Column

