Towards autotuning by alternating communication methods

Published: 08 October 2012

Abstract

Interconnects in emerging high-performance computing systems feature hardware support for one-sided, asynchronous communication and global address space programming models in order to improve parallel efficiency and productivity by allowing communication-computation overlap and out-of-order delivery. In practice, though, complex interactions between the software stack and the communication hardware make it challenging to obtain optimal performance for a full application expressed in a one-sided programming paradigm. Here, we present a proof-of-concept study of an autotuning framework that instantiates hybrid kernels from refactored codes using the available communication libraries or languages on a Cray XE6 and an SGI Altix UV 1000. We validate our approach by improving the performance of bandwidth- and latency-bound kernels of interest in quantum physics and astrophysics by up to 35% and 80%, respectively.
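The abstract describes a framework that instantiates a kernel in several communication variants and selects the fastest one empirically. A minimal sketch of such a selection loop (not taken from the paper; the variant names and placeholder kernels are hypothetical stand-ins for real MPI/SHMEM/DMAPP implementations) might look like this:

```python
import time

def autotune(candidates, payload, trials=5):
    """Time each candidate kernel on the same payload and return the name
    of the fastest one -- a stand-in for choosing among communication
    methods (e.g. two-sided MPI vs. one-sided SHMEM) at install time."""
    best_name, best_time = None, float("inf")
    for name, kernel in candidates.items():
        start = time.perf_counter()
        for _ in range(trials):
            kernel(payload)
        elapsed = (time.perf_counter() - start) / trials
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

# Hypothetical stand-ins for two refactored communication variants.
def variant_two_sided(data):
    return sum(data)   # placeholder work

def variant_one_sided(data):
    return sum(data)   # placeholder work

chosen = autotune({"two_sided": variant_two_sided,
                   "one_sided": variant_one_sided},
                  list(range(1000)))
print(chosen)
```

In a real instantiation, each candidate would be a compiled kernel built against a different communication library, and the timing harness would run on the target interconnect rather than in-process.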


Cited By

  • (2011) Towards autotuning by alternating communication methods. In Proceedings of the Second International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems, pages 9--10. DOI: 10.1145/2088457.2088464. Online publication date: 13-Nov-2011.


Published In

ACM SIGMETRICS Performance Evaluation Review, Volume 40, Issue 2
September 2012, 129 pages
ISSN: 0163-5999
DOI: 10.1145/2381056

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. PGAS
  2. autotuning
  3. one-sided communication

Qualifiers

  • Column

