OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Tail queues: A multi-threaded matching architecture

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.5158 · OSTI ID: 1496973
[1]; [1]; [1]; [2]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Univ. of New Mexico, Albuquerque, NM (United States)
  2. Univ. of New Mexico, Albuquerque, NM (United States)

As we approach exascale, computational parallelism will have to increase drastically to meet throughput targets. Many-core architectures have exacerbated this problem by trading clock speed, core complexity, and computation throughput for increased parallelism. This presents two major challenges for communication libraries such as MPI: the library must leverage the performance advantages of thread-level parallelism, and it must avoid the scalability problems associated with increasing the number of processes to that scale. Hybrid programming models, such as MPI+X, have been proposed to address these challenges. MPI_THREAD_MULTIPLE is MPI's thread-safe mode. While there has been work to optimize it, it remains non-performant in most implementations. Although current applications avoid MPI multithreading because of performance concerns, future applications are expected to use it. One of the major synchronous data structures required by MPI is the matching engine. In this paper, we present a parallel matching algorithm that can improve MPI matching for multithreaded applications. We then perform a feasibility study to demonstrate the performance benefit of the technique.

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC04-94AL85000
OSTI ID:
1496973
Report Number(s):
SAND-2019-1466J; 672473
Journal Information:
Concurrency and Computation. Practice and Experience, Vol. 32, Issue 3; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 4 works
Citation information provided by
Web of Science

Cited By (1)

PAMPAR: A new parallel benchmark for performance and energy consumption evaluation
  • Marques Garcia, Adriano; Schepke, Claudio; Girardi, Alessandro
  • Concurrency and Computation: Practice and Experience, Vol. 32, Issue 20 https://doi.org/10.1002/cpe.5504
journal October 2019
