Tail queues: A multi-threaded matching architecture

Dosanjh, Matthew G. F.; Grant, Ryan E.; Schonbein, Whit; Bridges, Patrick G.

doi:10.1002/cpe.5158


Journal Article · February 6, 2019 · Concurrency and Computation: Practice and Experience


Author affiliations:
[1] Dosanjh, Grant, Schonbein: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Univ. of New Mexico, Albuquerque, NM (United States)
[2] Bridges: Univ. of New Mexico, Albuquerque, NM (United States)

As we approach exascale, computational parallelism will have to increase drastically to meet throughput targets. Many-core architectures have exacerbated this problem by trading away clock speed, core complexity, and per-core computational throughput in exchange for increased parallelism. This presents two major challenges for communication libraries such as MPI: the library must leverage the performance advantages of thread-level parallelism, and it must avoid the scalability problems associated with increasing the number of processes to that scale. Hybrid programming models, such as MPI+X, have been proposed to address these challenges. MPI_THREAD_MULTIPLE is MPI's thread-safe mode. Although there has been work to optimize it, it remains non-performant in most implementations. While current applications avoid MPI multithreading because of these performance concerns, it is expected to be used in future applications. One of the major data structures in MPI that requires synchronization is the matching engine. In this paper, we present a parallel matching algorithm that can improve MPI matching for multithreaded applications. We then perform a feasibility study to demonstrate the performance benefit of the technique.
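To make the "matching engine" concrete, the following is a minimal sketch of a conventional single-lock design, not the tail-queue algorithm presented in the paper. It assumes the common two-queue structure: a posted receive queue (PRQ) and an unexpected message queue (UMQ), each searched linearly for the first entry whose communicator, source, and tag match, with source and tag wildcards allowed only on receives. The class and field names, the sentinel values `ANY_SOURCE`/`ANY_TAG` (stand-ins for `MPI_ANY_SOURCE`/`MPI_ANY_TAG`), and the use of Python threads are illustrative assumptions. Because every post and every arrival holds one lock for the whole search, threads entering the engine under MPI_THREAD_MULTIPLE are serialized, which illustrates the kind of contention a parallel matching algorithm is meant to relieve.

```python
import threading
from collections import deque
from dataclasses import dataclass

# Hypothetical sentinel values standing in for MPI_ANY_SOURCE / MPI_ANY_TAG;
# real MPI implementations define their own constants.
ANY_SOURCE = -1
ANY_TAG = -1


@dataclass
class Envelope:
    """Matching envelope carried by an incoming message."""
    comm: int
    source: int
    tag: int


@dataclass
class PostedRecv:
    """A posted receive; source and tag may be wildcards."""
    comm: int
    source: int
    tag: int
    buf: list  # hypothetical stand-in for the user's receive buffer


def matches(recv: PostedRecv, env: Envelope) -> bool:
    # Communicator must match exactly; source and tag match exactly or via wildcard.
    return (recv.comm == env.comm
            and recv.source in (ANY_SOURCE, env.source)
            and recv.tag in (ANY_TAG, env.tag))


class MatchingEngine:
    """Conventional two-queue matching engine guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # serializes every thread entering the engine
        self._posted = deque()         # PRQ, kept in posting order
        self._unexpected = deque()     # UMQ of (envelope, payload), in arrival order

    def post_recv(self, recv: PostedRecv) -> bool:
        """Search the UMQ first; if nothing matches, append to the PRQ."""
        with self._lock:
            for i, (env, payload) in enumerate(self._unexpected):
                if matches(recv, env):
                    del self._unexpected[i]  # oldest matching message wins
                    recv.buf.append(payload)
                    return True
            self._posted.append(recv)
            return False

    def arrive(self, env: Envelope, payload) -> bool:
        """Search the PRQ first; if nothing matches, append to the UMQ."""
        with self._lock:
            for i, recv in enumerate(self._posted):
                if matches(recv, env):
                    del self._posted[i]  # earliest-posted matching receive wins
                    recv.buf.append(payload)
                    return True
            self._unexpected.append((env, payload))
            return False
```

For example, posting a receive for communicator 0, any source, tag 7, and then delivering a tag-5 message followed by a tag-7 message leaves the tag-5 message on the UMQ and places only the tag-7 payload in the receive's buffer. Scanning each queue from the front preserves MPI's ordering rule that the earliest matching entry wins, but it also makes every search linear in queue length while the single lock is held, so long queues and many threads compound each other.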

Research Organization: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)

Grant/Contract Number: AC04-94AL85000

OSTI ID: 1496973

Report Number(s): SAND-2019-1466J; 672473

Journal Information: Concurrency and Computation: Practice and Experience, Vol. 32, Issue 3; ISSN 1532-0626

Publisher: Wiley

Country of Publication: United States

Language: English

Citation Metrics:

Cited by: 4 works (citation information provided by Web of Science)

Cited By (1)

PAMPAR: A new parallel benchmark for performance and energy consumption evaluation Marques Garcia, Adriano; Schepke, Claudio; Girardi, Alessandro Concurrency and Computation: Practice and Experience, Vol. 32, Issue 20 https://doi.org/10.1002/cpe.5504	journal	October 2019

Similar Records

Programming future architectures: dusty decks, memory walls, and the speed of light.

Conference · August 1, 2005

Rodrigues, Arun F

Approximate Weighted Matching On Emerging Manycore and Multithreaded Architectures

Journal Article · November 30, 2012 · International Journal of High Performance Computing Applications, 26(4):413-430

Halappanavar, Mahantesh; Feo, John T; Villa, Oreste; et al.

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)

Technical Report · November 29, 2019

Shen, Xipeng

Related Subjects

97 MATHEMATICS AND COMPUTING
high performance computing
many core
MPI
networks