Abstract
We study different approaches to implementing an optimal, stable two-way merge algorithm for distributed-memory parallel architectures. The algorithm takes as input two ordered sequences that are distributed blockwise across all available processes, so that each process owns a block of elements of each sequence. The task of each process is to produce an ordered block of elements from the stable merge of the two input sequences. We present an optimal, perfectly load-balanced, stable parallel algorithm that accomplishes this task. We describe three implementation alternatives using the one-sided communication of the Message-Passing Interface (MPI). Furthermore, we discuss problems with the current MPI 2.2 one-sided interface and enabling features that may appear in future versions of the MPI standard. Experimental results on a large IBM Blue Gene/P supercomputer show perfect scalability of our implementation: with a fixed input size per process, the running time remains (almost) constant as the number of processes increases, and with a fixed total problem size, our implementation improves the time to solution for up to 32,768 MPI processes.
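The partitioning idea behind the perfect load balance in the abstract can be made concrete with a small sequential sketch. The code below is not from the paper: it shows a stable co-rank binary search that, for a global output index k, finds the unique stable split of two sorted arrays, after which each of p hypothetical "ranks" merges exactly its own slice of roughly (m+n)/p output elements. In the paper's distributed setting the blocks are remote and such searches would access them through MPI one-sided communication (e.g., MPI_Get); here, as a simplifying assumption, both whole sequences are locally visible, and the function name corank and all data values are illustrative.

```c
#include <stdio.h>

/* Stable co-rank: returns the unique i such that the first i elements of a
 * and the first k - i elements of b are exactly the first k elements of the
 * stable merge of a and b (ties resolved in favor of a).
 * Assumes a (length m) and b (length n) are sorted and 0 <= k <= m + n. */
static long corank(long k, const int *a, long m, const int *b, long n)
{
    long lo = (k > n) ? k - n : 0;    /* i cannot be smaller than k - n   */
    long hi = (k < m) ? k : m;        /* i cannot exceed min(k, m)        */
    while (lo < hi) {
        long i = lo + (hi - lo) / 2;  /* candidate: i from a, k - i from b */
        if (b[k - i - 1] >= a[i])     /* stability violated: need more of a */
            lo = i + 1;
        else
            hi = i;
    }
    return lo;
}

int main(void)
{
    const int a[] = {1, 3, 3, 7, 9};
    const int b[] = {2, 3, 5, 8};
    const long m = 5, n = 4, p = 3;   /* p hypothetical processes */

    /* Each "rank" r independently computes its output range [k, k2) and the
     * matching input split points, then merges only its own slice. */
    for (long r = 0; r < p; r++) {
        long k  = r * (m + n) / p;    /* perfectly balanced output partition */
        long k2 = (r + 1) * (m + n) / p;
        long i  = corank(k,  a, m, b, n), j  = k  - i;
        long i2 = corank(k2, a, m, b, n), j2 = k2 - i2;

        printf("rank %ld:", r);
        while (i < i2 || j < j2) {    /* sequential stable merge of the slice */
            if (j >= j2 || (i < i2 && a[i] <= b[j]))
                printf(" %d", a[i++]);
            else
                printf(" %d", b[j++]);
        }
        printf("\n");
    }
    return 0;
}
```

Running this prints the slices "1 2 3", "3 3 5", and "7 8 9", i.e., consecutive pieces of the stable merge, computed without any coordination between the ranks beyond read access to the inputs.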
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Siebert, C., Träff, J.L. (2012). Efficient MPI Implementation of a Parallel, Stable Merge Algorithm. In: Träff, J.L., Benkner, S., Dongarra, J.J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2012. Lecture Notes in Computer Science, vol 7490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33518-1_25
DOI: https://doi.org/10.1007/978-3-642-33518-1_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33517-4
Online ISBN: 978-3-642-33518-1