Scalable and Efficient Parallel Selection

Siebert, Christian

doi:10.1007/978-3-642-55224-3_20

Conference paper
First Online: 01 January 2014

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8384)

Abstract

Selection algorithms find the \(k^{\mathrm{th}}\) smallest element from a set of elements. Although there are optimal parallel selection algorithms available for theoretical machines, these algorithms are not only difficult to implement but also inefficient in practice. Consequently, scalable applications can use only a few special cases such as minimum and maximum, where efficient implementations exist. To overcome such limitations, we propose a general parallel selection algorithm that scales even on today’s largest supercomputers. Our approach is based on an efficient, unbiased median approximation method, recently introduced as median-of-3 reduction, and Hoare’s sequential QuickSelect idea from 1961. The resulting algorithm scales with a time complexity of \(\mathcal{O}(\log^2 n)\) for \(n\) distributed elements while needing only \(\mathcal{O}(1)\) space. Furthermore, we show it to be a practical solution by explaining implementation details and presenting performance results for up to \(458{,}752\) processor cores.
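The combination described in the abstract can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the paper's implementation: the simulated MPI ranks are plain Python lists, the function names `median_of_3_reduce` and `parallel_select` are hypothetical, each rank proposes one random local element as a pivot candidate, and the lists are copied for clarity (so the sketch does not reproduce the \(\mathcal{O}(1)\) space bound). The pivot is approximated by repeatedly replacing groups of three candidates with their median, and the search range is then narrowed as in QuickSelect.

```python
import random


def median_of_3_reduce(values, rng):
    """Approximate the median of `values` by repeatedly replacing
    groups of three values with the median of the group."""
    vals = list(values)
    rng.shuffle(vals)
    while len(vals) > 1:
        reduced = []
        for i in range(0, len(vals), 3):
            group = vals[i:i + 3]
            if len(group) == 3:
                reduced.append(sorted(group)[1])  # median of three
            else:
                reduced.append(group[0])  # incomplete group: forward one value
        vals = reduced
    return vals[0]


def parallel_select(parts, k, seed=0):
    """Return the k-th smallest element (1-based) of all elements in
    `parts`, a list of lists with one inner list per simulated rank."""
    rng = random.Random(seed)
    parts = [list(p) for p in parts]
    total = sum(len(p) for p in parts)
    if not 1 <= k <= total:
        raise ValueError("k must be between 1 and the number of elements")
    while True:
        # Assumption: every non-empty rank proposes one local candidate.
        candidates = [rng.choice(p) for p in parts if p]
        pivot = median_of_3_reduce(candidates, rng)
        # Summing local counts stands in for a global sum reduction.
        less = sum(sum(1 for x in p if x < pivot) for p in parts)
        equal = sum(sum(1 for x in p if x == pivot) for p in parts)
        if k <= less:
            parts = [[x for x in p if x < pivot] for p in parts]
        elif k <= less + equal:
            return pivot
        else:
            parts = [[x for x in p if x > pivot] for p in parts]
            k -= less + equal


parts = [[7, 2], [9, 4], [1, 8], [3]]
print(parallel_select(parts, 4))  # prints 4
```

Because the pivot is always an element of the remaining data, every round discards at least one element, so the loop terminates for any valid \(k\).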

Notes

1. In statistics, the \(k^{\mathrm{th}}\) order statistic of a sample is equal to its \(k^{\mathrm{th}}\) smallest value, and the position of this value is called its rank. Unfortunately, rank is also used in MPI to identify a process. To disambiguate, we use the terms position and MPI rank.
2. We use MPI terminology: assuming \(x_i\) is the input at MPI rank \(i\), Allreduce computes the sum \(\sum_{j=0}^{p-1} x_j\) over all \(p\) MPI ranks, and Exscan computes the prefix sum \(\sum_{j=0}^{i-1} x_j\) in parallel.
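The two collectives in note 2 can be illustrated with a sequential stand-in. This is a minimal sketch, not MPI code: the list `xs` holds the hypothetical per-rank inputs \(x_0, \ldots, x_{p-1}\), the function names `allreduce_sum` and `exscan_sum` are chosen for illustration, and each function returns the value every rank would hold afterwards.

```python
from itertools import accumulate


def allreduce_sum(xs):
    """Every rank receives the sum of all inputs."""
    total = sum(xs)
    return [total] * len(xs)


def exscan_sum(xs):
    """Rank i receives the sum of the inputs of ranks 0..i-1.
    Rank 0 gets 0 here; in MPI, the result of MPI_Exscan on
    rank 0 is undefined, so this value is an assumption."""
    return list(accumulate(xs, initial=0))[:-1]


xs = [3, 1, 4, 1]          # hypothetical inputs of four ranks
print(allreduce_sum(xs))   # prints [9, 9, 9, 9]
print(exscan_sum(xs))      # prints [0, 3, 4, 8]
```

If each \(x_i\) is a local element count, the Allreduce result gives the global count and the Exscan result gives the number of elements held by lower-numbered ranks.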

References

Blum, M., Floyd, R.W., Pratt, V., Rivest, R.L., Tarjan, R.E.: Time bounds for selection. J. Comput. Syst. Sci. 7(4), 448–461 (1973)
Fouz, M., Kufleitner, M., Manthey, B., Jahromi, N.Z.: On smoothed analysis of quicksort and Hoare’s find. Comput. Comb. 5609, 158–167 (2009)
Frazer, W.D., McKellar, A.C.: Samplesort: a sampling approach to minimal storage tree sorting. J. ACM 17(3), 496–507 (1970)
Han, Y.: Optimal parallel selection. ACM Trans. Algorithms 3(4) (2007)
Hoare, C.A.R.: Algorithm 63 (Partition) and Algorithm 65 (Find). Commun. ACM 4(7), 321–322 (1961)
Kirschenhofer, P., Prodinger, H., Martínez, C.: Analysis of Hoare’s FIND algorithm with Median-of-Three partition. Random Struct. Alg. 10, 143–156 (1997)
Rabenseifner, R.: Optimization of collective reduction operations. In: Bubak, M., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2004. LNCS, vol. 3036, pp. 1–9. Springer, Heidelberg (2004)
Sack, P., Gropp, W.: A scalable MPI_Comm_split algorithm for exascale computing. In: Keller, R., Gabriel, E., Resch, M., Dongarra, J. (eds.) EuroMPI 2010. LNCS, vol. 6305, pp. 1–10. Springer, Heidelberg (2010)
Sanders, P., Träff, J.L.: Parallel Prefix (Scan) algorithms for MPI. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J. (eds.) PVM/MPI 2006. LNCS, vol. 4192, pp. 49–57. Springer, Heidelberg (2006)
Siebert, C., Wolf, F.: Parallel sorting with minimal data. In: Cotronis, Y., Danalis, A., Nikolopoulos, D.S., Dongarra, J. (eds.) EuroMPI 2011. LNCS, vol. 6960, pp. 170–177. Springer, Heidelberg (2011)

Author information

Authors and Affiliations

Laboratory for Parallel Programming, Department of Computer Science, RWTH Aachen University, Aachen, Germany
Christian Siebert

Corresponding author

Correspondence to Christian Siebert.

Editor information

Editors and Affiliations

Institute of Computer and Information Science, Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
University of Tennessee, Department of Computer Science, Knoxville, Tennessee, USA
Jack Dongarra
Institute of Computer and Information Science, Czestochowa University of Technology, Czestochowa, Poland
Konrad Karczewski
Technical University of Denmark Informatics and Mathematical Modelling, Kongens Lyngby, Denmark
Jerzy Waśniewski

About this paper

Cite this paper

Siebert, C. (2014). Scalable and Efficient Parallel Selection. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2013. Lecture Notes in Computer Science, vol 8384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55224-3_20

DOI: https://doi.org/10.1007/978-3-642-55224-3_20
Published: 06 May 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55223-6
Online ISBN: 978-3-642-55224-3
eBook Packages: Computer Science, Computer Science (R0)
