Sorting and Permuting without Bank Conflicts on GPUs

Afshani, Peyman; Sitchinava, Nodari

doi:10.1007/978-3-662-48350-3_2

Peyman Afshani & Nodari Sitchinava

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9294)


Abstract

In this paper, we look at the complexity of designing algorithms without any bank conflicts in the shared memory of Graphical Processing Units (GPUs). Given input of size n, w processors and w memory banks, we study three fundamental problems: sorting, permuting and w-way partitioning (defined as sorting an input containing exactly n/w copies of every integer in [w]).
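The setting above can be made concrete with a toy cost model. The sketch below illustrates the setting under stated assumptions and is not the paper's formal definition. It assumes that address a lies in bank a mod w. A step in which each of the w processors issues at most one request costs as many time units as the largest number of distinct addresses requested from a single bank. [w] is read as {0, ..., w-1}, a 0-based convention chosen here. The helper names `step_cost` and `is_partitioning_input` are hypothetical.

```python
from collections import Counter, defaultdict


def step_cost(addresses, w):
    """Time units for one parallel step in which processor p requests addresses[p].

    Assumed model (illustrative only): address a lives in bank a % w; distinct
    addresses in the same bank are served one after another; repeated requests
    for a single address are served together.
    """
    if len(addresses) > w:
        raise ValueError("each of the w processors issues at most one request per step")
    per_bank = defaultdict(set)
    for a in addresses:
        per_bank[a % w].add(a)
    return max((len(s) for s in per_bank.values()), default=0)


def is_partitioning_input(values, w):
    """True if values holds exactly n/w copies of every integer in {0, ..., w-1}."""
    n = len(values)
    if n == 0 or n % w != 0:
        return False
    counts = Counter(values)
    return set(counts) == set(range(w)) and all(c == n // w for c in counts.values())


if __name__ == "__main__":
    w = 4
    print(step_cost([0, 1, 2, 3], w))    # 1: four different banks, no conflict
    print(step_cost([0, 4, 8, 12], w))   # 4: all in bank 0, fully serialized
    print(is_partitioning_input([3, 0, 2, 1, 1, 2, 0, 3], w))  # True
```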

We solve sorting in optimal $O(\frac{n}{w} \log n)$ time. When $n \ge w^2$, we solve the partitioning problem optimally in $O(n/w)$ time. We also present a general solution for the partitioning problem which takes $O(\frac{n}{w} \log^3_{n/w} w)$ time. Finally, we solve the permutation problem using a randomized algorithm in $O(\frac{n}{w} \log\log\log_{n/w} n)$ time. Our results show evidence that when working with banked memory architectures, there is a separation between these problems and the permutation and partitioning problems are not as easy as simple parallel scanning.
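The contrast with simple parallel scanning can be seen in a toy model. The sketch assumes that address a lives in bank a mod w and that distinct addresses in one bank are serialized. This is a simplification and not necessarily the paper's exact cost model. Scanning a w-by-w row-major array row by row costs one time unit per round. Reading the same array column by column, as a transposition does, sends all w requests of a round to a single bank. The stride-(w+1) case is the well-known padding trick used for GPU matrix transposition. It repairs this one fixed permutation, but it is not a method for arbitrary permutations, which are the harder case the abstract refers to. All function names are hypothetical.

```python
from collections import defaultdict


def step_cost(addresses, w):
    """Cost of one parallel step under an assumed bank model: address a is in
    bank a % w, and distinct addresses sharing a bank are served one at a time."""
    per_bank = defaultdict(set)
    for a in addresses:
        per_bank[a % w].add(a)
    return max((len(s) for s in per_bank.values()), default=0)


def scan_rounds(w):
    """Row-by-row pass over a w-by-w row-major array: in round r, processor p
    reads element (r, p), so the w addresses of a round hit w distinct banks."""
    return [[r * w + p for p in range(w)] for r in range(w)]


def column_rounds(w, stride):
    """Column-by-column pass, as in a transposition: in round c, processor p
    reads element (p, c), where consecutive rows start `stride` words apart."""
    return [[p * stride + c for p in range(w)] for c in range(w)]


def total_cost(rounds, w):
    """Sum of per-step costs over all rounds."""
    return sum(step_cost(r, w) for r in rounds)


if __name__ == "__main__":
    w = 32
    print(total_cost(scan_rounds(w), w))             # 32: one unit per round
    print(total_cost(column_rounds(w, w), w))        # 1024: every round hits one bank
    print(total_cost(column_rounds(w, w + 1), w))    # 32: padding spreads rows over banks
```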



Author information

Authors and Affiliations

MADALGO, Aarhus University, Aarhus, Denmark
Peyman Afshani
University of Hawaii, Manoa, HI, USA
Nodari Sitchinava


Editor information

Editors and Affiliations

University of Technology, Eindhoven, The Netherlands
Nikhil Bansal
Sapienza University of Rome, Rome, Italy
Irene Finocchi


About this paper

Cite this paper

Afshani, P., Sitchinava, N. (2015). Sorting and Permuting without Bank Conflicts on GPUs. In: Bansal, N., Finocchi, I. (eds) Algorithms - ESA 2015. Lecture Notes in Computer Science, vol 9294. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48350-3_2


DOI: https://doi.org/10.1007/978-3-662-48350-3_2
Published: 12 November 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48349-7
Online ISBN: 978-3-662-48350-3
eBook Packages: Computer Science, Computer Science (R0)
