Abstract
In this paper, we look at the complexity of designing algorithms without any bank conflicts in the shared memory of Graphical Processing Units (GPUs). Given input of size n, w processors and w memory banks, we study three fundamental problems: sorting, permuting and w-way partitioning (defined as sorting an input containing exactly n/w copies of every integer in [w]).
We solve sorting in optimal \(O(\frac{n}{w} \log n)\) time. When n ≥ w 2, we solve the partitioning problem optimally in O(n/w) time. We also present a general solution for the partitioning problem which takes \(O(\frac{n}{w} \log^3_{n/w} w)\) time. Finally, we solve the permutation problem using a randomized algorithm in \(O(\frac{n}{w} \log\log\log_{n/w} n)\) time. Our results show evidence that when working with banked memory architectures, there is a separation between these problems and the permutation and partitioning problems are not as easy as simple parallel scanning.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Afshani, P., Sitchinava, N.: Sorting and permuting without bank conflicts on GPUs. CoRR abs/1507.01391 (2015), http://arxiv.org/abs/1507.01391
Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31, 1116–1127 (1988)
Arge, L., Goodrich, M.T., Nelson, M.J., Sitchinava, N.: Fundamental parallel algorithms for private-cache chip multiprocessors. In: 20th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 197–206 (2008)
Batcher, K.E.: Sorting networks and their applications. In: AFIPS Spring Joint Computer Conference, pp. 307–314
Blelloch, G.E., Chowdhury, R.A., Gibbons, P.B., Ramachandran, V., Chen, S., Kozuch, M.: Provably good multicore cache performance for divide-and-conquer algorithms. In: 19th ACM-SIAM Symp. on Discrete Algorithms, pp. 501–510 (2008)
Catanzaro, B., Keller, A., Garland, M.: A decomposition for in-place matrix transposition. In: 19th ACM SIGPLAN Principles and Practices of Parallel Programming (PPoPP), pp. 193–206 (2014)
Cole, R.: Parallel merge sort. In: 27th IEEE Symposium on Foundations of Computer Science. pp. 511–516 (1986)
Dotsenko, Y., Govindaraju, N.K., Sloan, P.P., Boyd, C., Manfedelli, J.: Fast Scan Algorithms on Graphics Processors. In: 22nd International Conference on Supercomputing, pp. 205–213 (2008)
Flynn, M.: Some computer organizations and their effectiveness. IEEE Transactions on Computers C 21(9), 948–960 (1972)
Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In: 40th IEEE Symp. on Foundations of Comp. Sci., pp. 285–298 (1999)
GPGPU.org: Research papers on gpgpu.org, http://gpgpu.org/tag/papers
Greiner, G.: Sparse Matrix Computations and their I/O Complexity. Dissertation, Technische Universität München, München (2012)
Haque, S., Maza, M., Xie, N.: A many-core machine model for designing algorithms with minimum parallelism overheads. In: High Performance Computing Symposium (2013)
JáJá, J.: An Introduction to Parallel Algorithms. Addison Wesley (1992)
Knuth, D.E.: The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley (1973)
Leighton, F.T.: Introduction to Parallel Algorithms and Architectures: Arrays, Trees, and Hypercubes. Morgan-Kaufmann, San Mateo (1991)
Ma, L., Agrawal, K., Chamberlain, R.D.: A memory access model for highly-threaded many-core architectures. Future Generation Computer Systems 30, 202–215 (2014)
Nakano, K.: Simple memory machine models for gpus. In: 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), pp. 794–803 (2012)
NVIDIA Corp.: CUDA C Best Practices Guide. Version 7.0 (March 2015)
Pagh, A., Pagh, R.: Uniform hashing in constant time and optimal space. SIAM Journal on Computing 38(1), 85–96 (2008)
Sen, S., Scherson, I.D., Shamir, A.: Shear Sort: A True Two-Dimensional Sorting Techniques for VLSI Networks. In: International Conference on Parallel Processing, pp. 903–908 (1986)
Sitchinava, N., Weichert, V.: Provably efficient GPU algorithms. CoRR abs/1306.5076 (2013), http://arxiv.org/abs/1306.5076
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Afshani, P., Sitchinava, N. (2015). Sorting and Permuting without Bank Conflicts on GPUs. In: Bansal, N., Finocchi, I. (eds) Algorithms - ESA 2015. Lecture Notes in Computer Science(), vol 9294. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48350-3_2
Download citation
DOI: https://doi.org/10.1007/978-3-662-48350-3_2
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48349-7
Online ISBN: 978-3-662-48350-3
eBook Packages: Computer ScienceComputer Science (R0)