
Sorting and Permuting without Bank Conflicts on GPUs

  • Conference paper
Algorithms - ESA 2015

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9294)

Abstract

In this paper, we study the complexity of designing algorithms that incur no bank conflicts in the shared memory of Graphics Processing Units (GPUs). Given an input of size n, w processors, and w memory banks, we study three fundamental problems: sorting, permuting, and w-way partitioning (defined as sorting an input containing exactly n/w copies of every integer in [w]).
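To make the setting concrete, the following sketch simulates the cost of one parallel step in which w processors access w memory banks. It assumes word-interleaved banks (word address a lives in bank a mod w) and assumes that distinct addresses requested from the same bank are serialized, so a step costs the largest number of distinct addresses falling into any one bank. These details, and the helper names `step_cost` and `partitioning_instance`, are illustrative assumptions rather than the paper's formal model. The generator builds a w-way partitioning input, using 0-based values {0, ..., w-1} in place of [w].

```python
import random
from collections import defaultdict


def step_cost(addresses, w):
    """Cost of one parallel step (assumed model): each processor requests one
    word address, bank = address mod w, and distinct addresses in the same
    bank are serialized. Identical addresses are counted once (assumption)."""
    per_bank = defaultdict(set)
    for a in addresses:
        per_bank[a % w].add(a)
    return max((len(s) for s in per_bank.values()), default=0)


def partitioning_instance(n, w, seed=0):
    """Hypothetical helper: a shuffled input with exactly n/w copies of every
    value in {0, ..., w-1} (a 0-based stand-in for [w])."""
    if n % w != 0:
        raise ValueError("n must be a multiple of w")
    data = [v for v in range(w) for _ in range(n // w)]
    random.Random(seed).shuffle(data)
    return data


w = 8
print(step_cost(range(8), w))          # 8 consecutive words, 8 banks: cost 1
print(step_cost([0, 8, 16, 24], w))    # all in bank 0: 4-way conflict, cost 4
print(sorted(partitioning_instance(16, 4)))
```

Under this model, a step is conflict-free exactly when the w requested addresses lie in w different banks; w-way partitioning then asks to sort such an input while keeping every step close to that ideal.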

We solve sorting in optimal \(O(\frac{n}{w} \log n)\) time. When \(n \geq w^2\), we solve the partitioning problem optimally in \(O(n/w)\) time. We also present a general solution for the partitioning problem that takes \(O(\frac{n}{w} \log^3_{n/w} w)\) time. Finally, we solve the permutation problem using a randomized algorithm in \(O(\frac{n}{w} \log\log\log_{n/w} n)\) time. Our results show evidence that, on banked memory architectures, there is a separation between these problems, and that permuting and partitioning are not as easy as simple parallel scanning.
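As an illustration of why permuting can cost more than scanning under bank constraints, the following sketch (using the same assumed bank model: bank = address mod w, distinct same-bank addresses serialized) compares a scan, where each step touches w consecutive words, with the column-by-column reads of a naive matrix transpose, which is one fixed permutation. Without padding, every step sends all w requests to a single bank; padding each row by one word, the standard trick used in shared-memory transpose kernels, spreads them across all banks. This is not the paper's algorithm: padding only repairs this particular access pattern, whereas the bounds above concern arbitrary permutations and partitionings. The function names are hypothetical.

```python
from collections import defaultdict


def step_cost(addresses, w):
    # Assumed model: bank = address mod w; distinct addresses in one bank
    # are serialized, so a step costs the maximum per-bank load.
    per_bank = defaultdict(set)
    for a in addresses:
        per_bank[a % w].add(a)
    return max((len(s) for s in per_bank.values()), default=0)


def scan_cost(n, w):
    # Step t: processor i reads address t*w + i, i.e., one word per bank.
    return sum(step_cost([t * w + i for i in range(w)], w) for t in range(n // w))


def column_read_cost(n, w, row_stride):
    # View the n words as an (n/w) x w matrix whose rows start every
    # row_stride words, and read it column by column, w rows per step
    # (the access pattern of a naive transpose).
    rows = n // w
    total = 0
    for c in range(w):
        for start in range(0, rows, w):
            count = min(w, rows - start)
            addrs = [(start + i) * row_stride + c for i in range(count)]
            total += step_cost(addrs, w)
    return total


w = 32
n = w * w
print(scan_cost(n, w))                # 32  (n/w, conflict-free)
print(column_read_cost(n, w, w))      # 1024 (n: every step is a w-way conflict)
print(column_read_cost(n, w, w + 1))  # 32  (padding restores n/w)
```

With w = 32 and n = w^2, the scan and the padded column reads each cost n/w = 32 steps, while the unpadded column reads cost n = 1024, a factor of w more.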




Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Afshani, P., Sitchinava, N. (2015). Sorting and Permuting without Bank Conflicts on GPUs. In: Bansal, N., Finocchi, I. (eds) Algorithms - ESA 2015. Lecture Notes in Computer Science, vol 9294. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48350-3_2


  • DOI: https://doi.org/10.1007/978-3-662-48350-3_2


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48349-7

  • Online ISBN: 978-3-662-48350-3

  • eBook Packages: Computer Science, Computer Science (R0)
