GPU-Quicksort: A practical Quicksort algorithm for graphics processors

Published: 05 January 2010

Abstract

In this article, we describe GPU-Quicksort, an efficient Quicksort algorithm suitable for highly parallel multicore graphics processors. Quicksort has previously been considered an inefficient sorting solution for graphics processors, but we show that in CUDA, NVIDIA's programming platform for general-purpose computations on graphics processors, GPU-Quicksort performs better than the fastest known sorting implementations for graphics processors, such as radix and bitonic sort. Quicksort can thus be seen as a viable alternative for sorting large quantities of data on graphics processors.

      Published In

      ACM Journal of Experimental Algorithmics  Volume 14, Issue
      2009
      613 pages
      ISSN:1084-6654
      EISSN:1084-6654
      DOI:10.1145/1498698
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 05 January 2010
      Accepted: 01 May 2009
      Revised: 01 March 2009
      Received: 01 December 2008
      Published in JEA Volume 14

      Author Tags

      1. CUDA
      2. GPGPU
      3. Sorting
      4. multicore
      5. quicksort

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Cited By

      • (2023) Accelerating Sorting on GPUs: A Scalable CUDA Quicksort Revision. 2023 22nd International Symposium INFOTEH-JAHORINA (INFOTEH), 1-5. DOI: 10.1109/INFOTEH57020.2023.10094180. Online publication date: 15-Mar-2023
      • (2023) Enhancing Performance of CUDA Quicksort Through Pivot Selection and Branching Avoidance Methods. 2023 XXIX International Conference on Information, Communication and Automation Technologies (ICAT), 1-5. DOI: 10.1109/ICAT57854.2023.10171304. Online publication date: 11-Jun-2023
      • (2023) New GPU Sorting Algorithm Using Sorted Matrix. Procedia Computer Science 218:C, 1682-1691. DOI: 10.1016/j.procs.2023.01.146. Online publication date: 1-Jan-2023
      • (2022) Sorting in Memristive Memory. ACM Journal on Emerging Technologies in Computing Systems 18:4, 1-21. DOI: 10.1145/3517181. Online publication date: 13-Oct-2022
      • (2022) Evaluating Multi-GPU Sorting with Modern Interconnects. Proceedings of the 2022 International Conference on Management of Data, 1795-1809. DOI: 10.1145/3514221.3517842. Online publication date: 10-Jun-2022
      • (2022) DARM. Proceedings of the 20th IEEE/ACM International Symposium on Code Generation and Optimization, 28-40. DOI: 10.1109/CGO53902.2022.9741285. Online publication date: 2-Apr-2022
      • (2022) ReCSA: a dedicated sort accelerator using ReRAM-based content addressable memory. Frontiers of Computer Science: Selected Publications from Chinese Universities 17:2. DOI: 10.1007/s11704-022-1322-3. Online publication date: 8-Aug-2022
      • (2021) Preemptive Parallel Job Scheduling for Heterogeneous Systems Supporting Urgent Computing. IEEE Access 9, 17557-17571. DOI: 10.1109/ACCESS.2021.3053162. Online publication date: 2021
      • (2019) Database Techniques for New Hardware. Advanced Methodologies and Technologies in Network Architecture, Mobile Computing, and Data Analytics, 546-562. DOI: 10.4018/978-1-5225-7598-6.ch040. Online publication date: 2019
      • (2018) Database Techniques for New Hardware. Encyclopedia of Information Science and Technology, Fourth Edition, 1947-1961. DOI: 10.4018/978-1-5225-2255-3.ch169. Online publication date: 2018
