Survey of GPU Based Sorting Algorithms

Singh, Dhirendra Pratap; Joshi, Ishan; Choudhary, Jaytrilok

doi:10.1007/s10766-017-0502-5

Published: 11 April 2017

Volume 46, pages 1017–1034, (2018)

International Journal of Parallel Programming

Abstract

Parallel sorting algorithms are widely studied. Since the introduction of parallel processors such as the graphics processing unit (GPU) and easy-to-use parallel programming languages such as CUDA and OpenCL, the literature on parallel sorting algorithms has grown vast and richer, with new ideas and techniques applied to the well-known problem of sorting. This paper presents a survey of GPU-based sorting algorithms. Four sorting algorithms are selected for the survey: radix sort, merge sort, sample sort and quicksort. The methods used in these algorithms are described in brief, the performance claimed by their authors is presented, and a comparative analysis based on the literature is given.

Author information

Authors and Affiliations

Maulana Azad National Institute of Technology, Bhopal, India
Dhirendra Pratap Singh, Ishan Joshi & Jaytrilok Choudhary

Corresponding author

Correspondence to Dhirendra Pratap Singh.

About this article

Cite this article

Singh, D.P., Joshi, I. & Choudhary, J. Survey of GPU Based Sorting Algorithms. Int J Parallel Prog 46, 1017–1034 (2018). https://doi.org/10.1007/s10766-017-0502-5

Received: 11 August 2016
Accepted: 06 April 2017
Published: 11 April 2017
Issue Date: December 2018
DOI: https://doi.org/10.1007/s10766-017-0502-5
