Abstract
Parallel sorting algorithms are widely studied. Since the introduction of parallel processors such as graphics processing units (GPUs) and easy-to-use parallel programming languages such as CUDA and OpenCL, the literature on parallel sorting has grown vast, enriched by new ideas and techniques applied to the well-known problem of sorting. This paper presents a survey of GPU-based sorting algorithms. Four sorting algorithms have been selected for this survey: radix sort, merge sort, sample sort and quicksort. The methods used in these algorithms are described briefly, and their performance, as reported by their authors, is presented. Finally, a comparative analysis based on the literature is given.

Cite this article
Singh, D.P., Joshi, I. & Choudhary, J. Survey of GPU Based Sorting Algorithms. Int J Parallel Prog 46, 1017–1034 (2018). https://doi.org/10.1007/s10766-017-0502-5
DOI: https://doi.org/10.1007/s10766-017-0502-5