skip to main content
10.1145/3635035.3635051acmotherconferencesArticle/Chapter ViewAbstractPublication PageshpcasiaConference Proceedingsconference-collections
research-article

Efficient GPU-Implementation of H-P Sort Based on Improved Histogram Computation

Published: 19 January 2024 Publication History

Abstract

We present an enhanced GPU implementation of the H-P sort algorithm, which is a widely used method for integer sorting based on histogram computation and prefix sum calculation. This work extends a previous high-performance GPU version of the algorithm, presented at the 2021 ICPP international conference. Through algorithmic refinements and optimizations, we demonstrate significant performance improvements compared to the conventional H-P sort. It achieves a maximum speedup of 3.45x over conventional H-P Sort. Furthermore, in scenarios where the H-P sort had previously lagged behind the fastest available library, our improved implementation now surpasses it in terms of execution speed. This advancement in GPU-based sorting techniques contributes to more efficient data processing in various computational applications, emphasizing the practical significance of this work.

References

[1]
Guy E. Blelloch. 1990. Prefix Sums and Their Applications. Technical Report CMU-CS-90-190. School of Computer Science, Carnegie Mellon University.
[2]
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. 2010. Introduction to Algorithms Third Edition. PHI Learning.
[3]
S. C. Eisenstat. 2007. O(log *n) algorithms on a Sum-CRCW PRAM. Computing 79 (2007), 93–97.
[4]
N. Faujdar and S. Ghrera. 2016. Performance evaluation of parallel count sort using GPU computing with CUDA. Indian Journal of Science and Technoogy 9, 15 (2016), 1–12.
[5]
J. Gómez-Luna, J.M. González-Linares, J.I. Benavides, and et al.2013. An optimized approach to histogram computation on GPU. Machine Vision and Applications 24 (2013), 899–908.
[6]
T. Henriksen, S. Hellfritzsch, P. Sadayappan, and C. Oancea. 2020. Compiling Generalized Histograms for GPU. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 1–14.
[7]
D. E. Knuth. 1998. The art of computer programming, vol.3: searching and sorting (second ed.). Addison-Wesley Professional.
[8]
V. Kolonias, A. G. Voyiatzis, G. Goulas, and E. Housos. 2011. Design and implementation of an efficient integer count sort in CUDA GPUs. Concurrency and Computation: Practice and Experience 23 (2011), 2365–2381.
[9]
S. Koppaka, D. Mudigere, S. Narasimhan, and B. Narayanan. 2010. Fast Histograms using Adaptive CUDA Streams. arXiv:1011.0235 [cs.DC] (2010).
[10]
S. Kozakai, N. Fujimoto, and K. Wada. 2021. Efficient GPU-implementation for integer sorting based on histogram and prefix-sums. In Proceedings of the 50th International Conference on Parallel Processing (ICPP). 1–11.
[11]
V. Podlozhnyuk. 2007. Histogram Calculation in CUDA. (2007).
[12]
R. Shams and R. A. Kennedy. 2007. Efficient histogram algorithms for NVIDIA CUDA compatible devices. In Proc. Int. Conf. on Signal Processing and Communications Systems (ICSPCS). 418–422.
[13]
W. Sum and Z. Ma. 2009. Count sort for GPU computing. In Proceedings of 2009 15th International Conferece on Parallel and Distributed Systems (ICPDS). 919–924.
[14]
J. Svenningsson, B. J. Svensson, and M. Sheeran. 2013. Counting and occurrence sort for GPUs using an embedded language. In Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing, FHPC’13. 37–46.
[15]
M. Ugljesa, G. Isaac, P. Nikola, R. Alex, and T. Milo. 2013. Parallelizing general histogram application for CUDA architectures. In 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS). 11–18.
[16]
K. K. Yong and S. S. O. Talib. 2016. Histogram optimization with CUDA. In 2016 IEEE Industrial Electronics and Applications Conference (IEACon). 312–318.

Cited By

View all
  • (2024)PARALLEL DEQUE-FREE PARTITION DUAL-DEQUE MERGE SORTING ALGORITHMJournal of Southwest Jiaotong University10.35741/issn.0258-2724.59.2.3259:2Online publication date: 2024

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences
HPCAsia '24: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
January 2024
185 pages
ISBN:9798400708893
DOI:10.1145/3635035
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 January 2024

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. GPU
  2. H-P sort
  3. Histogram
  4. Integer sorting
  5. Prefix sum
  6. parallel sorting

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HPCAsia 2024

Acceptance Rates

Overall Acceptance Rate 69 of 143 submissions, 48%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)43
  • Downloads (Last 6 weeks)6
Reflects downloads up to 15 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)PARALLEL DEQUE-FREE PARTITION DUAL-DEQUE MERGE SORTING ALGORITHMJournal of Southwest Jiaotong University10.35741/issn.0258-2724.59.2.3259:2Online publication date: 2024

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media