An optimized approach to histogram computation on GPU

  • Original Paper
Machine Vision and Applications

Abstract

A histogram is a compact representation of the distribution of data in an image, with a wide range of applications in diverse fields. Histogram generation is an inherently sequential operation in which every pixel votes in a small set of bins. This makes efficient parallel implementations very desirable but challenging: on graphics processing units (GPUs), thousands of threads may be atomically updating a small number of histogram bins. Under these circumstances, collisions among threads are very frequent, and such collisions serialize thread execution, seriously degrading performance. In this paper we propose a highly optimized approach to histogram calculation that tackles these performance bottlenecks. It uses histogram replication to eliminate position conflicts, padding to reduce bank conflicts, and an improved access pattern for input data called interleaved read access. Our so-called \({\mathcal{R}}\)-per-block approach to histogram calculation has been compared favorably with the main state-of-the-art works using four histogram-based image processing kernels and two real image databases. Results show that our proposal is between 1.4 and 15.7 times faster than every previous implementation for histograms of up to 4,096 bins.
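
The three ideas named in the abstract can be illustrated with a small sequential emulation. This is only a sketch under stated assumptions, not the authors' kernel: the function name `replicated_histogram`, the replica assignment (thread index modulo the number of replicas), and the padding scheme (one unused slot after each replica) are hypothetical choices made for illustration, and a plain Python list stands in for GPU shared memory.

```python
def replicated_histogram(pixels, num_bins, num_threads=8, replicas=4):
    """Sequentially emulate one thread block voting into replicated sub-histograms.

    All parameter names and the memory layout are illustrative assumptions.
    """
    # Padding (assumed scheme): one extra slot per replica, so that the same bin
    # in consecutive replicas does not sit at a multiple of num_bins apart.
    stride = num_bins + 1
    shared = [0] * (replicas * stride)  # stand-in for on-chip shared memory

    for tid in range(num_threads):
        # Replication: threads are spread over several copies of the histogram,
        # so fewer threads compete for any single counter.
        base = (tid % replicas) * stride
        # Interleaved read access: thread tid visits elements
        # tid, tid + num_threads, tid + 2 * num_threads, ...
        for i in range(tid, len(pixels), num_threads):
            shared[base + pixels[i]] += 1  # would be an atomic add on a GPU

    # Final reduction: sum each bin across all replicas, skipping the padding slot.
    return [sum(shared[r * stride + b] for r in range(replicas))
            for b in range(num_bins)]
```

Calling `replicated_histogram([0, 1, 1, 3, 3, 3, 2, 0, 1, 3], num_bins=4)` gives the same counts as a single shared histogram. On a real GPU the replicas only change how often threads collide on the same counter, not the result.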

Author information

Corresponding author

Correspondence to Juan Gómez-Luna.

About this article

Cite this article

Gómez-Luna, J., González-Linares, J.M., Benavides, J.I. et al. An optimized approach to histogram computation on GPU. Machine Vision and Applications 24, 899–908 (2013). https://doi.org/10.1007/s00138-012-0443-3

  • DOI: https://doi.org/10.1007/s00138-012-0443-3
