Fast filter bank convolution for three-dimensional wavelet transform by shared memory on mobile GPU computing

Zhao, Di

doi:10.1007/s11227-015-1443-7

Published: 29 May 2015

Volume 71, pages 3440–3455, (2015)

The Journal of Supercomputing

Abstract

Mobile GPU applications are usually constrained by real-time requirements, yet the FLOPS of a mobile GPU are limited by the size and power supply of the SoC. Like desktop GPUs, a mobile GPU has an on-chip memory hierarchy, and proper use of that hierarchy can accelerate mobile GPU applications such as the Discrete Wavelet Transform (DWT) enough to satisfy the real-time requirement. In this paper, by taking advantage of GPU shared memory on the Tegra K1, a mobile GPU from Nvidia, we develop Bank Conflict Free Shared Memory Parallel DWT for mobile GPU applications. Computational results show that, at a display resolution of \(640 \times 350\) (EGA), Bank Conflict Free Shared Memory Parallel DWT is significantly faster than SoC CPU-based DWT. They also show that, at display resolutions of \(320\times 200\) (CGA), \(640\times 480\) (VGA), \(800\times 600\) (SVGA) and \(1024\times 768\) (XGA), Bank Conflict Free Shared Memory Parallel DWT can generally satisfy the real-time requirement.
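
The abstract does not spell out how bank conflicts are avoided, so the following is only a generic illustration of the idea, not the paper's method. It assumes shared memory organized as 32 banks of 4-byte words (a common configuration on Kepler-class GPUs such as the one in Tegra K1) and uses hypothetical names (`NUM_BANKS`, `max_bank_conflict`). It counts how many threads of a warp hit the same bank when each thread reads the first word of a different tile row, as a column-wise filter pass might, with and without one word of row padding.

```python
from collections import Counter

# Assumption: 32 banks, each one 4-byte word wide; consecutive words map to consecutive banks.
NUM_BANKS = 32


def max_bank_conflict(row_stride_words, num_threads=32):
    """Return the largest number of threads in one warp that access the same bank
    when thread t reads word index t * row_stride_words (one word per tile row)."""
    banks = Counter((t * row_stride_words) % NUM_BANKS for t in range(num_threads))
    return max(banks.values())


# A 32-word row stride sends every thread to bank 0: the accesses are serialized.
unpadded = max_bank_conflict(32)

# Padding each row by one word (stride 33) spreads the threads over all 32 banks.
padded = max_bank_conflict(33)

print(f"stride 32: {unpadded}-way conflict, stride 33: {padded}-way conflict")
```

Under these assumptions, a conflict degree of 1 means the warp's accesses can be served in a single pass, which is what "bank conflict free" refers to.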

Acknowledgments

We thank Nvidia for providing the Jetson TK1 development board through the Tegra K1 CUDA Vision Challenge 2014–2015.

Author information

Authors and Affiliations

Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100094, China
Di Zhao

Corresponding author

Correspondence to Di Zhao.

About this article

Cite this article

Zhao, D. Fast filter bank convolution for three-dimensional wavelet transform by shared memory on mobile GPU computing. J Supercomput 71, 3440–3455 (2015). https://doi.org/10.1007/s11227-015-1443-7

Published: 29 May 2015
Issue Date: September 2015
DOI: https://doi.org/10.1007/s11227-015-1443-7
