DOI: 10.1145/1513895.1513900
research-article

Performance analysis of accelerated image registration using GPGPU

Published: 08 March 2009

Abstract

This paper presents a performance analysis of an accelerated 2-D rigid image registration implementation that uses the Compute Unified Device Architecture (CUDA) programming environment to exploit the parallel processing capabilities of NVIDIA's Tesla C870 GPU. We explain the underlying structure of the GPU implementation and compare its performance and accuracy against a fast CPU-based implementation. Our experimental results demonstrate that our GPU version achieves up to 90x speedup with bilinear interpolation and 30x speedup with bicubic interpolation while maintaining a high level of accuracy. This compares favorably with recent image registration studies, but it also indicates that our implementation reaches only about 70% of theoretical peak performance. To analyze our results, we use profiling data to identify some of the underlying limitations of CUDA that prevent peak performance. Finally, we emphasize the need to manage memory resources carefully to fully utilize the GPU and obtain maximum speedup.
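To make the terms in the abstract concrete, the following is a minimal CPU-side sketch in Python of the per-pixel work in 2-D rigid registration with bilinear interpolation. It is not the authors' CUDA implementation. The function names (`bilinear_sample`, `rigid_warp`, `ssd`), the zero fill outside the image, and the choice of sum of squared differences as the similarity measure are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def bilinear_sample(img, x, y):
    """Sample img (a list of rows) at real-valued (x, y).

    Assumption: coordinates outside the image return 0.0.
    """
    h, w = len(img), len(img[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return 0.0
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two neighboring rows, then vertically.
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom


def rigid_warp(img, theta, tx, ty):
    """Rotate img by theta (radians) about its center, then translate by (tx, ty).

    Uses inverse mapping: each output pixel looks up its source coordinate.
    """
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for v in range(h):
        row = []
        for u in range(w):
            # Undo the translation, then apply the inverse rotation.
            dx, dy = u - cx - tx, v - cy - ty
            x = c * dx + s * dy + cx
            y = -s * dx + c * dy + cy
            row.append(bilinear_sample(img, x, y))
        out.append(row)
    return out


def ssd(a, b):
    """Sum of squared differences between two equally sized images."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


if __name__ == "__main__":
    reference = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    moved = rigid_warp(reference, 0.0, 1.0, 0.0)
    # A registration loop would search (theta, tx, ty) to minimize this value.
    print(ssd(reference, moved))
```

Each output pixel in `rigid_warp` is computed independently of the others, which is why this kind of resampling is a natural candidate for one-thread-per-pixel GPU execution. A registration procedure would call the warp and the similarity measure repeatedly while searching over the three rigid parameters.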

      Published In

      GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
      March 2009
      107 pages
      ISBN:9781605585178
      DOI:10.1145/1513895
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 08 March 2009

      Author Tags

      1. CUDA
      2. GPGPU
      3. image registration
      4. performance analysis

      Qualifiers

      • Research-article

      Conference

      GPGPU '09

      Acceptance Rates

      Overall Acceptance Rate 57 of 129 submissions, 44%

      Cited By

      • (2016) Accelerated catadioptric omnidirectional view image unwrapping processing using GPU parallelisation. Journal of Real-Time Image Processing, 12:1 (55-69). DOI: 10.1007/s11554-013-0390-x. Online publication date: 1-Jun-2016
      • (2015) Design and Verification of Heterogeneous Streaming Parallel Mechanisms on Kepler CUDA. 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (2256-2262). DOI: 10.1109/CIT/IUCC/DASC/PICOM.2015.333. Online publication date: Oct-2015
      • (2015) Performance analysis of a novel GPU computation-to-core mapping scheme for robust facet image modeling. Journal of Real-Time Image Processing, 10:3 (485-500). DOI: 10.1007/s11554-012-0272-7. Online publication date: 1-Sep-2015
      • (2012) Integrated processing of contrast pulse sequencing ultrasound imaging for enhanced active contrast of hollow gas filled silica nanoshells and microshells. Journal of Vacuum Science & Technology B, Nanotechnology and Microelectronics: Materials, Processing, Measurement, and Phenomena, 30:2. DOI: 10.1116/1.3694835. Online publication date: 23-Mar-2012
      • (2012) Design space exploration towards a realtime and energy-aware GPGPU-based analysis of biosensor data. Computer Science - Research and Development, 27:4 (309-317). DOI: 10.1007/s00450-011-0187-8. Online publication date: 1-Nov-2012
      • (2011) True 4D image denoising on the GPU. Journal of Biomedical Imaging, 2011 (8-8). DOI: 10.1155/2011/952819. Online publication date: 1-Jan-2011
      • (2011) Gauss-Newton image registration with CUDA. 2011 18th IEEE International Conference on Electronics, Circuits, and Systems (305-309). DOI: 10.1109/ICECS.2011.6122274. Online publication date: Dec-2011
      • (2010) Remote sensing image registration techniques. Proceedings of the 4th international conference on Image and signal processing (103-112). DOI: 10.5555/1875769.1875784. Online publication date: 30-Jun-2010
      • (2010) Phase based volume registration using cuda. 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (658-661). DOI: 10.1109/ICASSP.2010.5495134. Online publication date: Mar-2010
      • (2010) Local acceleration in Distributed Geographic Information Processing with CUDA. 2010 18th International Conference on Geoinformatics (1-6). DOI: 10.1109/GEOINFORMATICS.2010.5567746. Online publication date: Jun-2010
