Optimizing a medical image registration algorithm based on profiling data for real-time performance

Multimedia Tools and Applications

Abstract

Image registration is a common task in medical image analysis. Consequently, a significant number of algorithms have been developed to perform rigid and non-rigid image registration. In particular, the free-form deformation algorithm is frequently used to carry out non-rigid registration tasks; however, it is computationally very intensive. In this work, we describe an approach based on profiling data to identify the parts of this algorithm for which parallel implementations can be developed. The proposed approach assesses the efficiency of the algorithm by applying performance analysis techniques commonly available in traditional computer operating systems. Hence, this article provides guidelines to support researchers working on medical image processing and analysis in achieving real-time non-rigid image registration applications using common computing systems. According to our experimental findings, significant speedups can be accomplished by parallelizing sequential snippets, i.e., code regions that are executed more than once. For the costly functions identified in the studied free-form deformation algorithm, the developed parallelization decreased the runtime by up to seven times relative to the corresponding single-threaded implementation. The implementations were developed using the Open Multi-Processing (OpenMP) application programming interface. In conclusion, this study confirms that, based on the call graph visualization and the detected performance bottlenecks, one can easily find and evaluate snippets that are potential optimization targets, in addition to assessing throughput in memory accesses.
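As a rough illustration of the kind of change the abstract describes, the sketch below parallelizes one hypothetical hot spot with an OpenMP directive. It is not the authors' code: the function names `ssd_serial` and `ssd_parallel` are invented for this example, and the sum of squared differences is assumed here only as a stand-in for a costly, repeatedly executed loop that a profiler might flag in a registration algorithm.

```c
#include <stddef.h>

/* Hypothetical hot spot (an assumption for illustration): sum of squared
   differences between a fixed and a moving image, each stored as a flat
   array of n voxel intensities. */
double ssd_serial(const float *fixed, const float *moving, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = (double)fixed[i] - (double)moving[i];
        sum += d * d;
    }
    return sum;
}

/* Same computation with the loop iterations shared among threads.
   Each thread accumulates a private partial sum, and the reduction
   clause combines the partial sums when the loop ends. */
double ssd_parallel(const float *fixed, const float *moving, size_t n)
{
    double sum = 0.0;
    /* A signed loop index keeps the loop valid for older OpenMP versions. */
    long long count = (long long)n;

    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (long long i = 0; i < count; ++i) {
        double d = (double)fixed[i] - (double)moving[i];
        sum += d * d;
    }
    return sum;
}
```

With GCC, the directive takes effect when the file is compiled with `-fopenmp`; without that flag the pragma is ignored and `ssd_parallel` simply runs sequentially. Because the loop iterations are independent apart from the accumulation, a reduction is sufficient here. Loops with dependencies between iterations would need a different treatment.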

Notes

  1. gprof2dot is an open source script written in Python that converts the output of a range of profilers into a dot graph. This script can be freely downloaded at https://github.com/jrfonseca/gprof2dot.

  2. dot is a Graphviz tool for producing hierarchical drawings of directed graphs. Graphviz is open source visualization software for representing structural information such as diagrams of abstract graphs. More information is available at http://graphviz.org.

  3. An executable version of the FFD algorithm, used for comparison purposes in the quantitative analysis based on the DSC value, can be downloaded from Daniel Rueckert’s webpage: http://www.doc.ic.ac.uk/~dr

Acknowledgements

The first author gratefully acknowledges the following institutions for the support received: Universidade do Estado de Mato Grosso (UNEMAT), in Brazil, and National Council for Scientific and Technological Development (Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq), process grant 234306/2014-9 under reference #2010/15691-0.

Author information

Corresponding author

Correspondence to João Manuel R. S. Tavares.

About this article

Cite this article

Gulo, C.A.S.J., Sementille, A.C. & Tavares, J.M.R.S. Optimizing a medical image registration algorithm based on profiling data for real-time performance. Multimed Tools Appl 81, 2603–2620 (2022). https://doi.org/10.1007/s11042-021-11699-x

  • DOI: https://doi.org/10.1007/s11042-021-11699-x
