Abstract
Image registration is a common task in medical image analysis, and a significant number of algorithms have therefore been developed to perform rigid and non-rigid image registration. In particular, the free-form deformation (FFD) algorithm is frequently used for non-rigid registration; however, it is computationally very intensive. In this work, we describe an approach based on profiling data to identify the parts of this algorithm that are candidates for parallel implementation. The proposed approach assesses the efficiency of the algorithm by applying performance analysis techniques commonly available in traditional computer operating systems. Accordingly, this article provides guidelines to help researchers working on medical image processing and analysis achieve real-time non-rigid image registration using common computing systems. According to our experimental findings, significant speedups can be achieved by parallelizing sequential snippets, i.e., code regions that are executed more than once. For the costly functions identified in the studied free-form deformation algorithm, the developed parallelization reduced the runtime by up to a factor of seven relative to the corresponding single-threaded implementation. The implementations were developed using the Open Multi-Processing (OpenMP) application programming interface. In conclusion, this study confirms that call graph visualization and the detected performance bottlenecks make it easy to find and evaluate code snippets that are potential optimization targets, as well as to assess memory-access throughput.
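The following minimal C sketch illustrates the kind of loop-level OpenMP parallelization described above. It is not the FFD implementation studied in the article. It assumes a hypothetical hotspot: a sum of squared differences (SSD) between two images stored as flat `float` arrays. The function name `ssd` and the choice of SSD as the similarity measure are illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical profiling hotspot (illustrative only): sum of squared
   differences between a fixed and a moving image of n voxels each. */
double ssd(const float *fixed, const float *moving, size_t n)
{
    double sum = 0.0;
    /* Iterations are split across threads; each thread keeps a private
       partial sum, and reduction(+:sum) adds them together at the end.
       Compiled without OpenMP support, the pragma is ignored and the
       loop runs serially. */
    #pragma omp parallel for reduction(+:sum)
    for (ptrdiff_t i = 0; i < (ptrdiff_t)n; ++i) {
        double d = (double)fixed[i] - (double)moving[i];
        sum += d * d;
    }
    return sum;
}
```

When compiled with `-fopenmp` (GCC or Clang), this loop runs in parallel. Because floating-point addition is not associative, the parallel sum may differ from the single-threaded one in the last bits. Such differences are worth checking when a speedup is compared against a single-threaded baseline.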
Notes
gprof2dot is an open-source Python script that converts the output of a range of profilers into a dot graph. It can be freely downloaded at https://github.com/jrfonseca/gprof2dot.
dot is the Graphviz tool for producing hierarchical drawings of directed graphs. Graphviz is open-source visualization software for representing structural information as diagrams of abstract graphs. More information is available at http://graphviz.org.
An executable version of the FFD algorithm, used for comparison through a quantitative analysis based on the Dice similarity coefficient (DSC), can be downloaded from Daniel Rueckert's webpage: http://www.doc.ic.ac.uk/~dr
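For reference, the DSC between a segmentation A and a reference B is 2|A ∩ B| / (|A| + |B|). It ranges from 0 (no overlap) to 1 (perfect overlap). The minimal C sketch below computes it over binary masks stored as flat byte arrays. The function name `dice` and the convention of returning 1.0 when both masks are empty are assumptions made here. They are not taken from the article.

```c
#include <stddef.h>

/* Dice similarity coefficient of two binary masks a and b of n voxels:
   DSC = 2|A ∩ B| / (|A| + |B|). Any nonzero byte counts as foreground.
   Returns 1.0 when both masks are empty (an assumed convention). */
double dice(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t inter = 0, size_a = 0, size_b = 0;
    for (size_t i = 0; i < n; ++i) {
        int in_a = a[i] != 0;
        int in_b = b[i] != 0;
        inter  += (size_t)(in_a && in_b);  /* voxel in both masks */
        size_a += (size_t)in_a;
        size_b += (size_t)in_b;
    }
    if (size_a + size_b == 0)
        return 1.0;
    return 2.0 * (double)inter / (double)(size_a + size_b);
}
```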
Acknowledgements
The first author gratefully acknowledges the following institutions for their support: Universidade do Estado de Mato Grosso (UNEMAT), in Brazil, and the National Council for Scientific and Technological Development (Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq), process grant 234306/2014-9 under reference #2010/15691-0.
Cite this article
Gulo, C.A.S.J., Sementille, A.C. & Tavares, J.M.R.S. Optimizing a medical image registration algorithm based on profiling data for real-time performance. Multimed Tools Appl 81, 2603–2620 (2022). https://doi.org/10.1007/s11042-021-11699-x