Fast parallel vessel segmentation
Introduction
In medical imaging, vessel segmentation from liver slices is a challenging task. Seeded region growing (SRG) is a widely used approach for semi-automatic vessel segmentation [1], [2]. Delibasis et al. [3] proposed a tool based on a modified version of the SRG algorithm combined with a priori knowledge of the required shape. SRG starts with a set of pixels called seeds and grows a uniform, connected region from each seed. The key steps in SRG are defining the seed(s) and a classification criterion that relies on image properties and user interaction [4]. Starting from a seed, SRG examines the neighboring pixels (using 4- or 8-connectivity) and adds to the region those that satisfy the threshold criterion. These newly added neighbors act as seeds for the next iteration, and the process repeats until the region cannot grow further. In practice, SRG is computationally expensive because of the large amount of dependent data to be processed, especially in medical image analysis, and it still requires efficient solutions [5].
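The growth loop described above can be sketched sequentially in Python. This is a minimal CPU sketch, not the paper's implementation: the function name `seeded_region_growing` is hypothetical, and the similarity criterion (absolute difference from the mean seed intensity of at most `thresh`) is an assumption, since the text only speaks of a threshold criterion.

```python
from collections import deque

# Neighbour offsets for 4- and 8-connectivity.
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def seeded_region_growing(img, seeds, thresh, connectivity=4):
    """Grow one region from `seeds` over a 2D list-of-lists image.

    Assumed criterion: a neighbour joins the region if its intensity differs
    from the mean seed intensity by at most `thresh`.
    """
    h, w = len(img), len(img[0])
    offsets = N4 if connectivity == 4 else N8
    ref = sum(img[y][x] for y, x in seeds) / len(seeds)
    mask = [[0] * w for _ in range(h)]
    frontier = deque()
    for y, x in seeds:
        mask[y][x] = 1
        frontier.append((y, x))
    # Accepted neighbours become new seeds; stop when no pixel can be added.
    while frontier:
        y, x = frontier.popleft()
        for dy, dx in offsets:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]:
                if abs(img[ny][nx] - ref) <= thresh:
                    mask[ny][nx] = 1
                    frontier.append((ny, nx))
    return mask


img = [
    [10, 10, 90, 90],
    [10, 12, 90, 90],
    [90, 90, 11, 90],
]
print(seeded_region_growing(img, [(0, 0)], thresh=5))
# [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
print(seeded_region_growing(img, [(0, 0)], thresh=5, connectivity=8))
# [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0]]
```

With 4-connectivity, the isolated pixel of value 11 stays outside the region; with 8-connectivity it is reached diagonally from the pixel of value 12. The queue also makes the data dependence explicit: a pixel can only be tested after one of its neighbors has joined, which is what makes SRG awkward to parallelize.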
SRG is an iterative process: the growth step is repeated until the region cannot be grown further. On a GPU, each iteration requires terminating the kernel and relaunching it from the CPU (kernel termination and relaunch, KTRL), together with data transfers between the CPU and GPU [1], [4]. Our main objective is therefore to reduce these transfers using different inter-block GPU synchronization (IBS) methods, resulting in an efficient parallel implementation of SRG. IBS makes it possible to move all computations to the GPU, because updated intermediate data become visible to all blocks without any intervention from the CPU.
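To make the cost of KTRL concrete, the sketch below emulates it on the CPU in Python. The hypothetical `growth_kernel` stands for one kernel launch: it processes every pixel independently, reads only the current mask, and returns a `changed` flag that the host would have to copy back before deciding whether to relaunch. The 4-connectivity and the seed-intensity threshold are assumptions made for illustration.

```python
NEIGHBOURS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def growth_kernel(img, cur, seed_val, thresh):
    """Emulates ONE kernel launch: every pixel is handled independently.

    Reads only `cur` and writes a new mask, so the outcome does not depend on
    the order in which the emulated threads run. Returns (new_mask, changed).
    """
    h, w = len(img), len(img[0])
    nxt = [row[:] for row in cur]
    changed = False
    for y in range(h):
        for x in range(w):
            if cur[y][x] or abs(img[y][x] - seed_val) > thresh:
                continue
            for dy, dx in NEIGHBOURS_4:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and cur[ny][nx]:
                    nxt[y][x] = 1
                    changed = True
                    break
    return nxt, changed


def srg_ktrl(img, seed, thresh):
    """Host-side KTRL loop: relaunch the kernel until nothing changes."""
    sy, sx = seed
    cur = [[0] * len(img[0]) for _ in img]
    cur[sy][sx] = 1
    launches = 0
    while True:
        # On a GPU: launch, wait for termination, copy `changed` back to the CPU.
        cur, changed = growth_kernel(img, cur, img[sy][sx], thresh)
        launches += 1
        if not changed:
            return cur, launches


mask, launches = srg_ktrl([[5, 5, 5, 5, 5]], (0, 0), thresh=0)
print(mask, launches)  # [[1, 1, 1, 1, 1]] 5
```

Because each pass adds only one ring of pixels, the number of launches equals the largest geodesic distance from the seed to a region pixel plus one final pass that detects convergence: five launches for a five-pixel strip. Each launch implies a kernel termination, a flag transfer, and a relaunch, which is the overhead the persistent approach aims to remove.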
In this paper, we propose a GPU approach for SRG based on persistent threads, grid-stride loops, and IBS that avoids intermediate memory transfers between the CPU and GPU. The approach also reduces processing of unnecessary image voxels, which yields a significant speedup. The persistent thread block (PT) approach depends on the number of thread blocks that can be active simultaneously, and a grid-stride loop becomes essential when the grid does not contain enough threads to process every image voxel independently [6], [7].
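A grid-stride loop lets a fixed-size grid, such as a persistent grid sized to the number of simultaneously active blocks, cover an image of any size. The Python sketch below emulates the CUDA idiom on the CPU; the function names are hypothetical, and the grid of 2 blocks of 4 threads is an arbitrary assumption chosen to be smaller than the 20 voxels it must cover.

```python
def grid_stride_indices(block_idx, thread_idx, block_dim, grid_dim, n):
    """Indices visited by one thread in a grid-stride loop.

    Mirrors the CUDA idiom
        for (i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += blockDim.x * gridDim.x)
    """
    start = block_idx * block_dim + thread_idx
    stride = block_dim * grid_dim  # total number of threads in the grid
    return list(range(start, n, stride))


def coverage(block_dim, grid_dim, n):
    """Count how often each of n voxels is visited by the whole (fixed) grid."""
    hits = [0] * n
    for b in range(grid_dim):
        for t in range(block_dim):
            for i in grid_stride_indices(b, t, block_dim, grid_dim, n):
                hits[i] += 1
    return hits


print(grid_stride_indices(0, 0, 4, 2, 20))  # [0, 8, 16]
print(coverage(4, 2, 20) == [1] * 20)       # True
```

Each thread advances by the total number of threads in the grid, so the 20 voxels are split across the 8 threads with no gaps and no overlaps, regardless of how many blocks the hardware can keep resident.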
We implement a parallel image gradient using a grid-stride loop and propose a fast parallel SRG based on the gradient and shared memory, implemented entirely on the GPU without any intermediate transfers between the CPU and GPU. This design is inspired by parallel processing of tiles within a static region of interest (RoI) on the GPU. We compare the proposed persistent parallel SRG with KTRL for accurate vessel segmentation. The gradient-based fast parallel SRG for 2D vessel segmentation is 1.9× faster than the state of the art.
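The parallel image gradient can be illustrated with the same grid-stride pattern. The sketch below is a CPU emulation in Python, not the paper's kernel: the operator (central differences in the interior, one-sided differences at the borders, combined as the Euclidean magnitude of the two differences) is an assumption, because the text does not specify which gradient operator is used. Each emulated thread writes only the pixels it owns, so the result does not depend on the thread count.

```python
import math


def gradient_magnitude(img, num_threads=8):
    """Gradient magnitude of a 2D image (list of lists), one value per pixel.

    Emulated thread t handles flat indices t, t + num_threads, ... (grid-stride).
    Interior pixels use central differences; border pixels fall back to
    one-sided differences because the neighbour index is clamped.
    """
    h, w = len(img), len(img[0])
    n = h * w
    out = [0.0] * n
    for t in range(num_threads):             # emulated parallel threads
        for i in range(t, n, num_threads):   # grid-stride loop
            y, x = divmod(i, w)
            xl, xr = max(x - 1, 0), min(x + 1, w - 1)
            yu, yd = max(y - 1, 0), min(y + 1, h - 1)
            gx = (img[y][xr] - img[y][xl]) / ((xr - xl) or 1)
            gy = (img[yd][x] - img[yu][x]) / ((yd - yu) or 1)
            out[i] = math.hypot(gx, gy)
    return [out[r * w:(r + 1) * w] for r in range(h)]


ramp = [[0, 2, 4, 6]] * 3            # horizontal intensity ramp
print(gradient_magnitude(ramp))      # every entry is 2.0
```

The grid-stride loop decouples the image size from the number of threads, which is what allows the gradient to run on a fixed, persistent grid.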
The rest of the paper is structured as follows. Section 2 reviews related work and the state of the art for SRG. Section 3 explains the GPU approaches (KTRL and Static) for implementing SRG using persistence and grid-stride loops. Section 4 discusses the application of parallel SRG to vessel segmentation. Section 5 presents performance results and compares the persistent and grid-stride loop based parallel SRG for vessel segmentation. Section 6 summarizes the main conclusions of the paper and indicates future directions. Table 1 lists the abbreviations used, with explanations.
Background and motivation
Many recent works on image segmentation are based on the snake model [8], gradient vector flow [9], [10], and the level-set-based Chan-Vese model [11]. A snake is defined as a set of points around a contour [8]. The problem with the snake model, however, is that the contour does not see strong edges that are far away, and the snake gets hung up on small noise in the image [8]. Hence researchers came up with the
Parallel SRG
A GPU executes a grid of thread blocks. A thread is the smallest computational unit and is mapped onto a core, while thread blocks are mapped onto streaming multiprocessors (SMs). Each SM can host more than one block. Threads within a block can exchange data through the SM's shared memory, but threads in different blocks cannot access each other's shared memory [21]. To communicate valid data between blocks, the persistent blocks must therefore be synchronized via IBS through device memory. Persistence implies the maximum number of thread blocks that can be active at
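The persistent idea can be emulated on the CPU with Python threads. In the sketch below, each worker thread stands in for one persistent thread block that owns a grid-stride share of the pixels, and `threading.Barrier` stands in for IBS through device memory; the barrier's `action` callback runs once per step to swap the double-buffered masks and test for convergence. The function name `persistent_srg`, the 4-connectivity, the seed-intensity threshold, and the worker count are illustrative assumptions; a real implementation would be a CUDA or OpenCL kernel with a device-side barrier.

```python
import threading


def persistent_srg(img, seed, thresh, num_blocks=4):
    """SRG with a fixed pool of workers that never hand control back to a host.

    Worker `wid` emulates one persistent block owning flat pixel indices
    wid, wid + num_blocks, ...; the shared barrier emulates IBS.
    Returns (mask, number_of_growth_steps).
    """
    h, w = len(img), len(img[0])
    n = h * w
    sy, sx = seed
    seed_val = img[sy][sx]
    state = {"cur": [0] * n, "nxt": [0] * n, "changed": False,
             "done": False, "steps": 0}
    state["cur"][sy * w + sx] = 1

    def end_of_step():
        # Executed by exactly one worker while all others wait at the barrier.
        state["done"] = not state["changed"]
        state["changed"] = False
        state["steps"] += 1
        state["cur"], state["nxt"] = state["nxt"], state["cur"]

    barrier = threading.Barrier(num_blocks, action=end_of_step)

    def worker(wid):
        while True:
            cur, nxt = state["cur"], state["nxt"]
            for i in range(wid, n, num_blocks):  # grid-stride over pixels
                y, x = divmod(i, w)
                grow = cur[i]
                if not grow and abs(img[y][x] - seed_val) <= thresh:
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and cur[ny * w + nx]:
                            grow = 1
                            state["changed"] = True
                            break
                nxt[i] = grow  # every pixel is rewritten by its single owner
            barrier.wait()     # emulated IBS: nobody starts the next step early
            if state["done"]:
                return

    workers = [threading.Thread(target=worker, args=(b,)) for b in range(num_blocks)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    mask = state["cur"]
    return [mask[r * w:(r + 1) * w] for r in range(h)], state["steps"]


img = [[10, 10, 90], [90, 10, 90], [90, 10, 10]]
mask, steps = persistent_srg(img, (0, 0), thresh=5, num_blocks=3)
print(mask, steps)  # [[1, 1, 0], [0, 1, 0], [0, 1, 1]] 5
```

Double buffering (reading `cur`, writing `nxt`) keeps each step free of races without atomics, since every pixel has exactly one owner. The loop still needs one step per ring of growth, but no step returns control to the CPU.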
Application to 2D vessel segmentation
The 2D segmentation algorithm is inspired by the gradient-based SRG algorithm developed by Rai and Nair [29]. We propose a fast parallel SRG-based segmentation algorithm on the GPU for vessel segmentation. We discuss its two important modules, the image gradient and SRG, for fast parallel 2D segmentation of vessels from CT liver images.
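How the two modules fit together can be sketched in Python as a gradient map feeding a region-growing step. The growth rule here is a hypothetical choice, not the criterion of Rai and Nair [29]: a 4-neighbor joins only if its intensity is within `thresh` of the seed and its gradient magnitude is at most `grad_thresh`, so strong edges at vessel boundaries stop the growth. The function names, the gradient operator, and both thresholds are assumptions, and the code runs sequentially on the CPU.

```python
import math
from collections import deque


def gradient_map(img):
    """Module 1: gradient magnitude (central differences, one-sided at borders)."""
    h, w = len(img), len(img[0])
    grad = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xl, xr = max(x - 1, 0), min(x + 1, w - 1)
            yu, yd = max(y - 1, 0), min(y + 1, h - 1)
            gx = (img[y][xr] - img[y][xl]) / ((xr - xl) or 1)
            gy = (img[yd][x] - img[yu][x]) / ((yd - yu) or 1)
            grad[y][x] = math.hypot(gx, gy)
    return grad


def gradient_guided_srg(img, seed, thresh, grad_thresh):
    """Module 2: 4-connected SRG that refuses to enter high-gradient pixels.

    Hypothetical rule: a neighbour joins if |I - I_seed| <= thresh AND its
    gradient magnitude is <= grad_thresh.
    """
    grad = gradient_map(img)
    h, w = len(img), len(img[0])
    sy, sx = seed
    mask = [[0] * w for _ in range(h)]
    mask[sy][sx] = 1
    frontier = deque([seed])
    while frontier:
        y, x = frontier.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not mask[ny][nx]
                    and abs(img[ny][nx] - img[sy][sx]) <= thresh
                    and grad[ny][nx] <= grad_thresh):
                mask[ny][nx] = 1
                frontier.append((ny, nx))
    return mask


row = [[10, 10, 10, 14, 18]]
print(gradient_guided_srg(row, (0, 0), thresh=10, grad_thresh=1))    # [[1, 1, 0, 0, 0]]
print(gradient_guided_srg(row, (0, 0), thresh=10, grad_thresh=100))  # [[1, 1, 1, 1, 1]]
```

With the intensity threshold alone, all five pixels would join; the gradient test stops the region where the intensity starts to change, which is the role an edge-aware criterion plays at vessel walls.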
Performance evaluation
We propose persistent and grid-stride based GPU approaches for fast parallel 2D vessel segmentation. Performance results are obtained for KTRL and for the proposed persistent GPU approach, and the proposed approaches are compared with KTRL. The implementation uses an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80 GHz with 24 GB RAM, an NVIDIA 1050 GPU (4 GB RAM), OpenCL 1.2 [30], and CUDA Toolkit 10.1.
Conclusion
In this paper, we discuss SRG-based vessel segmentation and its parallel implementation on the GPU. We propose a persistence and grid-stride loop based GPU approach for SRG that provides a significant speedup. Repeatedly relaunching a kernel (recursive or iterative kernel calls) is generally inefficient on GPUs, so we use persistence and grid-stride loops as an alternative to KTRL. We compare the proposed GPU optimization strategy for SRG with KTRL. The proposed persistent and gradient based parallel SRG for 2D
Declaration of Competing Interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
Acknowledgements
The work is supported by the project High Performance soft tissue Navigation (HiPerNav). This project has received funding from the European Union Horizon 2020 research and innovation program under grant agreement No. 722068. We thank The Intervention Centre, Oslo University Hospital, Oslo, Norway for providing the CT images with ground truths for the clinical validation of vessel segmentation.
References (35)
- GPU acceleration of liver enhancement for tumor segmentation. Comput. Methods Programs Biomed. (2020).
- A novel method for planning liver resections using deformable Bézier surfaces and distance maps. Comput. Methods Programs Biomed. (2017).
- A novel tool for segmenting 3D medical images based on generalized cylinders and active surfaces. Comput. Methods Programs Biomed. (2013).
- Medical image segmentation on GPUs – a comprehensive review. Med. Image Anal. (2015).
- Enhancement of morphological snake based segmentation by imparting image attachment through scale-space continuity. Pattern Recognit. (2015).
- Mean shift based gradient vector flow for image segmentation. Comput. Vis. Image Underst. (2013).
- Combined endeavor of neutrosophic set and Chan-Vese model to extract accurate liver image from CT scan. Comput. Methods Programs Biomed. (2017).
- A medical image segmentation algorithm based on bi-directional region growing. Optik (2015).
- GPU-based single-cluster algorithm for the simulation of the Ising model. J. Comput. Phys. (2012).
- An efficient parallel algorithm for graph-based image segmentation. International Conference on Computer Analysis of Images and Patterns (2009).
- A study of persistent threads style GPU programming for GPGPU workloads. Innovative Parallel Computing-Foundations & Applications of GPU, Manycore, and Heterogeneous Systems (INPAR 2012).
- Free launch: optimizing GPU dynamic kernel launches through thread reuse. Proceedings of the 48th International Symposium on Microarchitecture.
- Real-time gradient vector flow on GPUs using OpenCL. J. Real-Time Image Process.
- Three-dimensional blood vessel segmentation and centerline extraction based on two-dimensional cross-section analysis. Ann. Biomed. Eng.
- GPU accelerated segmentation and centerline extraction of tubular structures from medical images. Int. J. Comput. Assisted Radiol. Surg.
- Accelerating large graph algorithms on the GPU using CUDA.