Fast parallel vessel segmentation

https://doi.org/10.1016/j.cmpb.2020.105430

Highlights

  • Parallel implementation of image gradient.

  • GPU acceleration of seeded region growing.

  • Gradient-based parallel seeded region growing for vessel segmentation from CT liver images.

  • Evaluation of accurate vessel segmentation shows a 1.9x speedup over the state of the art.

Abstract

Background and Objective: Accurate and fast vessel segmentation from liver slices remains a challenging and important task for clinicians. Existing algorithms from the literature are slow and less accurate. We propose a fast, parallel, gradient-based seeded region growing method for vessel segmentation. Seeded region growing is tedious when the interconnectivity between elements is unavoidable. Parallelizing region growing algorithms is essential for achieving real-time performance in the overall process of accurate vessel segmentation.

Methods: The parallel implementation of seeded region growing for vessel segmentation is an iterative and hence time-consuming process. Because of its iterative mechanism, seeded region growing is implemented as kernel termination and relaunch on the GPU. The iterative or recursive process in region growing is time consuming due to intermediate memory transfers between the CPU and GPU. We propose a persistent, grid-stride-loop-based parallel approach for region growing on the GPU. We also analyze a static region of interest of tiles on the GPU to accelerate seeded region growing.

Results: We target fast, parallel, gradient-based seeded region growing for vessel segmentation from CT liver slices. The proposed parallel approach is 1.9x faster than the state of the art.

Conclusion: We discuss gradient-based seeded region growing and its parallel implementation on the GPU. The proposed parallel seeded region growing is faster than kernel termination and relaunch, and more accurate than the Chan-Vese and Snake models for vessel segmentation.

Introduction

In medical imaging, vessel segmentation from liver slices is a challenging task. Seeded region growing (SRG) is a widely used approach for semi-automatic vessel segmentation [1], [2]. Delibasis et al. [3] proposed a tool based on a modified version of the SRG algorithm, combined with a priori knowledge of the required shape. SRG starts with a set of pixels called seeds and grows a uniform, connected region from each seed. The key steps in SRG are defining the seed(s) and a classification criterion that relies on image properties and user interaction [4]. Starting from a seed, SRG finds the similar neighboring points, based on a threshold criterion, using 4- or 8-connectivity. The region grows when the threshold criterion is satisfied, and the similar neighbors become the seed points for the next iteration. This process is repeated until the region cannot be grown further. In practice, SRG demands high computational cost due to the large amount of dependent data to be processed, especially in medical image analysis, and still requires efficient solutions [5].
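As a minimal illustration of the SRG loop described above, the following sequential Python sketch grows a region from a set of seeds using 4-connectivity and an intensity threshold. It is a CPU reference for the concept, not the authors' GPU code; the frontier queue plays the role of the "new seed points for the next iteration".

```python
from collections import deque

def seeded_region_growing(image, seeds, threshold):
    """Grow a region from seed pixels: a neighbor joins the region when
    its intensity differs from the seed mean by at most `threshold`
    (4-connectivity). Returns the set of (row, col) region pixels."""
    rows, cols = len(image), len(image[0])
    seed_mean = sum(image[r][c] for r, c in seeds) / len(seeds)
    region = set(seeds)
    frontier = deque(seeds)            # seeds for the next iteration
    while frontier:                    # repeat until no further growth
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-connectivity
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_mean) <= threshold):
                region.add((nr, nc))
                frontier.append((nr, nc))  # similar neighbors become new seeds
    return region
```

The data dependence is visible here: each accepted pixel creates new work, which is exactly what makes a naive GPU port iterative.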

SRG is an iterative process: it is invoked repeatedly until the region cannot be grown further. When implemented on a GPU, this iteration requires terminating the kernel and relaunching it from the CPU (kernel termination and relaunch, KTRL), along with data transfers between the CPU and GPU [1], [4]. Our main objective is therefore to reduce these data transfers using inter-block GPU synchronization (IBS) methods, resulting in an efficient parallel implementation of SRG. IBS provides the flexibility to move all computations onto the GPU by making updated intermediate data visible without any intervention from the CPU.
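The KTRL pattern can be sketched in Python as a host loop that repeatedly "launches" a one-sweep kernel and reads back a flag to decide whether to relaunch; each loop trip stands in for one kernel launch plus the CPU-GPU round trip that the proposed approach eliminates. This is an illustrative emulation, not the authors' implementation.

```python
def ktrl_srg(image, seeds, threshold):
    """Emulate KTRL: the host repeatedly launches a 'kernel' that performs
    one parallel growth sweep, then reads a grown/not-grown flag back from
    the device to decide whether to relaunch. Returns (region, launches)."""
    rows, cols = len(image), len(image[0])
    seed_mean = sum(image[r][c] for r, c in seeds) / len(seeds)
    region = set(seeds)
    launches = 0
    while True:
        launches += 1                  # one kernel launch from the host
        # --- 'kernel': every pixel checks its neighbors independently ---
        new_pixels = {
            (r, c)
            for r in range(rows) for c in range(cols)
            if (r, c) not in region
            and abs(image[r][c] - seed_mean) <= threshold
            and any((r + dr, c + dc) in region
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)))
        }
        # --- host reads back the flag; relaunch only if the region grew ---
        if not new_pixels:
            break
        region |= new_pixels
    return region, launches
```

Because the region advances by at most one pixel ring per launch, long thin structures such as vessels force many launches, each paying the transfer overhead.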

In this paper, we propose a persistent, grid-stride-loop and IBS-based GPU approach for SRG that avoids intermediate memory transfers between the CPU and GPU. It also avoids processing unnecessary image voxels, providing a significant speedup. The persistent thread block (PT) approach depends on the number of active thread blocks, and a grid-stride loop becomes essential when the threads in the grid are not numerous enough to process the image voxels independently [6], [7].
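The grid-stride idea is that a fixed-size grid of persistent threads covers an array of any size, each thread striding by the total thread count. The following Python sketch emulates that indexing on the CPU (the comments note the corresponding CUDA expressions); it is illustrative only.

```python
def grid_stride_map(data, func, num_threads):
    """Emulate a grid-stride loop: a fixed number of 'threads' covers an
    array larger than the grid, each thread striding by the grid size."""
    out = list(data)
    for tid in range(num_threads):    # each simulated persistent thread
        i = tid                       # i = blockIdx.x * blockDim.x + threadIdx.x
        while i < len(data):          # grid-stride loop over the array
            out[i] = func(data[i])
            i += num_threads          # stride = gridDim.x * blockDim.x
    return out
```

With persistent blocks, `num_threads` is capped at the number of threads the SMs can keep resident, so the stride loop is what guarantees full coverage of the image.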

We implement a parallel image gradient using a grid-stride loop and propose a gradient- and shared-memory-based fast parallel SRG implemented entirely on the GPU, without any intermediate transfers between the CPU and GPU. This is inspired by parallel processing on a static region of interest (RoI) of tiles on the GPU. We compare the proposed persistence-based parallel SRG with KTRL for accurate vessel segmentation. The gradient-based fast parallel SRG for 2D vessel segmentation is 1.9x faster than the state of the art.
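The image gradient is the embarrassingly parallel half of the pipeline: every pixel can be computed independently by one GPU thread. A minimal CPU reference using central differences with clamped borders is sketched below; the exact stencil used by the paper is not given in this excerpt, so this is an assumption for illustration.

```python
def gradient_magnitude(image):
    """Central-difference gradient magnitude of a 2D image, with borders
    clamped. Each pixel is independent of the others, which is what makes
    the gradient easy to parallelize with one GPU thread per pixel."""
    rows, cols = len(image), len(image[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx = (image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]) / 2.0
            gy = (image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]) / 2.0
            grad[r][c] = (gx * gx + gy * gy) ** 0.5
    return grad
```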

The rest of the paper is structured as follows. Section 2 reviews relevant work and the state of the art with respect to SRG. Section 3 explains the GPU approaches (KTRL and static) for SRG implementation using persistence and a grid-stride loop. The application of parallel SRG to vessel segmentation is discussed in Section 4. Performance results and a comparison of the persistent, grid-stride-loop-based parallel SRG for vessel segmentation are presented in Section 5. Section 6 summarizes the main conclusions of the paper and indicates future directions. Abbreviations and their expansions are listed in Table 1.

Section snippets

Background and motivation

Much recent work on image segmentation is based on the snake model [8], gradient vector flow [9], [10], and the level-set-based Chan-Vese model [11]. Researchers have explored the snake model for segmentation; snakes are defined as a set of points around a contour [8]. The problem with the snake model is that the contour never sees strong edges that are far away, and the snake gets stuck on the many small noise artifacts in the image [8]. Hence researchers came up with the

Parallel SRG

A GPU executes a grid of thread blocks. A thread is the smallest computational unit and is mapped onto the cores, while thread blocks are mapped onto the streaming multiprocessors (SMs). Each SM can host more than one block. The threads within a block can access data via the shared memory of the SM [21]. To communicate valid data between blocks, the persistent blocks need to be synchronized via IBS through device memory. Persistence implies the maximum number of thread blocks that can be active at
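One way to read the "static RoI of tiles" idea is that the image is partitioned into fixed tiles (one per block's shared-memory working set), and only tiles that contain region pixels, plus their neighbors into which the region may grow, need to be processed in a sweep. The sketch below illustrates that bookkeeping in Python; the tile bookkeeping details are an assumption, not taken from the paper.

```python
def active_tiles(region, tile_size, rows, cols):
    """Sketch of a static tile RoI: mark which fixed-size tiles contain
    region pixels, then add their 4-neighbor tiles, since the region can
    grow at most one pixel per sweep. Only these tiles need processing."""
    tile_rows = (rows + tile_size - 1) // tile_size
    tile_cols = (cols + tile_size - 1) // tile_size
    tiles = {(r // tile_size, c // tile_size) for r, c in region}
    expanded = set(tiles)
    for tr, tc in tiles:
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = tr + dr, tc + dc
            if 0 <= nr < tile_rows and 0 <= nc < tile_cols:
                expanded.add((nr, nc))   # region may spill into this tile
    return expanded
```

Skipping inactive tiles is what reduces "processing over unnecessary image voxels" when the vessel region occupies a small fraction of the slice.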

Application to 2D vessel segmentation

The 2D segmentation algorithm is inspired by the gradient-based SRG algorithm developed by Rai and Nair [29]. We propose a fast parallel SRG-based segmentation algorithm on the GPU for vessel segmentation. We discuss its two important modules, the image gradient and SRG, for fast parallel 2D segmentation of vessels from CT liver images.
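A plausible way the two modules combine is to precompute the gradient magnitude and let the region grow only through low-gradient pixels, so that growth halts at vessel edges. The sketch below is self-contained and hedged: the exact growing criterion of Rai and Nair [29] is not given in this excerpt, and the low-gradient rule here is an assumption for illustration.

```python
from collections import deque

def gradient_based_srg(image, seeds, grad_threshold):
    """Sketch of gradient-based SRG: grow the seeded region only through
    pixels whose central-difference gradient magnitude stays below
    grad_threshold, so growth stops at high-gradient vessel edges."""
    rows, cols = len(image), len(image[0])

    def grad(r, c):  # gradient magnitude with clamped borders
        gx = (image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]) / 2.0
        gy = (image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]) / 2.0
        return (gx * gx + gy * gy) ** 0.5

    region = set(seeds)
    frontier = deque(seeds)
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and grad(nr, nc) <= grad_threshold):
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region
```

On a toy image with one bright row standing in for a vessel, the region fills the row and stops at its high-gradient borders.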

Performance evaluation

We propose persistent, grid-stride-based GPU approaches for fast parallel 2D vessel segmentation and compare them with KTRL. We use an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80 GHz with 24 GB RAM, an NVIDIA GeForce GTX 1050 GPU (4 GB RAM), OpenCL 1.2 (ref. [30]), and CUDA Toolkit 10.1 for the implementation.

Conclusion

In this paper, we discuss SRG-based vessel segmentation and its parallel implementation on the GPU. We propose a persistence- and grid-stride-loop-based GPU approach for SRG that provides a significant speedup. Recursive or iterative invocation of a kernel is generally inefficient on GPUs, so we use the persistence and grid-stride approach as an alternative to KTRL. We compare the proposed GPU optimization strategies for the SRG implementation. The proposed persistent and gradient-based parallel SRG for 2D

Declaration of Competing Interest

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Acknowledgements

The work is supported by the project High Performance soft tissue Navigation (HiPerNav). This project has received funding from the European Union Horizon 2020 research and innovation program under grant agreement No. 722068. We thank The Intervention Centre, Oslo University Hospital, Oslo, Norway for providing the CT images with ground truths for the clinical validation of vessel segmentation.

References (35)

  • K. Gupta et al.

    A study of persistent threads style GPU programming for GPGPU workloads

    Innovative Parallel Computing-Foundations & Applications of GPU, Manycore, and Heterogeneous Systems (INPAR 2012)

    (2012)
  • G. Chen et al.

    Free launch: optimizing GPU dynamic kernel launches through thread reuse

    Proceedings of the 48th International Symposium on Microarchitecture

    (2015)
  • E. Smistad et al.

    Real-time gradient vector flow on GPUs using OpenCL

    J. Real-Time Image Process.

    (2015)
  • R.P. Kumar et al.

    Three-dimensional blood vessel segmentation and centerline extraction based on two-dimensional cross-section analysis

    Ann. Biomed. Eng.

    (2015)
  • E. Smistad, Seeded region growing, 2015,...
  • E. Smistad et al.

    GPU accelerated segmentation and centerline extraction of tubular structures from medical images

    Int. J. Comput. Assisted Radiol. Surg.

    (2014)
  • P. Harish et al.

    Accelerating large graph algorithms on the GPU using CUDA
