Fast parallel vessel segmentation

https://doi.org/10.1016/j.cmpb.2020.105430

Highlights

  • Parallel implementation of image gradient.

  • GPU acceleration of seeded region growing.

  • Gradient-based parallel seeded region growing for vessel segmentation from CT liver images.

  • Evaluation of accurate vessel segmentation shows a 1.9x speedup over the state of the art.

Abstract

Background and Objective: Accurate and fast vessel segmentation from liver slices remains a challenging and important task for clinicians. Existing algorithms from the literature are slow and less accurate. We propose a fast, parallel, gradient-based seeded region growing method for vessel segmentation. Seeded region growing is tedious when the interconnectivity between elements is unavoidable. Parallelizing region growing algorithms is essential for achieving real-time performance in the overall process of accurate vessel segmentation.

Methods: The parallel implementation of seeded region growing for vessel segmentation is an iterative and hence time-consuming process. Because of its iterative mechanism, seeded region growing is implemented as kernel termination and relaunch on the GPU. The iterative or recursive process in region growing is time consuming due to intermediate memory transfers between the CPU and GPU. We propose a persistent, grid-stride-loop-based parallel approach for region growing on the GPU. We also analyze a static region of interest of tiles on the GPU to accelerate seeded region growing.

Results: We target fast, parallel, gradient-based seeded region growing for vessel segmentation from CT liver slices. The proposed parallel approach is 1.9x faster than the state of the art.

Conclusion: We discuss gradient-based seeded region growing and its parallel implementation on the GPU. The proposed parallel seeded region growing is faster than kernel termination and relaunch, and more accurate than the Chan-Vese and Snake models for vessel segmentation.

Introduction

In medical imaging, vessel segmentation from liver slices is a challenging task. Seeded region growing (SRG) is a widely used approach for semi-automatic vessel segmentation [1], [2]. Delibasis et al. [3] proposed a tool based on a modified version of the SRG algorithm, combined with a priori knowledge of the required shape. SRG starts with a set of pixels called seeds and grows a uniform, connected region from each seed. The key steps in SRG are defining the seed(s) and a classification criterion that relies on image properties and user interaction [4]. Starting from a seed, SRG finds the similar neighboring points, based on a threshold criterion, using 4- or 8-connectivity. The region grows when the threshold criterion is satisfied, and the similar neighbors become the seed points for the next iteration. This process is repeated until the region cannot be grown further. In practice, SRG demands high computational cost due to the large amount of dependent data to be processed, especially in medical image analysis, and still requires efficient solutions [5].
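As a minimal illustration of the SRG loop described above, the following sequential Python sketch grows a region from a set of seeds using 4-connectivity and an intensity threshold. It is a CPU reference for the concept, not the authors' GPU code; the frontier queue plays the role of the "new seed points for the next iteration".

```python
from collections import deque

def seeded_region_growing(image, seeds, threshold):
    """Grow a region from seed pixels: a neighbor joins the region when
    its intensity differs from the seed mean by at most `threshold`
    (4-connectivity). Returns the set of (row, col) region pixels."""
    rows, cols = len(image), len(image[0])
    seed_mean = sum(image[r][c] for r, c in seeds) / len(seeds)
    region = set(seeds)
    frontier = deque(seeds)            # seeds for the next iteration
    while frontier:                    # repeat until no further growth
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-connectivity
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_mean) <= threshold):
                region.add((nr, nc))
                frontier.append((nr, nc))  # similar neighbors become new seeds
    return region
```

The data dependence is visible here: each accepted pixel creates new work, which is exactly what makes a naive GPU port iterative.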

SRG is an iterative process: it is invoked repeatedly until the region cannot be grown further. When implemented on a GPU, this iteration requires terminating the kernel and relaunching it from the CPU (kernel termination and relaunch, KTRL), along with data transfers between the CPU and GPU [1], [4]. Our main objective is therefore to reduce these data transfers using inter-block GPU synchronization (IBS) methods, resulting in an efficient parallel implementation of SRG. IBS provides the flexibility to move all computations onto the GPU by making updated intermediate data visible without any intervention from the CPU.
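The KTRL pattern can be sketched in Python as a host loop that repeatedly "launches" a one-sweep kernel and reads back a flag to decide whether to relaunch; each loop trip stands in for one kernel launch plus the CPU-GPU round trip that the proposed approach eliminates. This is an illustrative emulation, not the authors' implementation.

```python
def ktrl_srg(image, seeds, threshold):
    """Emulate KTRL: the host repeatedly launches a 'kernel' that performs
    one parallel growth sweep, then reads a grown/not-grown flag back from
    the device to decide whether to relaunch. Returns (region, launches)."""
    rows, cols = len(image), len(image[0])
    seed_mean = sum(image[r][c] for r, c in seeds) / len(seeds)
    region = set(seeds)
    launches = 0
    while True:
        launches += 1                  # one kernel launch from the host
        # --- 'kernel': every pixel checks its neighbors independently ---
        new_pixels = {
            (r, c)
            for r in range(rows) for c in range(cols)
            if (r, c) not in region
            and abs(image[r][c] - seed_mean) <= threshold
            and any((r + dr, c + dc) in region
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)))
        }
        # --- host reads back the flag; relaunch only if the region grew ---
        if not new_pixels:
            break
        region |= new_pixels
    return region, launches
```

Because the region advances by at most one pixel ring per launch, long thin structures such as vessels force many launches, each paying the transfer overhead.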

In this paper, we propose a persistent, grid-stride-loop and IBS-based GPU approach for SRG that avoids intermediate memory transfers between the CPU and GPU. It also avoids processing unnecessary image voxels, providing a significant speedup. The persistent thread block (PT) approach depends on the number of active thread blocks, and a grid-stride loop becomes essential when the threads in the grid are not numerous enough to process the image voxels independently [6], [7].
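The grid-stride idea is that a fixed-size grid of persistent threads covers an array of any size, each thread striding by the total thread count. The following Python sketch emulates that indexing on the CPU (the comments note the corresponding CUDA expressions); it is illustrative only.

```python
def grid_stride_map(data, func, num_threads):
    """Emulate a grid-stride loop: a fixed number of 'threads' covers an
    array larger than the grid, each thread striding by the grid size."""
    out = list(data)
    for tid in range(num_threads):    # each simulated persistent thread
        i = tid                       # i = blockIdx.x * blockDim.x + threadIdx.x
        while i < len(data):          # grid-stride loop over the array
            out[i] = func(data[i])
            i += num_threads          # stride = gridDim.x * blockDim.x
    return out
```

With persistent blocks, `num_threads` is capped at the number of threads the SMs can keep resident, so the stride loop is what guarantees full coverage of the image.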

We implement a parallel image gradient using a grid-stride loop and propose a gradient- and shared-memory-based fast parallel SRG implemented entirely on the GPU, without any intermediate transfers between the CPU and GPU. This is inspired by parallel processing on a static region of interest (RoI) of tiles on the GPU. We compare the proposed persistence-based parallel SRG with KTRL for accurate vessel segmentation. The gradient-based fast parallel SRG for 2D vessel segmentation is 1.9x faster than the state of the art.
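The image gradient is the embarrassingly parallel half of the pipeline: every pixel can be computed independently by one GPU thread. A minimal CPU reference using central differences with clamped borders is sketched below; the exact stencil used by the paper is not given in this excerpt, so this is an assumption for illustration.

```python
def gradient_magnitude(image):
    """Central-difference gradient magnitude of a 2D image, with borders
    clamped. Each pixel is independent of the others, which is what makes
    the gradient easy to parallelize with one GPU thread per pixel."""
    rows, cols = len(image), len(image[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx = (image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]) / 2.0
            gy = (image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]) / 2.0
            grad[r][c] = (gx * gx + gy * gy) ** 0.5
    return grad
```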

The rest of the paper is structured as follows. Section 2 reviews relevant work and the state of the art with respect to SRG. Section 3 explains the GPU approaches (KTRL and static) for SRG implementation using persistence and a grid-stride loop. The application of parallel SRG to vessel segmentation is discussed in Section 4. Performance results and a comparison of the persistent, grid-stride-loop-based parallel SRG for vessel segmentation are presented in Section 5. Section 6 summarizes the main conclusions of the paper and indicates future directions. Abbreviations and their expansions are listed in Table 1.

Section snippets

Background and motivation

Much recent work on image segmentation is based on the snake model [8], gradient vector flow [9], [10], and the level-set-based Chan-Vese model [11]. Researchers have explored the snake model for segmentation; snakes are defined as a set of points around a contour [8]. The problem with the snake model is that the contour never sees strong edges that are far away, and the snake gets stuck on the many small noise artifacts in the image [8]. Hence researchers came up with the

Parallel SRG

A GPU executes a grid of thread blocks. A thread is the smallest computational unit and is mapped onto the cores, while thread blocks are mapped onto the streaming multiprocessors (SMs). Each SM can host more than one block. The threads within a block can access data via the shared memory of the SM [21]. To communicate valid data between blocks, the persistent blocks need to be synchronized via IBS through device memory. Persistence implies the maximum number of thread blocks that can be active at
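One way to read the "static RoI of tiles" idea is that the image is partitioned into fixed tiles (one per block's shared-memory working set), and only tiles that contain region pixels, plus their neighbors into which the region may grow, need to be processed in a sweep. The sketch below illustrates that bookkeeping in Python; the tile bookkeeping details are an assumption, not taken from the paper.

```python
def active_tiles(region, tile_size, rows, cols):
    """Sketch of a static tile RoI: mark which fixed-size tiles contain
    region pixels, then add their 4-neighbor tiles, since the region can
    grow at most one pixel per sweep. Only these tiles need processing."""
    tile_rows = (rows + tile_size - 1) // tile_size
    tile_cols = (cols + tile_size - 1) // tile_size
    tiles = {(r // tile_size, c // tile_size) for r, c in region}
    expanded = set(tiles)
    for tr, tc in tiles:
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = tr + dr, tc + dc
            if 0 <= nr < tile_rows and 0 <= nc < tile_cols:
                expanded.add((nr, nc))   # region may spill into this tile
    return expanded
```

Skipping inactive tiles is what reduces "processing over unnecessary image voxels" when the vessel region occupies a small fraction of the slice.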

Application to 2D vessel segmentation

The 2D segmentation algorithm is inspired by the gradient-based SRG algorithm developed by Rai and Nair [29]. We propose a fast parallel SRG-based segmentation algorithm on the GPU for vessel segmentation. We discuss its two important modules, the image gradient and SRG, for fast parallel 2D segmentation of vessels from CT liver images.
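A plausible way the two modules combine is to precompute the gradient magnitude and let the region grow only through low-gradient pixels, so that growth halts at vessel edges. The sketch below is self-contained and hedged: the exact growing criterion of Rai and Nair [29] is not given in this excerpt, and the low-gradient rule here is an assumption for illustration.

```python
from collections import deque

def gradient_based_srg(image, seeds, grad_threshold):
    """Sketch of gradient-based SRG: grow the seeded region only through
    pixels whose central-difference gradient magnitude stays below
    grad_threshold, so growth stops at high-gradient vessel edges."""
    rows, cols = len(image), len(image[0])

    def grad(r, c):  # gradient magnitude with clamped borders
        gx = (image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]) / 2.0
        gy = (image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]) / 2.0
        return (gx * gx + gy * gy) ** 0.5

    region = set(seeds)
    frontier = deque(seeds)
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and grad(nr, nc) <= grad_threshold):
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region
```

On a toy image with one bright row standing in for a vessel, the region fills the row and stops at its high-gradient borders.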

Performance evaluation

We propose persistent, grid-stride-based GPU approaches for fast parallel 2D vessel segmentation and compare them with KTRL. We use an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80 GHz with 24 GB RAM, an NVIDIA GeForce GTX 1050 GPU (4 GB RAM), OpenCL 1.2 (ref. [30]), and CUDA Toolkit 10.1 for the implementation.

Conclusion

In this paper, we discuss SRG-based vessel segmentation and its parallel implementation on the GPU. We propose a persistence- and grid-stride-loop-based GPU approach for SRG that provides a significant speedup. Recursive or iterative invocation of a kernel is generally inefficient on GPUs, so we use the persistence and grid-stride approach as an alternative to KTRL. We compare the proposed GPU optimization strategies for the SRG implementation. The proposed persistent and gradient-based parallel SRG for 2D

Declaration of Competing Interest

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Acknowledgements

The work is supported by the project High Performance soft tissue Navigation (HiPerNav). This project has received funding from the European Union Horizon 2020 research and innovation program under grant agreement No. 722068. We thank The Intervention Centre, Oslo University Hospital, Oslo, Norway for providing the CT images with ground truths for the clinical validation of vessel segmentation.

References (35)

  • K. Gupta et al.

    A study of persistent threads style GPU programming for GPGPU workloads

    Innovative Parallel Computing-Foundations & Applications of GPU, Manycore, and Heterogeneous Systems (INPAR 2012)

    (2012)
  • G. Chen et al.

    Free launch: optimizing GPU dynamic kernel launches through thread reuse

    Proceedings of the 48th International Symposium on Microarchitecture

    (2015)
  • E. Smistad et al.

    Real-time gradient vector flow on GPUs using OpenCL

    J. Real-Time Image Process.

    (2015)
  • R.P. Kumar et al.

    Three-dimensional blood vessel segmentation and centerline extraction based on two-dimensional cross-section analysis

    Ann. Biomed. Eng.

    (2015)
  • E. Smistad, Seeded region growing, 2015,...
  • E. Smistad et al.

    GPU accelerated segmentation and centerline extraction of tubular structures from medical images

    Int. J. Comput. Assisted Radiol. Surg.

    (2014)
  • P. Harish et al.

    Accelerating large graph algorithms on the GPU using CUDA
