An efficient interactive multi-label segmentation tool for 2D and 3D medical images using fully connected conditional random field

https://doi.org/10.1016/j.cmpb.2021.106534

Highlights

  • Free software for interactive image segmentation of 2D and 3D medical images.

  • The software can be used as a post-processing tool to refine segmentation results produced by other methods (e.g. deep convolutional neural networks).

  • The software provides efficient user interaction for image segmentation.

Abstract

Objective: Image segmentation is a crucial and fundamental step in many medical image analysis tasks, such as tumor measurement, surgery planning and disease diagnosis. To ensure the quality of image segmentation, most current solutions rely on labor-intensive manual processes that trace the boundaries of the objects. The workload increases tremendously for three-dimensional (3D) images with multiple objects to be segmented. Method: In this paper, we introduce our interactive image segmentation tool, which provides efficient segmentation of multiple labels for both 2D and 3D medical images. The core segmentation method is based on a fast implementation of the fully connected conditional random field. The software also automatically recommends the next slice to be annotated in 3D, leading to higher efficiency. Results: We have evaluated the tool on many 2D and 3D medical image modalities (e.g. CT, MRI, ultrasound and X-ray) and different objects of interest (abdominal organs, tumors, bones, etc.) in terms of segmentation accuracy, repeatability and computational time. Conclusion: In contrast to other interactive image segmentation tools, our software produces high-quality segmentation results without requiring parameter tuning for each application. Both the software and source code are freely available for research purposes (software and source code download: https://drive.google.com/file/d/1JIzWkT3M-X7jeB8tTwVcEw240TGbJAvj/view?usp=sharing).

Introduction

Image segmentation has been an active research topic for decades in areas such as computer vision and medical image analysis. In our work, we focus on the application of image segmentation to medical image analysis tasks, such as tumor quantification, surgery planning and disease diagnosis. A large number of research papers have been published on medical image segmentation, and they can be categorized according to different criteria. In this paper, from the user's point of view, we classify methods as manual, semi-automatic or fully automatic.

Many commercial and open-source software packages offer manual delineation tools for medical images, including line tracing, polynomial curve fitting and area painting. ITK-Snap [1] is an open-source and widely used tool mainly dedicated to medical image segmentation. It offers polygon and paintbrush tools for flexible editing of both 2D and 3D images. Another widely acknowledged open-source tool is 3D Slicer [2]. It provides manual tools such as area painting, level tracing and scissors, which are normally used as post-processing to refine segmentation results produced by thresholding or region growing. These manual tools offer good quality control but require tremendous time and effort from the user.

There are many classical fully automatic methods [3], such as Otsu's thresholding and K-means clustering. These automatic methods are normally application dependent and cannot work robustly unless the object of interest has a homogeneous image intensity and is well distinguishable from other image regions. Many of these methods are nowadays used as a pre-processing step for more sophisticated methods, for example region-of-interest extraction to remove redundant information, or initial labeling for weakly supervised machine learning methods. Recently, deep learning methods have achieved state-of-the-art performance in many image segmentation tasks [4] and are fully automatic at inference time. However, they often require a large number of annotated images for supervised model training, which in turn requires either manual or semi-automatic image segmentation to generate the training dataset.

Semi-automatic methods combine the advantages of automatic segmentation with the ability for users to intervene in the segmentation process. One type of user interaction is initialization, such as drawing seeds or bounding boxes inside or around the target object. The seeds or initial contour then evolve toward the desired object boundary by region growing [5] or by minimizing an energy function (e.g. active contours [6], level sets [7]). These methods do not offer post-segmentation user interactions to further refine the results, and their parameter settings are highly application dependent. Another type of user interaction iteratively improves the segmentation by adding scribbles for different classes (e.g. grow cut [8], graph cut [9]). At each iteration, the method propagates these labels to the whole image by optimizing an energy function. Such an iterative scheme is largely guaranteed to reach a satisfactory result with reduced workload compared to a manual process, which is desirable for medical image segmentation.
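As a concrete illustration of seed-based initialization, the following is a minimal, hypothetical sketch of intensity-based region growing (not this tool's implementation, and not the specific algorithm of [5]): starting from a user-placed seed, a breadth-first search absorbs 4-connected neighbors whose intensity stays within a tolerance of the seed value.

```python
import numpy as np
from collections import deque

def region_grow(image, seed, tol):
    """Grow a region from `seed`: breadth-first search that absorbs
    4-connected neighbors whose intensity is within `tol` of the seed."""
    h, w = image.shape
    seed_val = float(image[seed])
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        i, j = queue.popleft()
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if (0 <= ni < h and 0 <= nj < w and not mask[ni, nj]
                    and abs(float(image[ni, nj]) - seed_val) <= tol):
                mask[ni, nj] = True
                queue.append((ni, nj))
    return mask
```

The sketch also makes the limitation noted above visible: the result is controlled entirely by the seed position and the tolerance, so there is no mechanism for the user to correct the segmentation afterwards other than re-running with different parameters.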

Our proposed solution is based on user interactions via scribbles, and the image segmentation task is converted into a graph optimization problem using a conditional random field. We introduce related work in the literature and highlight our contributions in Section 1.1 and Section 1.2, respectively. In Section 2, our methodology is described in detail. The parameter settings and graphical user interface are introduced in Section 3. Evaluation results are presented in Section 4, followed by discussion and conclusions in Section 5.

Our method is based on an interactive strategy that dynamically responds to the user's annotations, so we focus on discussing related work in this category. There are many types of user annotation, such as point-based, contour-based, scribble-based and bounding-box-based methods. We adopt the scribble-based interaction approach due to its high flexibility and user-friendly nature.

In our implementation, the image to be segmented is treated as a graph. The user's annotations serve as prior knowledge to determine the likelihood of each pixel belonging to each of the labeled classes. Together with this prior information, the pixel-wise similarity and label consistency are normally modeled by a Markov random field (MRF) or a conditional random field (CRF). The image segmentation task is then converted into an energy optimization problem over a graph structure. Although exact inference over such a structure is intractable, many approximation algorithms have been developed, including iterated conditional modes [10], belief propagation [11], max-flow/min-cut [12] and filter-based inference [13], among which filter-based mean-field inference and graph cut are the two most popular solutions.

Boykov et al. [9] proposed the two most widely used graph cut algorithms: αβ-swap and α-expansion. For a pair of labels α and β, αβ-swap exchanges the labels between an arbitrary set of pixels labeled α and another arbitrary set labeled β; the algorithm produces a labeling such that no swap move decreases the energy of the predefined graph. The αβ-swap method works well for a binary (two-label) graph but is difficult to extend to multi-class segmentation. Alternatively, α-expansion is suitable for multi-class problems. It starts from any labeling and iterates through all labels: for each label α, it computes the optimal α-expansion move and accepts the move if the energy decreases. The algorithm terminates when no expansion move decreases the energy. The graph cut method was applied to interactive image segmentation by Rother et al. [14] in grab cut: the user only needs to draw a bounding box around the object of interest, the foreground is then segmented using graph cut, and the result can be further refined with additional scribbles. Grab cut works superbly with minimal user input, but it is limited to binary segmentation and its computational speed is slow in 3D. Kohli et al. [15] extended the class of energy functions for which the optimal α-expansion and αβ-swap moves can be computed in polynomial time. However, the inference speed and memory usage are still inefficient compared with the mean-field inference method, especially when there are multiple labels in 3D images.
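The α-expansion outer loop described above can be sketched as follows. This is a hypothetical illustration on a 4-connected Potts model, not the implementation of [9]: the optimal expansion move would normally be computed with min-cut, and here a greedy per-pixel sweep stands in for it, so each accepted move still lowers the energy but is not guaranteed to be the optimal move.

```python
import numpy as np

def energy(labels, unary, smooth):
    """Potts energy on a 4-connected grid: unary costs plus a fixed
    penalty `smooth` for every pair of unequal neighboring labels."""
    h, w = labels.shape
    e = unary[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()
    e += smooth * (labels[:, 1:] != labels[:, :-1]).sum()
    e += smooth * (labels[1:, :] != labels[:-1, :]).sum()
    return float(e)

def alpha_expansion(unary, smooth, n_sweeps=5):
    """Outer loop of alpha-expansion over a (h, w, n_labels) unary volume.
    For each label alpha, an expansion move may flip any pixel to alpha;
    a move is accepted only if it decreases the energy."""
    h, w, n_labels = unary.shape
    labels = unary.argmin(axis=2)              # unary-only initialization
    for _ in range(n_sweeps):
        changed = False
        for alpha in range(n_labels):
            proposal = labels.copy()
            cur = energy(proposal, unary, smooth)
            for i in range(h):                 # greedy stand-in for min-cut
                for j in range(w):
                    if proposal[i, j] == alpha:
                        continue
                    old = proposal[i, j]
                    proposal[i, j] = alpha     # try expanding alpha here
                    trial = energy(proposal, unary, smooth)
                    if trial < cur:
                        cur = trial            # keep the flip
                    else:
                        proposal[i, j] = old   # revert
            if cur < energy(labels, unary, smooth):
                labels, changed = proposal, True
        if not changed:                        # no label improved: converged
            break
    return labels
```

Because moves are only accepted when the energy decreases, the labeling never gets worse than the unary-only initialization, mirroring the termination condition of the real algorithm.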

Many mean-field approximation methods have been proposed in computer vision, for example for object class segmentation [16]. The mean-field algorithm approximates the exact distribution P by a distribution Q, calculated as a product of independent marginals, by minimizing the KL divergence D(Q‖P). Although approximating P with a fully factored distribution is likely to lose some information, the approximation is computationally efficient. Krähenbühl et al. [13] developed a filter-based method for fast fully connected CRF optimization, which is the core algorithm used in our work.
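The mean-field update for a fully connected CRF can be illustrated with a deliberately naive O(N²) version in which the Gaussian pairwise kernel is materialized explicitly. This is a simplified sketch, not the method of [13], which replaces the dense matrix product with high-dimensional filtering to make message passing linear in N; a single unit-bandwidth kernel and a Potts compatibility with weight `compat` are simplifying assumptions here.

```python
import numpy as np

def dense_crf_mean_field(unary, feats, compat=2.0, n_iters=10):
    """Naive mean-field inference for a fully connected CRF.

    unary : (N, L) per-pixel negative log-likelihoods
    feats : (N, D) per-pixel feature vectors (e.g. position + intensity)
    Returns the per-pixel argmax label after `n_iters` updates.
    """
    # Explicit Gaussian pairwise kernel k(f_i, f_j), zero on the diagonal.
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2)
    np.fill_diagonal(K, 0.0)

    Q = np.exp(-unary)
    Q /= Q.sum(axis=1, keepdims=True)        # initialize with softmax of unaries
    for _ in range(n_iters):
        msg = K @ Q                          # message passing (the filtering step)
        Q = np.exp(-unary + compat * msg)    # Potts compatibility rewards agreement
        Q /= Q.sum(axis=1, keepdims=True)    # normalize each marginal
    return Q.argmax(axis=1)
```

Even in this toy form, the update shows why the approximation is efficient: each iteration is one matrix product plus element-wise operations, and the per-pixel marginals stay independent.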

More recently, many image segmentation methods based on deep convolutional neural networks (DCNNs) have been proposed. The majority of these methods are fully automatic but require a large number of annotated labels for fully supervised learning. A few works allow user interactions. DeepCut [17] uses a patch-based DCNN and a CRF to segment objects from bounding boxes, but it only works for binary problems. Similarly, ScribbleSup [18] trains a DCNN to learn and perform multi-object segmentation from user-annotated scribbles. Neither method allows dynamic user interactions to iteratively refine the result. Wang et al. [19] proposed an interactive segmentation method based on DCNNs, in which an initial segmentation is obtained with a fully supervised DCNN model and a second DCNN model incorporates user interactions to refine the initial result within a CRF structure; the implementation is currently limited to binary labels. The main drawback of these DCNN-based methods is the requirement of a training process using either weakly or fully labeled data.

In summary, the key issues with current interactive segmentation solutions for medical images are threefold. (1) Many image segmentation tools in computer vision work well on 2D images, but few are applicable to 3D medical images, where the key barriers are computational time and the effectiveness of user interactions in 3D. (2) Many solutions only address binary segmentation, whereas multiple organs often need to be segmented in medical images. (3) Many generic interactive segmentation methods work well on natural images with rich regional textures but may not be directly applicable to medical images, where multiple labels need to be assigned to regions with similar intensities, for example in carpal bone segmentation of the wrist [20]. Moreover, different parameter settings are normally required for different images. We aim to produce a generic medical image segmentation tool that works for both 2D and 3D images without any prior information or training process, and that allows efficient user interactions.

To achieve this aim and address the aforementioned issues, the key contributions of our work are summarized as follows. (1) A fast CRF solver based on Gaussian approximation is adopted to achieve fast 2D and 3D image segmentation for multiple labels (up to 10 labels in the current implementation, and easily extendable). (2) The software features an automatic slice recommendation function that suggests the best slice to annotate next, greatly improving segmentation efficiency in 3D. (3) We have optimally tuned the parameters of the algorithm to achieve a one-size-fits-all setting, meaning that no parameter adjustment is required for different medical image segmentation tasks. The developed tool has been evaluated on a variety of 2D and 3D medical image modalities and applications in terms of segmentation accuracy, repeatability and computational time.

Section snippets

Methodology

The goal of image segmentation is to label every pixel in the image with one of several predetermined object categories. In this paper, it is formulated as maximum a posteriori (MAP) inference in a CRF, which is defined over pixels in an image. The object classes (categories) are defined interactively by the user using scribbles. The CRF is constructed by combining a smoothness term that maximizes the label agreement between similar pixels and the likelihood of each pixel belonging to each of
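The combination of a likelihood (unary) term and a smoothness (pairwise) term described above corresponds to the standard Gibbs energy of a fully connected CRF as in Krähenbühl and Koltun [13]; the generic form is shown below, while the specific potentials and kernel weights used by the tool are given in the full methodology.

```latex
E(\mathbf{x}) = \sum_{i} \psi_u(x_i) \;+\; \sum_{i<j} \psi_p(x_i, x_j),
\qquad
\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m} w^{(m)}\, k^{(m)}(\mathbf{f}_i, \mathbf{f}_j)
```

Here \(\psi_u\) is the unary term derived from the user's scribbles, \(\mu\) is a label compatibility function, and the \(k^{(m)}\) are Gaussian kernels over per-pixel feature vectors \(\mathbf{f}_i\) (typically position and intensity). MAP inference seeks the labeling \(\mathbf{x}\) minimizing \(E(\mathbf{x})\).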

Parameter settings

The meaning and values of all the hyperparameters described in Section 2 are listed in Table 1. The parameter values were determined by evaluating the software on the various 2D and 3D images described in Section 4.1. In the context of medical image segmentation, the parameters were optimized in favor of responding to the user's annotations to improve segmentation accuracy, rather than minimizing the number of user interactions. Hence we used relatively smaller values for σ2 and λ in Eq. (5)

Material

Combined (CT-MR) Healthy Abdominal Organ Segmentation dataset (CHAOS) [24]: The CHAOS challenge aims to segment liver, kidneys and spleen in the abdominal region in CT and MRI data. The manual annotation process was produced by two teams, both of which include a radiology expert and an experienced medical image processing scientist. After the manual annotation of both teams, a third radiology expert and another medical imaging scientist analyzed the labels, which were fine-tuned according to

Discussion and conclusions

In this paper, we have proposed to use an efficient fully connected CRF optimizer to achieve multi-class 2D and 3D medical image segmentation. Based on this CRF optimizer, we have developed an interactive image segmentation tool for medical image analysis. Our tool does not require parameter tuning for different image modalities and dimensions. It also features a slice recommendation function to achieve efficient user interactions for 3D images. The method has been comprehensively

Declaration of Competing Interest

None.

References (35)

  • T. Chan et al.

    An active contour model without edges

    International Conference on Scale-Space Theories in Computer Vision

    (1999)
  • V. Vezhnevets et al.

GrowCut: interactive multi-label N-D image segmentation by cellular automata

Proc. of Graphicon

    (2005)
  • Y. Boykov et al.

    Fast approximate energy minimization via graph cuts

    IEEE Trans Pattern Anal Mach Intell

    (2001)
  • J. Besag

    On the statistical analysis of dirty pictures

    Journal of the Royal Statistical Society: Series B (Methodological)

    (1986)
  • J.S. Yedidia et al.

    Constructing free-energy approximations and generalized belief propagation algorithms

    IEEE Trans. Inf. Theory

    (2005)
  • Y. Boykov et al.

    An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision

    IEEE Trans Pattern Anal Mach Intell

    (2004)
  • P. Krähenbühl et al.

Efficient inference in fully connected CRFs with Gaussian edge potentials

    Adv Neural Inf Process Syst

    (2011)