1 Introduction

In 2019, 1,762,450 new cancer cases and 606,880 cancer deaths are projected to occur in the United States. For Hodgkin Lymphoma alone, 8,110 people are predicted to be diagnosed in 2019 and 1,000 people are likely to die [17].

Radiation therapy has a dominant role in cancer treatment and has always been a major part of the effort to cure cancer patients. The main goal of radiotherapy is to deliver a prescribed dose to the target volume while sparing normal tissue [1]. Since radiotherapy is a personalised and localised treatment, the definition of tumour and target volumes is vital to its successful execution [3]. Contouring these regions is, however, a time-consuming part of radiotherapy treatment planning [9], since in current clinical practice this important task is typically performed visually, on a slice-by-slice basis, with very limited support from automated segmentation tools.

Computed Tomography (CT) is normally used as the basis for radiotherapy for two main reasons: (1) it contains density information, which improves the accuracy of dosimetry calculations and allows the treatment beam to be calculated, and (2) it can be used to locate the patient with respect to the treatment machine, being more reliable in representing shape and position than other imaging modalities [15]. Moreover, unlike other procedures, the patient can be scanned in the treatment position, which is an advantage [15].

We have thus developed an image processing pipeline to perform 3D segmentation of the lungs using CT information. The main contributions of the present work include:

  • a new image processing method to identify volumes of interest for each half of the body containing the right or the left lung (Sect. 3.1);

  • three new techniques to place a seed inside the lung (Sect. 3.2);

  • a new, iterative, 3D region growing algorithm that automatically determines the threshold (Sect. 3.3);

  • extensive evaluation of the results on a database with 132 lungs (Sect. 5).

The methods presented are simple, fast and do not need a training phase. In addition, the parameters are set based on expert knowledge of the problem at hand. The three Seed Placement methods, combined with the Iterative Region Growing algorithm, achieved, respectively, 74%, 74% and 92% of DICE and 72%, 72% and 90% of True Positive Rate.

This document is organised as follows. Section 2 presents the state of the art on lung segmentation in CT. Section 3 describes the proposed method and all its components: separation of right and left body volumes in Sect. 3.1; three methods for seed placement in Sect. 3.2; and the new, iterative, region growing technique in Sect. 3.3. The experimental design is outlined in Sect. 4, while results are given in Sect. 5. The document finishes in Sect. 6 with some conclusions and directions for future work.

2 Related Work

A search was performed for works on lung segmentation in CT volumes published in the past 5 years. Most of the works [2, 7, 11, 20] use classifiers in their pipelines. This has the disadvantage of requiring a training phase and, consequently, a (large) number of cases for the algorithms to learn from. Some other studies [6, 13, 21] make use of active contour techniques. These types of models typically need an arbitrary parameterization of the curves, thus losing the opportunity to effectively use information present in the geometry of objects [4]. Other proposals include complex techniques such as Markov-Gibbs random fields [18]. We believe that, with the increased use of digital imaging and its inherently higher quality, simpler approaches could be more adequate [5]. In [8, 12], segmentation is performed in 2D and 3D connectivity is established afterwards. This may lead to “jagged” and inconsistent final results.

Perhaps the work most similar to the one proposed here is that presented in [14], where 3D Region Growing is also used. That work, however, uses “off-the-shelf” tools belonging to ITK. Moreover, its experimental results were obtained on a database of only 30 full chest CT exams.

As can be seen, several different algorithms have already been proposed. They are characterised by their complexity, high running time or segmentation in 2D, and the need for a large training database. This complexity is, in some cases, justified by the application to diseased lungs. No application to radiotherapy planning was found. Here, a new, three-dimensional, simple method, based solely on image processing, is proposed. Not resorting to classification techniques makes our proposal simpler and faster, and removes the need for a large training set; only some basic anatomic knowledge is required.

3 Lung Segmentation

The proposed algorithm is composed of three main blocks, as shown in Fig. 1. In “Laterality separation”, a copy is made of the initial volume, with each new volume containing only one of the lungs. In “Seed definition”, the initialisation of the following segmentation method is automatically determined. Three different methodologies are presented for this step. Lastly, in “Segmentation”, the lung volumes are identified. A new, iterative, region-growing-based technique is proposed for this step.

Fig. 1. Lung segmentation pipeline.

3.1 Separation Between Right and Left Lungs

Gray values are first transformed to Hounsfield Units (HU), a measure of radiodensity. The Rescale Slope and Rescale Intercept needed for this transformation were retrieved from the DICOM header. The idea is then to threshold the volume so that only the lungs remain. The literature, however, is slightly discordant on the HU values of the lung. For instance, the interval \([-700, -400]\) is given in [8], the interval \([-1000, -500]\) is used in [14], and the interval \([-1000, -400]\) is mentioned in [21]. For this part of the work, the interval \([-800, -500]\) proved to produce good results, and the method is quite robust to this selection. A mask \(M_{HU}\) is thus created, equal to one in the voxels whose HU value belongs to this interval and zero elsewhere. A sum projection of this mask is then made, creating a “cumulative transverse plane”, as shown in the left part of Fig. 2. A sum projection of this plane is then computed, creating a line profile, as shown in the right part of Fig. 2. A search for a local minimum of this profile gives the output of this part of the algorithm. Two volumes can now be created by zeroing all the values to the left (or to the right) of the local minimum found.

Fig. 2. Separation between right and left lungs. Left, cumulative transverse plane with separation line superimposed in dashed green; Right, cumulative profile, with local minimum as a green star and the separation line in dashed green. (Color figure online)
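The steps of this section can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the axis order (slice, row, column), the function names, and the `margin` parameter are hypothetical, and the local-minimum search is approximated by taking the lowest profile value inside a central window of the profile.

```python
import numpy as np

def to_hounsfield(raw, slope, intercept):
    """Map stored pixel values to HU with the DICOM rescale parameters."""
    return raw.astype(np.float64) * slope + intercept

def lung_mask(hu, low=-800.0, high=-500.0):
    """Binary mask M_HU: one where the HU value lies in [low, high], zero elsewhere."""
    return ((hu >= low) & (hu <= high)).astype(np.uint8)

def split_column(mask, margin=0.25):
    """Column index separating the two lungs.

    Assumes axes (slice, row, column), with columns running left-right.
    `margin` (hypothetical) keeps the search away from the image borders.
    """
    plane = mask.sum(axis=0)       # cumulative transverse plane
    profile = plane.sum(axis=0)    # line profile along the left-right axis
    n = profile.size
    lo, hi = int(n * margin), int(n * (1 - margin))
    return lo + int(np.argmin(profile[lo:hi]))

def separate(volume, split):
    """Two copies of `volume`, each zeroed on one side of `split`."""
    first, second = volume.copy(), volume.copy()
    first[..., split:] = 0
    second[..., :split] = 0
    return first, second
```

In this sketch `separate` would typically be applied to the mask; applying it to an HU volume would need a fill value other than zero, since 0 HU corresponds to water.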

3.2 Placement of the Seed

Three different techniques are here proposed for the choice of a seed. These are described next.

Method 1. This technique uses anatomic and image acquisition knowledge and starts by placing the seed, in the central coronal plane, at position \([\frac{1}{3}, \frac{2}{3}]\) for the right lung and \([\frac{2}{3},\frac{2}{3}]\) for the left lung. A vertical search is then performed until an intensity corresponding to the theoretical HU value of the lungs, that is, a value in the range \([-800, -500]\), is found. When no such value is found, a tolerance is added to this interval, iteratively, until the search detects a value in the range \([-800-tolerance, -500+tolerance]\). An illustration of this method is given on the left side of Fig. 3.

Method 2. In this method, the seed initialisation is made as in the previous method. The closest point to each seed in the mask \(M_{HU}\) is then selected as the new seed. An illustration of this method is given on the right side of Fig. 3.

Fig. 3. Seed location by Methods 1 (left) and 2 (right). The first method consists of searching for an intensity value in the interval \([-800-tolerance, -500+tolerance]\) along the vertical axis, whereas the second method searches for the voxel in the interval that is at minimal distance from the initial seed.
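Methods 1 and 2 can be illustrated with a short NumPy sketch on a 2D coronal plane. Several details are assumptions made only for illustration: the plane is indexed as (row, column) with rows along the vertical axis, the vertical search visits rows in order of increasing distance from the starting row, and the tolerance `step` and `max_tol` values are hypothetical, as are all function names.

```python
import numpy as np

def initial_seed(shape, side):
    """Starting point at 2/3 of the height and 1/3 (right lung) or 2/3 (left lung) of the width."""
    rows, cols = shape
    col = cols // 3 if side == "right" else (2 * cols) // 3
    return (2 * rows) // 3, col

def seed_method1(plane_hu, start, low=-800.0, high=-500.0, step=25.0, max_tol=300.0):
    """Vertical search for a value in [low - tol, high + tol], widening tol when nothing is found."""
    row0, col = start
    column = plane_hu[:, col]
    # Visit rows by increasing vertical distance from the starting row (assumption).
    order = np.argsort(np.abs(np.arange(column.size) - row0), kind="stable")
    tol = 0.0
    while tol <= max_tol:
        for r in order:
            if low - tol <= column[r] <= high + tol:
                return int(r), col
        tol += step
    return None  # no suitable value within the maximum tolerance

def seed_method2(mask, start):
    """Non-zero point of the mask closest (Euclidean distance) to the starting point."""
    points = np.argwhere(mask)
    if points.size == 0:
        return None
    d2 = ((points - np.asarray(start)) ** 2).sum(axis=1)
    return tuple(int(v) for v in points[np.argmin(d2)])
```

Method 2 always returns a point as long as the mask is non-empty, whereas Method 1 can fail in this sketch when the searched column never reaches the widened interval.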

Fig. 4. Seed location by Method 3 (illustration for one of the lungs, only). See text for a more detailed explanation. (Color figure online)

Method 3. The third method follows from the reasoning presented in Sect. 3.1. The z position of each seed (coronal plane) is given as the local maximum closest to the position chosen to separate the lungs (see right part of Fig. 2). Having this coordinate, the sagittal plane of \(M_{HU}\) can be retrieved for each lung (top-left plot of Fig. 4). The local maximum of the sum projection is chosen as the x coordinate of the seed, as shown in the right plot of Fig. 4. For the y coordinate, the biggest connected component of the sagittal plane of \(M_{HU}\) is first selected (yellow region in the top-left plot of Fig. 4), the sum profile is computed, and the position of its local maximum is determined (bottom plot of Fig. 4).

3.3 Iterative Region Growing

The traditional region growing algorithm (Algorithm 1) starts with a given seed and adds a neighbour to the segmented region if the difference between the value of the neighbouring pixel and the average value of the pixels already in the region does not exceed a threshold. This threshold needs to be set manually, case by case, as it often depends not only on the problem but also on the image in question.

Algorithm 1. Traditional region growing.
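As an illustration of the traditional scheme just described, the following is a minimal Python sketch and not a reproduction of Algorithm 1. The 6-connected neighbourhood, the breadth-first visiting order, and the names `region_grow` and `NEIGHBOURS_6` are assumptions; the acceptance rule compares each neighbour with the running mean of the region, as stated in the text.

```python
from collections import deque
import numpy as np

# 6-connected neighbourhood in 3D (assumption; the connectivity is not stated in the text).
NEIGHBOURS_6 = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def region_grow(volume, seed, th):
    """Grow a region from `seed`, accepting a neighbour when
    |value - mean of the current region| <= th."""
    seed = tuple(int(c) for c in seed)
    region = np.zeros(volume.shape, dtype=bool)
    region[seed] = True
    total, count = float(volume[seed]), 1   # running sum and size of the region
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in NEIGHBOURS_6:
            n = (z + dz, y + dy, x + dx)
            # Skip neighbours that fall outside the volume.
            if min(n) < 0 or any(c >= s for c, s in zip(n, volume.shape)):
                continue
            if not region[n] and abs(float(volume[n]) - total / count) <= th:
                region[n] = True
                total += float(volume[n])
                count += 1
                queue.append(n)
    return region
```

Because the region mean changes as voxels are added, the result of such a scheme can depend on the visiting order; the sketch makes no attempt to remove that dependence.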

We propose to automatically and iteratively update the value of the tolerance threshold, Th, as shown in Algorithm 2. This version has three parameters, the Tolerance Threshold Initialisation \(Th_0\), the Maximum Area Threshold \(Th_H\), and the Minimum Area Threshold \(Th_L\), which may at first seem worse than the previous version with a single parameter. We note, however, that Algorithm 1 is extremely sensitive to the value of Th. In the iterative version, \(Th_H\) and \(Th_L\) can be set based on prior knowledge of the problem, in this case biomedical knowledge of maximum and minimum lung volumes. Th is first initialised to \(Th_0\) and is then iteratively adjusted according to the volume resulting from the segmentation algorithm. Since a larger tolerance lets the region grow further, Th is decreased when this volume is above \(Th_H\) and increased when it is below \(Th_L\). In this way, the algorithm is very robust to the initial value \(Th_0\).

Algorithm 2. Iterative region growing.

Before the application of the Region Growing algorithm, voxels are resampled to an isotropic resolution of [5, 5, 5] millimetres to remove variance in scanner resolution. The slice thickness and pixel spacing information present in the DICOM headers is used for this transformation. The values of the parameters were set as follows: \(Th_0=225\) HU, \(Th_L=3000\) voxels (375 mL), and \(Th_H=40000\) voxels (5000 mL).
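The iterative scheme can be sketched as follows, using the parameter values quoted above as defaults. This is a hedged illustration rather than Algorithm 2 itself: the fixed update `step`, the `max_iter` cap, the 6-connected neighbourhood, and all function names are assumptions. The direction of the update follows from the acceptance rule, since a larger tolerance admits more voxels: the sketch lowers Th when the region is larger than \(Th_H\) and raises it when the region is smaller than \(Th_L\).

```python
from collections import deque
import numpy as np

OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def region_grow(volume, seed, th):
    """Fixed-tolerance 3D region growing (6-connectivity assumed)."""
    seed = tuple(int(c) for c in seed)
    region = np.zeros(volume.shape, dtype=bool)
    region[seed] = True
    total, count = float(volume[seed]), 1
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in OFFSETS:
            n = (z + dz, y + dy, x + dx)
            if min(n) < 0 or any(c >= s for c, s in zip(n, volume.shape)):
                continue
            if not region[n] and abs(float(volume[n]) - total / count) <= th:
                region[n] = True
                total += float(volume[n])
                count += 1
                queue.append(n)
    return region

def iterative_region_grow(volume, seed, th0=225.0, th_low=3000, th_high=40000,
                          step=25.0, max_iter=50):
    """Re-run region growing, adjusting the tolerance until the region size
    (in voxels) lies in [th_low, th_high]. `step` and `max_iter` are hypothetical."""
    th = th0
    for _ in range(max_iter):
        used = th
        region = region_grow(volume, seed, used)
        size = int(region.sum())
        if size > th_high:
            th -= step    # region too large (leak): tighten the tolerance
        elif size < th_low:
            th += step    # region too small: relax the tolerance
        else:
            break
    # If the bounds were never met, the last attempt is returned.
    return region, used

def voxels_to_ml(n_voxels, spacing_mm=5.0):
    """Volume in millilitres of n isotropic voxels (1 mL = 1000 cubic millimetres)."""
    return n_voxels * spacing_mm ** 3 / 1000.0
```

With 5 mm isotropic voxels each voxel holds 0.125 mL, so `voxels_to_ml(3000)` and `voxels_to_ml(40000)` give the 375 mL and 5000 mL bounds quoted in the text.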

4 Experimental Methodology

Provided by the Institute of Oncology of Porto (IPO), the private dataset of patients with Hodgkin Lymphoma used in this research work includes CT volumes used for radiotherapy planning, acquired after the frontline chemotherapy treatment, and the corresponding ground truth contours delimited by experts. CT volumes were acquired with a pixel spacing of 1.0, 1.1, 1.2 or 1.3 mm and a difference in patient position between adjacent slices of 2.0, 2.5 or 5.0 mm, all in the DICOM format [16]. Information has been collected for a total of 69 patients (both adults and infants). However, the ground truth is only available for 132 lungs.

Four metrics were chosen, in the present work, to evaluate the results: Dice coefficient, Jaccard index, True Positive rate, and Volumetric Similarity. While the first three are overlap based, the last one is volume based [19]. These metrics were chosen due to their complementarity. Dice coefficient and Jaccard index are suitable in the presence of outliers; True Positive rate when recall is important; and Volumetric Similarity is appropriate both in scenarios with outliers and when the volume is important [19].

All of these metrics can be derived from the four basic cardinalities of the confusion matrix:

  • TP: Voxels correctly considered to belong to the lung

  • FP: Voxels incorrectly considered to belong to the lung

  • TN: Voxels correctly considered not to belong to the lung

  • FN: Voxels incorrectly considered not to belong to the lung

The Dice coefficient (DICE), also called the overlap index, is the most frequently used metric. It can be defined as:

$$\begin{aligned} DICE = \frac{2TP}{2TP+FP+FN} \end{aligned}$$
(1)

The Jaccard index (JAC) is defined as the intersection divided by the union:

$$\begin{aligned} JAC = \frac{TP}{TP+FP+FN} \end{aligned}$$
(2)

True Positive rate (TPr), also called Sensitivity or Recall, measures the portion of positive voxels in the ground truth that are also identified as positive by the segmentation being evaluated:

$$\begin{aligned} TPr = \frac{TP}{TP+FN} \end{aligned}$$
(3)

This metric is sensitive to segment size, and it penalises errors in small segments more than in large segments. Volumetric similarity (VS) is a measure that considers the volumes of the segments to indicate similarity:

$$\begin{aligned} VS = 1 - \frac{|FN-FP|}{2TP+FP+FN} \end{aligned}$$
(4)
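The four metrics follow directly from the confusion-matrix cardinalities, as the short NumPy sketch below illustrates. The function names are hypothetical, and the sketch assumes that the prediction and the ground truth are boolean (or 0/1) arrays of the same shape and that the ground truth is non-empty, so that no denominator is zero.

```python
import numpy as np

def confusion_counts(pred, truth):
    """TP, FP, TN, FN voxel counts for two masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    tn = int(np.sum(~pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return tp, fp, tn, fn

def segmentation_metrics(pred, truth):
    """DICE, JAC, TPr and VS as defined in Eqs. (1) to (4)."""
    tp, fp, tn, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return {
        "DICE": 2 * tp / denom,
        "JAC": tp / (tp + fp + fn),
        "TPr": tp / (tp + fn),
        "VS": 1 - abs(fn - fp) / denom,
    }
```

For example, a prediction that misses one ground truth voxel and adds one spurious voxel has equal volume to the ground truth, so its VS is 1 even though DICE and JAC are below 1, which illustrates why the overlap-based and volume-based metrics are complementary.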

5 Results and Discussion

Seed location performance was evaluated by checking if the automatic seed falls within the ground truth mask and by its distance to the centroid of the ground truth masks. It can be seen, from Table 1, that for method 3 all of the seeds fall inside the lung region. Moreover, they are significantly closer to the centroid of the ground truth mask, when compared with the seeds retrieved by the other techniques.

Table 1. Seed placement performance. Best results in bold and signalled with “*” if statistically significant, according to the paired-sample t-test at the \(5\%\) level.

Performance of the segmentation is shown in Table 2 for each automatic seed finding method and also considering as seed the centroid of the ground truth mask. As a baseline segmentation technique, a HU threshold was considered, by retrieving the biggest connected component of \(M_{HU}\) (yellow region on Fig. 4).

Table 2. Segmentation performance. Best results in bold and signalled with “*” if statistically significant, according to the paired-sample t-test at the 5% level.

It is clear, from Table 2, that the proposed Iterative Region Growing outperforms the Standard Region Growing. Moreover, the seeds returned by method 3 lead to the best segmentation results, as expected from the results in Table 1. In fact, the results for Iterative Region Growing using the seeds returned by Method 3 are significantly better, according to the paired-sample t-test at the \(5\%\) level, than all the other results, except those for Iterative Region Growing using the centroid of the ground truth mask as seed, and only for DICE.

We were also interested in studying the performance solely for the cases where the seeds were placed inside the ground truth lungs. From Table 3, it can be observed that, when it does provide a valid seed, method 1 generates seeds that lead to better segmentations according to the DICE, JAC and TPr metrics. When looking at VS, method 3 is still the best. This leads us to believe that the design of an algorithm that incorporates the ideas behind method 1 and method 3 is a possible future direction. When looking at the segmentation algorithms, differences between the iterative and non-iterative versions are not significant according to the two-sample t-test at the \(5\%\) level, but both are significantly better, according to the same test, than the HU threshold.

Table 3. Segmentation performance, considering valid seeds only. Best results are presented in bold.

When the seed is placed inside the lung in a region with HU on the interval \([-800,-500]\), the segmentation is robust to the seed location. For example, in the case of one of the patients of our dataset, for method 1 the seed is located near the frontier of the lung, while with method 3 it is located in the centre, but both have very similar results on the segmentation metrics (see Fig. 5).

Fig. 5. Example of the results of lung segmentation for method 1 (top) and 3 (bottom). Ground truth on the far left; CT with superimposed ground truth contour (red) and seeds (blue) on the middle left; segmentation results on the middle right; and representation of the four cardinalities of the confusion matrix on the far right. (Color figure online)

A fair comparison with state-of-the-art results is not possible due to differences in the databases used. We stress, for instance, that our database is composed of images from both adults and infants, which increases the segmentation difficulty. We note, however, that our method, besides its advantages over the other state-of-the-art techniques already described in Sect. 2, such as its simplicity, low running time, and no need for a training phase, achieves similar performance, even surpassing some of the recently proposed algorithms such as the ones presented in [2, 21].

6 Conclusions and Future Work

Segmentation is a problem present in several domains [4, 5]. Here, a new, volumetric technique for lung segmentation in CT is presented and evaluated. This technique is simple, fast, and does not need a training phase. Moreover, all the parameters are set based on expert knowledge of the problem at hand.

In this way, this completely automated method for lung segmentation may help to reduce the time spent by clinicians when performing a manual analysis of the CT scan, assisting them in making better decisions when selecting the best treatment for the patient and/or evaluating the effectiveness of the treatment received.

In the present work, three intensity-based seed finding methods were tested and an improvement of the typical Region Growing method was proposed. With Iterative Region Growing, seed finding methods 1, 2, and 3 achieved 84%, 84% and 96% of Volumetric Similarity, respectively. If only the valid seeds are accounted for, the Iterative Region Growing method achieved results of 95%, 94% and 96%, respectively.

A possible improvement to the presented algorithm is to apply morphology to the results in order to close possible holes that might be present in the final segmentation. Another interesting approach would be to use this technique with a database containing lung abnormalities, such as pleural effusions, consolidations, and masses, since the current image segmentation approaches apply well only if the lungs exhibit minimal or no pathological conditions [10].

In the future, we intend to work on the segmentation of other organs, such as the esophagus. Developing iterative tools for better visualisation and manual adjustment of the results is another line of interest.