
1 Introduction

Assessing nuclear characteristics is one of the most important tasks a pathologist performs when diagnosing disease. Manual assessment of histopathology images suffers from inter-observer variability, especially for challenging tissue samples with subtle visual features. Moreover, because the doctor-to-patient ratio is inadequate, scanning whole slides is time-consuming for an individual clinical expert. A slide contains a large number of nuclei, and inspecting each of them carefully is a tedious job for clinicians in settings where patient throughput is very high. Computational pathology [1] using computer vision methods has gained momentum in the recent past as a way to supplement the conventional approaches [2,3,4,5]. The prime focus in digital pathology applications is to accurately identify the nucleus area and its morphological characteristics. A principal component of computational pathology is therefore the segmentation [4, 6,7,8] of nuclei, separating them from the background in whole slide images, and more accurate nucleus extraction contributes significantly to medical software. The goal of this work is to segment nuclei in microscopic images of childhood medulloblastoma and thereby contribute to computational pathology research. We collected images from real patient data and annotated the ground truth under the guidance of a clinical expert, and we studied the results of various conventional segmentation methods on these data. Most prior work on medulloblastoma is texture-based, and to date no work on medulloblastoma has addressed the identification of nuclei in tissue samples, which is a vital part of the diagnosis. Segmenting tissue images is considerably more challenging than segmenting cytological images because of the abundant debris, hemorrhages and nuclear diffusion; hyperchromasia and overlapping nuclei are also frequently seen.
Machine learning has previously shown promising results on challenging datasets [9,10,11]. A serious problem in medical research, however, is the availability of data. No benchmark medulloblastoma dataset was available, so we address this problem for future researchers by creating our own dataset with painstakingly marked ground-truth nuclei.

2 Background

The literature reveals that work on medulloblastoma is very recent: the World Health Organization (WHO) [12] characterized it as a grade IV malignant tumor only in 2002. To the best of our knowledge, computer-aided studies of this tumor have appeared only since 2012, and all of these works [13,14,15,16] were carried out at St. Jude Children's Research Hospital, USA. However, no work on cell segmentation has been attempted so far. Moreover, these studies were based on textural features, which were used to categorize medulloblastoma into anaplastic and non-anaplastic subtypes. According to the WHO, however, medulloblastoma can be classified into four subtypes based on its severity and cell characteristics, and prognosis and subsequent management depend largely on the subtype into which the tumor is classified. Hence, for a cell-based study, automated cell segmentation from the tissue samples plays the most vital role, and this is what we address in this paper. Classification of childhood medulloblastoma into its WHO subtypes has previously been achieved [17] using manually segmented cells; automated cell extraction would make the process fully automatic.

3 Methods and Materials

3.1 Study Region

The study region is North East India, Guwahati city in particular.

3.2 Data Collection

The tissue blocks were collected from the Neurosurgery Department of Guwahati Medical College and Hospital (GMCH), Guwahati. The blocks were then stained with H&E at Ayursundra Pvt. Ltd, and finally the images were captured under clinical supervision at Guwahati Neurological Research Center (GNRC), Dispur. The images were captured at 100× magnification and stored in JPEG format. A total of 94 images were collected from 15 slides; 18 images of normal cells were also captured from the slides where such cells were present in the sample. The ground truth for identifying the cells in the tissue samples was meticulously marked by us in red using MS Paint and later verified under expert supervision. For this study, we marked 37 images, which include both properly and poorly stained slides. From these images, a total of 1272 cells were identified and marked. A few images are shown in Fig. 1. The ground truth includes both normal and abnormal cells, covering the various subtypes. The marked ground truth was later segmented using the k-means color segmentation method for evaluation purposes.
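The exact procedure for turning the red MS Paint annotations into binary masks is not detailed here. As a rough illustration only, the sketch below clusters the pixel colors of an annotated image with plain k-means (Lloyd's algorithm in RGB, k = 3, with a deterministic farthest-point initialization, all of which are our assumptions), takes the cluster whose centroid is nearest pure red as the annotation, and fills the closed red outlines to obtain a solid nucleus mask. The function names `farthest_first_init`, `kmeans` and `annotation_mask` are hypothetical; the code relies only on NumPy and `scipy.ndimage.binary_fill_holes`.

```python
import numpy as np
from scipy import ndimage


def farthest_first_init(pixels, k):
    """Deterministic seeding (assumption): start at the first pixel, then
    repeatedly add the pixel farthest from all centroids chosen so far."""
    centroids = [pixels[0]]
    dist = ((pixels - pixels[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = pixels[dist.argmax()]
        centroids.append(nxt)
        dist = np.minimum(dist, ((pixels - nxt) ** 2).sum(axis=1))
    return np.array(centroids, dtype=float)


def kmeans(pixels, k, n_iter=50):
    """Lloyd's k-means on an (N, 3) float array; returns (labels, centroids)."""
    centroids = farthest_first_init(pixels, k)
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance of every pixel to every centroid: shape (N, k).
        d = np.stack([((pixels - c) ** 2).sum(axis=1) for c in centroids], axis=1)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def annotation_mask(rgb, k=3):
    """Binary nucleus mask from red outlines drawn on an (H, W, 3) RGB image."""
    h, w, _ = rgb.shape
    pixels = rgb.reshape(-1, 3).astype(float)
    labels, centroids = kmeans(pixels, k)
    # Assumption: the annotation cluster is the one whose centroid is nearest pure red.
    red = np.array([255.0, 0.0, 0.0])
    red_cluster = np.argmin(((centroids - red) ** 2).sum(axis=1))
    outline = (labels == red_cluster).reshape(h, w)
    # Fill the closed outlines so that each annotated nucleus becomes a solid region.
    return ndimage.binary_fill_holes(outline)
```

Farthest-point seeding is sensitive to stray outlier colors; on real annotated images, k-means++ seeding or several random restarts would likely be more robust.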

Fig. 1.
figure 1

Figure showing microscopic images collected at 100x magnification.

3.3 Ethical Statement

This study was part of a joint project undertaken by the Institute of Advanced Study in Science and Technology (IASST) and GMCH. Permission for the study was granted by the ethics committees of both institutions [IASST: Registration number ECR/248/Indt/AS/2015 of Rule 122DD, Drugs and Cosmetics Rule, 1945 of India; GMCH:MC/190/2007/pt-1/E-C/32 dated 30.5.2017].

3.4 Inclusion/Exclusion Criteria

The samples were taken from children under 15 years of age, and only biopsy-confirmed cases were included in the study. Cases that did not qualify as medulloblastoma (MB) on the basis of clinical findings, history and pathological features were excluded.

3.5 Overview of the Work

In our work, we studied eight segmentation techniques, covering color-based, region-based and clustering-based segmentation as well as local and global methods. The model is depicted in Fig. 2. The algorithms used were adaptive segmentation [18, 19], HSV color segmentation [20], YCbCr color segmentation, Otsu segmentation [21], fuzzy-modelling-based segmentation [22, 23], watershed segmentation [24, 25], k-means clustering [26] and entropy-filter segmentation [27, 28]. HSV and YCbCr are color-based segmentation methods: the images were first converted to the respective color space, and global thresholding was then applied to the pixel intensity values. Because the threshold for color-based segmentation was static while different images had different color intensity histograms, smooth and accurate segmentation of the cells was not possible. We then used adaptive and Otsu segmentation, which compute an image-specific best threshold instead of a fixed one. These methods gave better results than our previous attempt but struggled with overlapping nuclei. We then tried traditional watershed segmentation to better separate overlapping cells, but it underperformed because appropriate local minima and maxima could not be chosen reliably in the images. Finally, k-means and fuzzy clustering were used for segmentation; fuzzy clustering in particular outperformed the previous attempts. The k-means segmentation groups the pixels of each image into color clusters based on a similarity measure. The number of clusters was set to 3, and before clustering the RGB images were converted to the LUV color space.
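Fuzzy clustering differs from k-means in that every pixel receives a membership in each cluster rather than a single label. A minimal fuzzy c-means sketch (the standard Bezdek formulation) is given below to make this concrete. The fuzzifier m = 2, the number of clusters c = 3, the random initialization and the stopping rule are our assumptions, not parameters reported in this study, and `fuzzy_cmeans` is a hypothetical helper.

```python
import numpy as np


def fuzzy_cmeans(x, c=3, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means (Bezdek) on an (N, D) array.

    Returns (centers, u): centers has shape (c, D); u has shape (N, c) and each
    row holds one point's membership in every cluster, summing to 1.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)  # random soft memberships, rows sum to 1
    for _ in range(n_iter):
        um = u ** m
        # Each center is the membership-weighted (u^m) mean of all points.
        centers = (um.T @ x) / um.sum(axis=0)[:, None]
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero when a point sits on a center
        # Membership update u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)),
        # written equivalently as d_ij^(-p) / sum_k d_ik^(-p) with p = 2/(m-1).
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return centers, u
```

For an image, `x` would be the (H·W, 3) array of pixel colors (for example in LUV), and `u.argmax(axis=1).reshape(H, W)` turns the soft memberships into a label map; which cluster corresponds to nuclei still has to be decided separately, for instance from the cluster centers.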
All the other segmentation methods produce a hard partition, in which each pixel belongs to exactly one cluster, whereas fuzzy clustering is a soft partition method: a single data point can belong to several clusters with different membership values, and the memberships of each point sum to 1. Fuzzy clustering gave better output than most of the other methods; however, entropy-based segmentation was the most appropriate for our images. For all algorithms, the segmentation module was followed by the same morphological post-processing: an opening with a disk-shaped structuring element of radius 10 pixels for noise removal, after which objects with fewer than 80 pixels were empirically identified as noise and removed. Finally, the outputs of the segmentation modules were evaluated against the ground truth using the Dice and Jaccard coefficients. The Dice coefficient is given by
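The entropy-filter segmentation and the shared post-processing step can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this study: local entropy is computed over a 9 × 9 window on a grayscale image quantized to 16 gray levels, the entropy map is binarized with an Otsu threshold (implemented directly in NumPy), and high-entropy (textured) regions are assumed to be nuclei. The opening with a radius-10 disk and the removal of objects smaller than 80 pixels follow the description above. All helper names are hypothetical; the code uses NumPy and `scipy.ndimage` (`uniform_filter`, `binary_opening`, `label`).

```python
import numpy as np
from scipy import ndimage


def local_entropy(gray, size=9, levels=16):
    """Shannon entropy (bits) of the gray-level histogram in a size x size window."""
    q = (gray.astype(np.int64) * levels) // 256  # quantize 0..255 into `levels` bins
    ent = np.zeros(gray.shape, dtype=float)
    for b in range(levels):
        # Local fraction of window pixels that fall into bin b.
        p = ndimage.uniform_filter((q == b).astype(float), size=size)
        nz = p > 1e-12
        ent[nz] -= p[nz] * np.log2(p[nz])
    return ent


def otsu_threshold(values, nbins=256):
    """Threshold maximizing the between-class variance of a histogram (Otsu)."""
    hist, edges = np.histogram(values.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                    # pixel count at or below each bin
    w1 = w0[-1] - w0                        # pixel count above each bin
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2    # proportional to between-class variance
    return centers[np.argmax(between)]


def disk(radius):
    """Disk-shaped boolean structuring element."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return xx ** 2 + yy ** 2 <= radius ** 2


def entropy_segment(gray, size=9, radius=10, min_size=80):
    """Entropy-filter segmentation followed by the shared post-processing."""
    ent = local_entropy(gray, size=size)
    # Assumption: textured (high-entropy) regions correspond to nuclei.
    mask = ent > otsu_threshold(ent)
    # Morphological opening with a disk of radius 10 px for noise removal.
    mask = ndimage.binary_opening(mask, structure=disk(radius))
    # Drop connected components smaller than `min_size` pixels.
    labels, _ = ndimage.label(mask)
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_size
    keep[0] = False  # label 0 is the background
    return keep[labels]
```

Applying the same opening and size filter after every method keeps the comparison focused on the segmentation step itself. In this sketch, every component that survives the opening contains a whole radius-10 disk (317 pixels), so the 80-pixel filter only takes effect if the opening radius is reduced.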

$$ \mathrm{Dice\ coefficient} = \frac{2\,TP}{2\,TP + FP + FN} $$
(1)

The Jaccard coefficient is given by

$$ \mathrm{Jaccard\ coefficient} = \frac{TP}{TP + FP + FN} $$
(2)

where TP (true positives) is the number of pixels with value 1 in both the segmented image and the ground-truth image; FP (false positives) is the number of pixels with value 1 in the segmented image but 0 in the ground truth; and FN (false negatives) is the number of pixels with value 0 in the segmented image but 1 in the ground truth.
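Eqs. (1) and (2) translate directly into code. The sketch below computes both coefficients from a pair of binary masks with NumPy; the function names are hypothetical, and treating two empty masks as a perfect match (score 1.0) is our own convention rather than something specified in this study.

```python
import numpy as np


def confusion_counts(seg, gt):
    """Pixel-level TP, FP and FN between a predicted mask and a ground-truth mask."""
    seg = np.asarray(seg, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = int(np.count_nonzero(seg & gt))    # 1 in both
    fp = int(np.count_nonzero(seg & ~gt))   # 1 in segmentation, 0 in ground truth
    fn = int(np.count_nonzero(~seg & gt))   # 0 in segmentation, 1 in ground truth
    return tp, fp, fn


def dice(seg, gt):
    tp, fp, fn = confusion_counts(seg, gt)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom  # Eq. (1)


def jaccard(seg, gt):
    tp, fp, fn = confusion_counts(seg, gt)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom  # Eq. (2)


# Two 4 x 4 squares that overlap by half: TP = FP = FN = 8.
gt = np.zeros((10, 10), dtype=bool)
gt[0:4, 0:4] = True
seg = np.zeros((10, 10), dtype=bool)
seg[2:6, 0:4] = True
print(dice(seg, gt), jaccard(seg, gt))  # 0.5 0.3333333333333333
```

For any single image the two scores are monotonically related, J = D / (2 - D), so they rank methods identically image by image.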

Fig. 2.
figure 2

Model of the work.

4 Results

This section presents the results of our experiments comparing the various segmentation techniques against the segmented ground-truth data. An annotated image and its ground-truth segmentation are displayed in Fig. 3, and the outputs of the segmentation techniques detailed above are depicted in Fig. 4. All 37 images were processed with every segmentation method, and the Dice (Eq. 1) and Jaccard (Eq. 2) coefficients of each image were calculated against the ground truth. The individual image scores for 25 of the 37 samples are given in Tables 1 and 2, and the average scores of the segmentation algorithms are shown as bar graphs in Fig. 5(a) and (b). A boxplot (Fig. 6) is a simple and efficient way to represent graphically a set of values of a single variable: it shows not only the location of the values but also their variation, which makes it especially useful for comparisons.
The performance of the segmentation methods followed the order HSV < Watershed < YCbCr < k-means < Adaptive < Otsu < Fuzzy < Entropy. The highest Dice and Jaccard coefficients we obtained were 79.2% (Table 1) and 66.1% (Table 2), both with the entropy-based segmentation method. Fig. 6 shows that Entropy, Fuzzy and Otsu have the highest medians, almost equal to one another. Among these three, the Entropy median lies closest to the middle of the box between the first and third quartiles, indicating that its scores are more symmetrically distributed. Entropy also has the smallest interquartile range, indicating more consistent scores, and the shortest whiskers, indicating lower variation.
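The boxplot reading above (median position within the box, interquartile range, whisker length) can be made explicit with a few lines of NumPy. The sketch assumes the common Tukey convention, with whiskers reaching the most extreme values within 1.5 × IQR of the quartiles, since the plotting convention used for Fig. 6 is not stated; `box_summary` is a hypothetical helper, and the example scores are toy values, not data from Tables 1 or 2.

```python
import numpy as np


def box_summary(scores, whis=1.5):
    """Quantities a standard (Tukey) boxplot displays for one method's scores."""
    s = np.asarray(scores, dtype=float)
    q1, med, q3 = np.percentile(s, [25, 50, 75])  # linear interpolation
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = s[(s >= lo_fence) & (s <= hi_fence)]
    return {
        "median": med,
        "iqr": iqr,  # box height: smaller means more consistent scores
        # 0.5 means the median sits mid-box, i.e. a roughly symmetric middle half
        "median_pos": (med - q1) / iqr if iqr > 0 else 0.5,
        "whiskers": (inside.min(), inside.max()),
        "outliers": s[(s < lo_fence) | (s > hi_fence)],
    }


summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # toy values, not study data
print(summary["median"], summary["iqr"], summary["outliers"])
```

Comparing `iqr` and `median_pos` across methods reproduces, in numbers, the visual comparison of box height and median placement made above.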

Fig. 3.
figure 3

Figure showing K-means segmentation of the ground truth annotation. Original Image (top left), Ground truth marking (top right), K-means boundary detection (bottom left), segmentation (bottom right)

Fig. 4.
figure 4

Segmentation output of the different algorithms.

Table 1. Dice coefficient of various segmentation methods used.
Table 2. Jaccard coefficient of various segmentation methods used.
Fig. 5.
figure 5

Figure showing the comparative (a) Dice coefficient and (b) Jaccard coefficient of various segmentation techniques with ground truth.

Fig. 6.
figure 6

Figure showing Boxplot for Dice and Jaccard coefficient values.

5 Discussions

Segmentation of histopathological tissue samples is considerably more difficult [29] than segmentation of cytological images. High cellularity, abundant debris, nuclear diffusion and overlapping nuclei are some of the reasons for this difficulty. Moreover, computational pathology on medical data is highly data-dependent, and the lack of benchmark data is a major hindrance to such work. Data variation arises in particular from differences in slide preparation. On our dataset, entropy segmentation performed best, followed by fuzzy modelling, Otsu thresholding, adaptive segmentation, k-means color segmentation, YCbCr color segmentation, watershed and HSV. This study can be seen as a pathway toward digital MB cell analysis. The module could be integrated into medical image analysis software for digital report generation and may also be extended to other tissue samples.