Patch-based system for Classification of Breast Histology images using deep learning

https://doi.org/10.1016/j.compmedimag.2018.11.003

Highlights

  • In this work, we have developed a patch-based classifier (PBC) using a convolutional neural network (CNN) for automated classification of breast cancer histopathology images into four histology classes, namely normal, benign, in situ and invasive carcinoma.

  • The developed patch-based classifier (PBC) uses an optimal architecture of a convolutional neural network (CNN) for automated classification of breast cancer histopathology images.

  • The proposed classification system works in two different modes: one patch in one decision (OPOD) and all patches in one decision (APOD). In the OPOD mode, the label of each patch is predicted and an image counts as correctly classified only when all of its patches unanimously receive the true image label, whereas in the APOD mode the class label of the image is obtained by a majority voting scheme over the patch labels.

  • To verify the classification ability of the proposed system, the breast histopathological images are classified into 2 classes (non-malignant and malignant) as well as 4 classes (normal, benign, in situ and invasive carcinoma), whereas most existing methods classify such images only into 2 broad classes.

  • We have also explored the potential of our proposed model in classifying the images of both a test set obtained by splitting the training set and the actual hidden test set of the ICIAR-2018 breast cancer histology image dataset.

  • Our model achieves an accuracy of 87% in classifying the images of the ICIAR-2018 hidden test dataset.

Abstract

In this work, we propose a patch-based classifier (PBC) using a convolutional neural network (CNN) for automatic classification of histopathological breast images. The limited number of available images necessitated patch extraction and augmentation to boost the number of training samples. Patches of suitable size carrying crucial diagnostic information were therefore extracted from the original images. The proposed classification system works in two different modes: one patch in one decision (OPOD) and all patches in one decision (APOD). In the OPOD mode, the PBC first predicts the class label of each patch. If that class label is the same for all the extracted patches and matches the class label of the image, the output is considered a correct classification. In the APOD mode, the class label of each extracted patch is predicted as in OPOD, and a majority voting scheme makes the final decision on the class label of the image. We used the ICIAR 2018 breast histology image dataset for this work, which comprises 4 different classes, namely normal, benign, in situ and invasive carcinoma. Experimental results show that our proposed OPOD mode achieved patch-wise classification accuracies of 77.4% for 4 histopathological classes and 84.7% for 2 histopathological classes on the test set obtained by splitting the training dataset. Our proposed APOD technique achieved image-wise classification accuracies of 90% for 4-class and 92.5% for 2-class classification on the same split test set. Further, we achieved an accuracy of 87% on the hidden test dataset of ICIAR-2018.
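As a rough illustration of the patch extraction and augmentation step mentioned in the abstract, the following minimal sketch slides a square window over an image and generates rotated and mirrored copies of each patch. The function names `extract_patches` and `augment`, the patch size, the stride, and the choice of rotations and flips are all assumptions made for illustration; the abstract does not state which patch sizes or augmentation operations were actually used.

```python
import numpy as np


def extract_patches(image, patch_size, stride):
    """Slide a square window over an H x W x C image and collect the patches.

    patch_size and stride are hypothetical parameters; the image must be at
    least patch_size pixels in both height and width.
    """
    height, width = image.shape[:2]
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)


def augment(patch):
    """Return 8 variants: four 90-degree rotations of the patch and of its mirror.

    This particular augmentation set is an assumption, not the paper's recipe.
    """
    variants = []
    for base in (patch, np.fliplr(patch)):
        for quarter_turns in range(4):
            variants.append(np.rot90(base, quarter_turns))
    return np.stack(variants)


# Toy example on a synthetic 8 x 8 RGB array (not a real histology image).
toy_image = np.arange(8 * 8 * 3).reshape(8, 8, 3)
toy_patches = extract_patches(toy_image, patch_size=4, stride=2)
toy_augmented = augment(toy_patches[0])
print(toy_patches.shape, toy_augmented.shape)
```

With a stride smaller than the patch size the patches overlap, which increases the number of training samples obtained from each image at the cost of correlated samples.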

Introduction

Breast cancer is one of the leading causes (Rangayyan et al., 2007) of cancer-related death in women worldwide. According to a report published by the WHO (World Health Organization, 2018) in 2013, nearly five lakh (500,000) women worldwide lost their lives to this disease in 2011. In India, too, the incidence of breast cancer is rising at an alarming rate. It has now become the most common cancer among women in most Indian cities and the second most common in rural areas. In addition, most females diagnosed with breast cancer are in the younger age group (25-40 years). The risk of breast cancer (Surakasula et al., 2014) increases alarmingly until menopause and then decreases gradually. Breast cancer diagnosis consists of a series of steps. Whenever a lump or nodule is discovered in a breast during clinical examination, screening tests such as mammography (Behrens et al., 2007) or ultrasound are performed to detect changes in the breast. These screening tests are followed by a biopsy to make a definite diagnosis and detect any malignant growth in the breast tissue. Biopsies enable a doctor to analyze the microscopic structure of the tissue, differentiate between normal, benign and malignant lesions, and make a prognosis accordingly. Cancer may become fatal if not detected early, whereas early detection can decrease the mortality rate, since more treatment options are available at an early stage. Traditionally, the inspection of biopsy slides under the microscope rests on the shoulders of pathologists. This manual inspection is time-consuming and depends on the expertise of the pathologist. Thus, developing an automated system for breast cancer detection from breast histology images is the need of the hour.
Carcinomas can be divided into two classes, namely in situ and invasive carcinomas. An in situ carcinoma is one in which the malignant growth is restricted to the tissue in which it originated and has not spread to the surrounding tissues. In contrast, an invasive carcinoma is one in which the malignant growth has spread to surrounding areas from its point of origin. The primary task in developing an automated system for breast cancer detection is to classify breast histology images into four classes, namely normal, benign, in situ and invasive carcinoma. In this paper, we review some of the recent state-of-the-art techniques for automated breast histology classification and develop a patch-based classifier (PBC) using a deep learning approach for automated classification of breast histology images.

As mentioned in the previous section, breast cancer is one of the deadliest diseases among women worldwide, and the traditional method of microscopic inspection is highly time-consuming and prone to manual errors. This motivated us to develop an automated system for classification of microscopic breast histopathological images. Most of the reported work on this problem uses handcrafted features for breast cancer classification. However, deep learning approaches eliminate the need for extracting handcrafted features. Thus, in this work, we have developed a patch-based classifier (PBC) using a convolutional neural network (CNN) for automated breast histology image classification. The details of the work are given in Section 2.

This section describes some of the standard state-of-the-art methodologies for breast cancer detection from histopathological images. The state of the art can be broadly categorized into handcrafted feature-based approaches and deep learning-based approaches using convolutional neural networks (CNNs).

The handcrafted-feature approaches (He et al., 2012) used by most researchers rely mainly on thresholding-based, clustering-based, active contour-based, watershed-based or graph-cut segmentation. These approaches mainly aim at segmenting the nuclei from the entire breast cancer (BCa) histopathology slide images. Distinguishing features are then extracted from the segmented nuclei to differentiate between malignant and benign slides. In (Veta et al., 2013), a fast radial symmetry-based approach followed by marker-controlled watershed segmentation was used for nuclei extraction from breast cancer histopathology images. In their work, 39 biopsy slide images were acquired from 38 different patients. In another work (Jain et al., 2014), a Chan-Vese (CV) model-based active contour technique was implemented to segment the cells from the background in breast histopathological images. Morphological features extracted from these segmented cells were used to classify the cells as either normal or cancerous. In (Basavanhally et al., 2013), a geodesic active contour model was used for segmenting nuclei from BCa histopathological images. Both architectural and textural features of the nuclei were considered. Graph-based features (architectural features) and 13 Haralick features (texture features) were extracted from the segmented nuclei and used for classification, yielding an automated system for detecting the Modified Bloom–Richardson (mBR) (Breast cancer and breast pathology, 2018) grade of different histopathological slides. In their work, breast histopathological images collected from 126 different patients were considered. In (Khan et al., 2013), an automated system was proposed for segmenting tumor cells in BCa histopathological images by dividing the image into hypocellular and hypercellular stroma regions using magnitude and phase spectra in the frequency domain. They worked on the MITOS dataset (MITOS Dataset, 2018), which consists of 35 breast histopathological images collected from 5 different patients. In (Roullier et al., 2016), graph-based segmentation was used to extract the mitotic nuclei from BCa histopathological whole slide images (WSI). In (Kaymak et al., 2017), an artificial neural network (ANN) based approach was used for automatic breast histology classification.

In recent years, convolutional neural networks (CNNs) have gained immense importance for breast histopathological image classification. CNNs have a major advantage over handcrafted feature extraction techniques: they extract features automatically from the image patches, and the results obtained are comparable with those of traditional feature extraction techniques. Spanhol et al. (Fabio Alexandre Spanhol et al., 2016) worked on the BreaKHis dataset, which consists of microscopic histopathological breast images captured at different magnifications. They developed a CNN model to classify the images as either benign or malignant. The authors reported that the accuracy of the system decreased as magnification increased, since at higher magnification their CNN architecture failed to extract useful features. In another work (Cruz-Roa et al., 2014), a CNN model was proposed for the automatic classification of invasive ductal carcinoma in whole slide images (WSI), and hence for differentiating between invasive and non-invasive images. Both (Fabio Alexandre Spanhol et al., 2016) and (Cruz-Roa et al., 2014) address 2-class classification problems, where the classes are either benign/malignant or invasive/non-invasive. In (Wahab et al., 2017), the authors proposed a CNN model for separating mitotic and non-mitotic nuclei in breast histopathological images. In (Vang et al., 2018), the authors used the pre-trained Inception-V3 model (Szegedy et al., 2016), together with some post-processing techniques, for 4-class classification of breast cancer histopathology images. Inception-V3 is a model that was developed for classifying the images of the ImageNet database into 1000 different image classes.

However, deep learning remains little explored for breast cancer histopathological image classification. The few existing state-of-the-art methods perform 2-class classification, that is, classification of the histopathological images into two histological classes, namely normal and malignant. None of the reported methods separates the benign images from the normal ones. In addition, malignancy is of two types: in situ (cancer cells are limited to the region in which they originated) and invasive (cancer cells have spread to the surrounding tissues from their point of origin). Thus, to develop a fully automated system for histopathological image classification, all the different categories should be considered. Further, deep learning models eliminate the need for extracting handcrafted features for automatic classification and, in most cases, outperform approaches based on handcrafted features. The performance of automated classification systems using handcrafted features depends mainly on the nuclei segmentation step, as in (Veta et al., 2013), (Jain et al., 2014) and (Basavanhally et al., 2013). In contrast, the performance of deep learning approaches is not limited by the results of a nuclei segmentation step, since training and classification are based on the direct processing of image regions. This motivated us to develop a fully automatic system for 4-class (normal, benign, in situ and invasive carcinoma) and 2-class breast cancer histopathological image classification using deep learning approaches.

In this work, we have proposed a patch-based classifier (PBC) using a CNN to classify breast histopathological images into four as well as two histopathological classes. The details of the classes and the CNN architecture deployed for this work are given in Section 2.5.

The main contributions of this paper can be summarized as follows:

  • In this work, we have developed a patch-based classifier (PBC) which uses an optimal architecture of a convolutional neural network (CNN) for automated classification of breast cancer histopathology images.

  • The proposed classification system works in two different modes: one patch in one decision (OPOD) and all patches in one decision (APOD). In the OPOD mode, the label of each patch is predicted and an image counts as correctly classified only when all of its patches unanimously receive the true image label, whereas in the APOD mode the class label of the image is obtained by a majority voting scheme over the patch labels.

  • To verify the classification ability of the proposed system, the breast histopathological images are classified into 2 classes (non-malignant and malignant) as well as 4 classes (normal, benign, in situ and invasive carcinoma), whereas most existing methods classify such images only into 2 broad classes.

  • We have also explored the potential of our proposed model in classifying the images of both a test set obtained by splitting the training set and the actual hidden test set of the ICIAR-2018 breast cancer histology image dataset.

  • Our model achieves an accuracy of 87% in classifying the images of the ICIAR-2018 hidden test dataset.
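The two decision modes and the 2-class grouping listed above can be sketched as follows. This is a minimal illustration operating on already-predicted patch labels, not the authors' implementation. The function names, the tie-breaking rule in the majority vote, and the grouping of normal and benign as non-malignant and of in situ and invasive as malignant are assumptions; the grouping is consistent with the description of in situ and invasive as the two types of malignancy, but it is not spelled out in this excerpt.

```python
from collections import Counter

# Assumed grouping of the four histology classes into the two broad classes.
TO_TWO_CLASS = {
    "normal": "non-malignant",
    "benign": "non-malignant",
    "in situ": "malignant",
    "invasive": "malignant",
}


def opod_is_correct(patch_labels, true_label):
    """OPOD: correct only if every patch prediction equals the true image label."""
    return len(patch_labels) > 0 and all(p == true_label for p in patch_labels)


def apod_label(patch_labels):
    """APOD: image label by majority vote over the patch predictions.

    Ties go to the label encountered first, which is an assumption of this
    sketch rather than a rule taken from the paper.
    """
    if not patch_labels:
        raise ValueError("at least one patch prediction is required")
    return Counter(patch_labels).most_common(1)[0][0]


def to_two_class(label):
    """Map a 4-class label to the assumed 2-class grouping."""
    return TO_TWO_CLASS[label]


# Hypothetical patch predictions for one image whose true label is "benign".
patch_predictions = ["benign", "benign", "normal", "benign"]
image_label = apod_label(patch_predictions)
print(image_label)
print(opod_is_correct(patch_predictions, "benign"))
print(to_two_class(image_label))
```

In this toy case the majority vote recovers the true label even though one patch disagrees, while the unanimous OPOD criterion rejects the image, which illustrates why image-wise APOD accuracy can exceed the OPOD figure.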

This paper is organized as follows. Section 2 describes in detail the materials and methodology employed in this paper. Section 3 contains the experimental results, discussion, and comparison with the state of the art. Finally, Section 4 concludes the paper.

Section snippets

Schematic representation

Fig. 1 shows the block diagram of the entire methodology used in this work. Each part is explained in detail in the following sections.

Preprocessing

For examination of histopathological slides, stains are used to enhance the contrast between the different histological structures, especially the nuclei and the cytoplasm, which eases their manual inspection under the microscope. The most commonly used stain in histopathological slides for their microscopic examination is the hematoxylin and eosin

Experimental Results and Discussion

In this section, we evaluate the classification performance of our proposed model in terms of sensitivity, precision, F1-score, and accuracy. We first split the training set into three parts for training, validation, and testing (details in Section 3.1) and report the performance of both patch-wise and image-wise classification on this test set. In addition, the accuracy obtained in classifying the histology images in the hidden test dataset of the challenge has
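The four measures named in this section can be computed from a confusion matrix using their standard definitions. The sketch below is an illustration with hypothetical labels and helper names (`confusion_matrix`, `per_class_metrics`); it does not reproduce the paper's evaluation code or its results.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns index the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def per_class_metrics(cm):
    """Per-class sensitivity, precision and F1-score, plus overall accuracy."""
    true_pos = np.diag(cm).astype(float)
    support = cm.sum(axis=1)    # number of true samples per class
    predicted = cm.sum(axis=0)  # number of predictions per class
    zeros = np.zeros_like(true_pos)
    # Classes with no samples (or no predictions) get a score of 0.
    sensitivity = np.divide(true_pos, support, out=zeros.copy(), where=support > 0)
    precision = np.divide(true_pos, predicted, out=zeros.copy(), where=predicted > 0)
    denom = sensitivity + precision
    f1 = np.divide(2 * sensitivity * precision, denom, out=zeros.copy(), where=denom > 0)
    accuracy = true_pos.sum() / cm.sum()
    return sensitivity, precision, f1, accuracy


# Hypothetical labels: 0 = normal, 1 = benign, 2 = in situ, 3 = invasive.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=4)
sensitivity, precision, f1, accuracy = per_class_metrics(cm)
print(cm)
print(sensitivity, precision, f1, accuracy)
```

The same functions apply unchanged to patch-wise and image-wise evaluation; only the lists of true and predicted labels differ.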

Conclusion

In this work, we have developed a patch-based classifier (PBC) for automatic classification of the ICIAR-2018 breast histology dataset into four classes, namely normal, benign, in situ and invasive carcinoma. The number of filters in each layer and the kernel size were adjusted so that the number of trainable parameters is less than the number of samples, in order to prevent overfitting. This proposed classifier first predicts the class label of each input patch by the OPOD technique. The whole image label
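The parameter budget described above can be checked with simple arithmetic: a convolution layer with a k x k kernel, c_in input channels and c_out filters has k·k·c_in·c_out weights plus c_out biases. The sketch below applies this to a hypothetical toy network; the layer sizes and the sample count are invented for illustration and are not the architecture or dataset size used in the paper.

```python
def conv2d_params(kernel_size, in_channels, out_channels, bias=True):
    """Trainable parameters of a 2-D convolution with a square kernel."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def dense_params(in_features, out_features, bias=True):
    """Trainable parameters of a fully connected layer."""
    weights = in_features * out_features
    return weights + (out_features if bias else 0)


# Hypothetical toy network, NOT the architecture of the paper:
# two 3x3 convolutions followed by a 4-way classifier on 32 pooled features.
layer_params = [
    conv2d_params(3, 3, 16),
    conv2d_params(3, 16, 32),
    dense_params(32, 4),
]
total_params = sum(layer_params)

n_training_patches = 10_000  # hypothetical number of training samples
print(total_params, total_params < n_training_patches)
```

Reducing the number of filters or the kernel size lowers the total, which is the kind of adjustment the conclusion refers to.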

Declarations of interest

None.

Acknowledgments

The first author is grateful to the Department of Science and Technology (DST), Government of India for providing her Junior Research Fellowship (JRF) under DST INSPIRE fellowship program (IF170366).

References (34)

  • Aashna Jain. Cancerous cell detection using histopathological image analysis. International Journal of Innovative Research in Computer and Communication Engineering (2014)
  • Ajay Basavanhally. Multi-field-of-view framework for distinguishing tumor grade in ER+ breast cancer from entire histopathology slides. IEEE Transactions on Biomedical Engineering (2013)
  • Breast cancer and breast pathology, http://pathology.jhu.edu/breast/grade.php, (accessed 10 January...
  • Adnan M. Khan. HyMaP: A hybrid magnitude-phase approach to unsupervised segmentation of tumor areas in breast cancer histology images. Journal of Pathology Informatics (2013)
  • MITOS Dataset, 2018 http://ludo17.free.fr/mitos_2012/dataset.html, (accessed 18 January...
  • Vincent Roullier. Multi-resolution graph-based analysis of histopathological whole slide images: Application to mitotic cell extraction and visualization. Computerized Medical Imaging and Graphics (2016)
  • Fabio Alexandre Spanhol. Breast cancer histopathological image classification using convolutional neural networks. IEEE International Conference on Neural Networks (IJCNN) (2016)