Elsevier

Pattern Recognition Letters

Volume 34, Issue 14, 15 October 2013, Pages 1748-1757

Classifier ensemble for an effective cytological image analysis

https://doi.org/10.1016/j.patrec.2013.04.030

Highlights

  • A novel medical decision support framework for cytological image analysis.

  • An in-depth examination of four proposed hybrid image segmentation methods.

  • An efficient classifier ensemble using a trained fuser based on discriminants.

  • Decision rules for fusing the outputs for nine images into a single prediction for the patient.

  • Experimental investigations carried out on a large and diverse database collected by the authors.

Abstract

Breast cancer is the most common type of cancer among women. As early detection is crucial for the patient’s health, much attention has been paid to the development of tools for effective recognition of this disease. This article presents an application of image analysis and classification methods to fine needle biopsy. In our approach, each patient is described by nine microscopic images taken from the biopsy sample. The images show regions of the biopsy that the physician considers interesting and selects arbitrarily. We propose four different hybrid segmentation algorithms dedicated to processing these images and examine their effectiveness for the nuclei feature extraction task. Classification is carried out using a classifier ensemble based on the Random Subspaces approach. To boost its effectiveness, we use a linear combination of the support functions returned by the individual classifiers in the ensemble. In the proposed medical support system, the final decision about the patient is delivered after fusing nine separate outputs of the classifier, one for each image. Experiments carried out on a diverse dataset collected by the authors show that the proposed solution outperforms state-of-the-art classifiers and is a valuable tool for supporting the cytologist’s day-to-day routine.

Introduction

According to the International Agency for Research on Cancer, breast cancer is the most common cancer among women. In 2008, there were 1,384,155 diagnosed cases of breast cancer and 458,503 deaths caused by the disease worldwide (Ferlay et al., 2010, Bray et al., 2012). In 2010, there were 15,784 diagnosed cases of breast cancer among Polish women, with 5226 resulting in death (National Cancer Registry, 2012). The incidence of breast cancer has also increased by 3–4% a year since the 1980s. The effectiveness of treatment largely depends on the timely detection of the disease. An important and often used diagnostic method is the so-called triple-test, which is based on three medical examinations and is used because of its effectiveness in diagnosing breast cancer (Britton et al., 2009). The triple-test includes self-examination (palpation), mammography or ultrasonography imaging, and fine needle biopsy (FNB) (Underwood, 1987). FNB is an examination that consists of obtaining material directly from the tumor. The collected material is then examined under a microscope to determine the prevalence of cancer cells. This approach requires extensive knowledge and experience on the part of the cytologist responsible for the diagnosis. Automatic morphometric diagnosis can help make the results objective and assist inexperienced specialists. Along with the development of advanced vision systems and computer science, quantitative cytopathology has become a useful method for the detection of diseases, infections, and many other disorders (Gurcan et al., 2009, Śmietański and Tadeusiewicz, 2010, Hassan et al., 2010).

Recently, there has been an increase in interest in computer-aided cytology. Several researchers have studied the segmentation of cytological images of breast tumors or proposed new features or classification algorithms. However, only a few of these researchers have tested the efficiency of their methodology in a comprehensive computerized cancer classification system. Jeleń et al. (2010) presented an approach based on the level sets segmentation method. Classification efficiency was tested on 110 (44 malignant, 66 benign) images with results reaching 82.6%. Niwas et al. (2010) presented a method based on the analysis of nuclei texture using a wavelet transform. Classification efficiency with the k-nearest neighbor algorithm on 45 (20 malignant, 25 benign) images reached 93.33%. Another approach was presented by Malek et al. (2009). They used active contours to segment nuclei and classified 200 (80 malignant, 120 benign) images using the fuzzy C-means algorithm, achieving 95% efficiency. Breast cancer diagnosis was also discussed by Xiong et al. (2005). Partial least squares regression was used to classify 699 (241 malignant, 458 benign) images, yielding 96.57% efficiency. However, the authors did not describe the segmentation method used to extract nuclei.

This paper presents recent progress in the development of a comprehensive, fully automatic breast cancer diagnostic system based on the analysis of cytological images of FNB material. The task at hand is to classify a case as benign or malignant. Recently in our research, we have introduced a third class, fibroadenoma. Fibroadenoma is a benign tumor of the breast that often occurs in women. Although it is not cancerous, it may have morphometric features similar to those of a malignant neoplasm. This may confuse the system and cause an incorrect diagnosis. The diagnosis is made using the morphometric and topological features of nuclei isolated from microscopic images of the tumor. The segmentation is based on a set of four hybrid segmentation methods combining adaptive thresholding and clustering (see Section 4).

Classifier ensembles, also known as multiple classifier systems (MCS) or combined classifiers (Kuncheva, 2004), are currently the focus of intense research (Jain et al., 2000) because more than one classifier is usually available for a given problem. Since each classifier has its own domain of competence (Wolpert, 2001), a method that exploits the strengths of individual predictors while guarding against the choice of the worst model is an attractive proposition (Kurzynski and Woźniak, 2012).

Building such a compound classifier is not trivial, and several problems must be considered carefully, such as classifier selection and the choice of an appropriate combination method. The first task is known as classifier selection (Ruta and Gabrys, 2005) or ensemble pruning (Martinez-Munoz et al., 2009). The chosen classifiers should be both accurate and diverse because we expect them to be complementary. A key issue is how to measure classifier diversity (Brown and Kuncheva, 2010) and how to select classifiers using the proposed measure. The second problem is how to combine the outputs of the individual classifiers. The first group of methods has its origin in voting algorithms (Biggio et al., 2007). For many years, standard majority voting was the most widespread approach. In recent years, more advanced approaches have been proposed that take into account that not all decisions coming from particular committee members should influence the voting procedure in an identical way (van Erp et al., 2002). Here, we should mention works that propose training the weights, which is an attractive alternative that often outperforms static weight assignment (Woods et al., 1997, Woźniak and Jackowski, 2009). More advanced proposals use the support functions of the individual classifiers. Their main form is the posterior probability typically associated with probabilistic pattern recognition models, although outputs of neural networks or other functions whose values are used to establish the decision of the classifier could be considered as well. Aggregation methods that do not require learning use simple operators, such as minimum, maximum, product, or mean. However, they are typically subject to very restrictive conditions (Duin, 2002), which limit their practical use. Therefore, the design of new fusion classification models, especially trained fusers, is currently the focus of intense research (Woźniak and Krawczyk, 2012).
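The untrained combination rules mentioned above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the support values, and the class ordering (0 = benign, 1 = fibroadenoma, 2 = malignant) are assumptions made only for illustration.

```python
from collections import Counter
from math import prod


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fuse_supports(supports, operator="mean"):
    """Combine per-classifier support vectors class by class.

    supports[k][c] is the support of classifier k for class c.
    Returns the index of the class with the highest combined support.
    """
    operators = {
        "min": min,
        "max": max,
        "product": prod,
        "mean": lambda values: sum(values) / len(values),
    }
    combine = operators[operator]
    n_classes = len(supports[0])
    fused = [combine([s[c] for s in supports]) for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)


# Hypothetical supports from three classifiers for three classes.
supports = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]

for name in ("min", "max", "product", "mean"):
    print(name, fuse_supports(supports, name))
```

On this toy input the operators do not agree with each other, which hints at why the choice of an untrained rule matters and why trained fusers are of interest.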

In this work we focus on the weighted combination of support functions, where the weights depend on a given classifier and class number. As shown in the authors’ previous work (Woźniak and Zmyslony, 2010), this type of fuser achieves fairly good quality and can be trained easily by solving a simple optimization task. Classifier ensembles are now widely used in medical decision support (Jackowski et al., 2012). This paper continues the authors’ work on early breast cancer detection (Krawczyk et al., 2012a, Krawczyk et al., 2012b). For the presented task of fine needle biopsy image classification, we employ an MCS built on the basis of the Random Subspace approach (Ho, 1998) to ensure initial diversity among the base classifiers. Since not all predictors built with this method should contribute equally to the final decision, we propose to combine their outputs with a novel trained fuser based on discriminants. In this way we achieve better combination results than with traditional fusers. Exhaustive computational tests show that our ensemble outperforms canonical classifier committees.
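As a rough illustration of the two ideas in this paragraph, the sketch below draws random feature subsets for the base classifiers and then fuses their supports with weights indexed by classifier and class. Everything in it is an assumption: the nearest-centroid base classifier, the toy data, and the hand-set weights are placeholders, and the training procedure of the paper's discriminant-based fuser is not reproduced here.

```python
import math
import random


def random_subspaces(n_features, n_classifiers, subspace_size, seed=0):
    """Draw one random feature subset (without replacement) per base classifier."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_classifiers)]


class CentroidClassifier:
    """Toy base classifier: support decays with distance to a class centroid."""

    def __init__(self, features):
        self.features = features  # indices of the features this classifier sees

    def fit(self, X, y, n_classes):
        # Assumes every class has at least one training example.
        self.centroids = []
        for c in range(n_classes):
            rows = [[x[f] for f in self.features]
                    for x, label in zip(X, y) if label == c]
            self.centroids.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def supports(self, x):
        point = [x[f] for f in self.features]
        raw = [math.exp(-math.dist(point, c)) for c in self.centroids]
        total = sum(raw)
        return [r / total for r in raw]  # normalized to sum to 1


def weighted_fusion(support_matrix, weights):
    """Fused support F[c] = sum over k of weights[k][c] * support_matrix[k][c]."""
    n_classes = len(support_matrix[0])
    fused = [sum(w[c] * s[c] for w, s in zip(weights, support_matrix))
             for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__), fused


# Toy two-class data with four features (illustrative values only).
X = [[0.0, 0.1, 0.0, 0.2], [0.1, 0.0, 0.2, 0.1],
     [1.0, 0.9, 1.1, 1.0], [0.9, 1.0, 1.0, 1.1]]
y = [0, 0, 1, 1]

subspaces = random_subspaces(n_features=4, n_classifiers=3, subspace_size=2)
ensemble = [CentroidClassifier(f).fit(X, y, n_classes=2) for f in subspaces]

# Placeholder weights[k][c]; a trained fuser would learn these from data.
weights = [[1.0, 1.0], [0.5, 0.5], [0.8, 0.8]]

S = [clf.supports([0.95, 1.0, 1.0, 1.0]) for clf in ensemble]
label, fused = weighted_fusion(S, weights)
print(label, fused)
```

Keeping the weights as a separate matrix makes it possible to swap the placeholder values for learned ones without touching the base classifiers.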

The proposed ensemble is used to classify independently all nine images representing the examined patient. Then, a final decision about the state of the patient is made on the basis of these individual ones. Two heuristic approaches are proposed for this problem. The results presented in this paper demonstrate that a computerized medical diagnosis system based on our method would be effective and can provide valuable and accurate diagnostic information.
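The two heuristic rules are not described in this excerpt, so the following Python sketch only shows two plausible ways to turn nine image-level outputs into one patient-level decision. Both rules, the function names, and the example values are hypothetical and should not be read as the authors' method.

```python
from collections import Counter

CLASSES = ("benign", "fibroadenoma", "malignant")


def patient_by_majority(image_labels):
    """Hypothetical rule 1: the patient gets the label assigned to most images."""
    return Counter(image_labels).most_common(1)[0][0]


def patient_by_summed_support(image_supports):
    """Hypothetical rule 2: sum per-class supports over all images, take the argmax."""
    totals = [sum(column) for column in zip(*image_supports)]
    return CLASSES[max(range(len(CLASSES)), key=totals.__getitem__)]


# Nine illustrative support vectors, one per image of a single patient.
image_supports = [[0.2, 0.3, 0.5]] * 5 + [[0.6, 0.3, 0.1]] * 4

# Image-level labels obtained by taking the argmax of each support vector.
image_labels = [CLASSES[max(range(3), key=s.__getitem__)] for s in image_supports]

print(patient_by_majority(image_labels))
print(patient_by_summed_support(image_supports))
```

In this example the two rules disagree: five weakly malignant images win the vote, while four confidently benign images dominate the summed supports. A rule for a three-class medical problem may also need to treat errors asymmetrically, which neither sketch attempts.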

The paper is divided into six sections. Section 1 presents an introduction to breast cancer diagnosis. Section 2 describes the acquisition process of the medical images used for testing. A framework for the proposed medical decision support system is introduced in Section 3. Segmentation and feature extraction are described in Section 4, while Section 5 presents the classification algorithm. Section 6 shows the experimental results obtained by the proposed method. The paper ends with our conclusions.

Section snippets

Medical images database

All methods presented in this work were tested on real medical data. For this purpose, 675 images were collected from 75 patients (25 benign, 25 malignant and 25 fibroadenoma). Each patient is represented by 9 images selected arbitrarily by a pathologist. The number of images was recommended by specialists from the hospital and is sufficient for a correct diagnosis.

The cytological material was obtained by FNB from patients of the Regional Hospital in Zielona Góra, Poland. Biopsies without

Framework for patient classification

In the proposed diagnostic procedure, each patient is characterized by nine separate cytological images. The physician examines all of these images and, on the basis of her/his experience, gives a final diagnosis. One should note that these images are not correlated with each other because they are chosen at the cytologist’s discretion; i.e., the first image from the ith patient may describe a completely different area of the biopsy sample than the first image from the jth

Nuclei segmentation and feature extraction

To determine the type of tumor, nuclei need to be isolated from other objects (e.g., red blood cells) and the background. Then from the nuclei, certain features can be extracted (Nguyen et al., 2012) and the classification of either benign, fibroadenoma, or malignant can be determined. In the literature, many different approaches have been proposed to extract cells or nuclei from microscope images (Al-Kofahi et al., 2010, Clocksin, 2003, Cloppet and Boucher, 2008, Hrebień et al., 2010,

Classification

This section describes in detail the proposed classification scheme, which is part of the presented medical decision support system.

Aims of the experiment

The aim of the experiment was to examine the usefulness of the proposed hybrid segmentation methods and ensemble with trained combination method for the cytological image analysis. The main goals of the investigations are listed below:

  • To examine the usefulness of the segmentation methods for the process of feature extraction from biopsy images by checking their discriminant abilities.

  • To check the quality of the proposed MCS based on Random Subspaces and compare it with several models popular in

Conclusions

This paper has proposed a novel framework for a medical decision support system dedicated to breast cancer recognition from biopsy images. We have discussed an approach that examines the patient on the basis of nine individual images, and then makes the final decision by aggregating the individual outputs of the classification module. For evaluation, we used a large collection of real-life images gathered by the authors over recent years.

We have proposed four different methods dedicated to

Acknowledgements

The authors thank Dr. Roman Monczak from the Regional Hospital in Zielona Góra, Poland for his great help and interesting discussions and Tracy Murcheski for her linguistic assistance in preparing this article.

Bartosz Krawczyk and Michał Woźniak are supported by the Polish National Science Centre under the grant NN519650440, which is being realized in years 2011–2014.

Paweł Filipczuk is a scholar within Sub-measure 8.2.2 Regional Innovation Strategies, Measure 8.2 Transfer of knowledge, Priority

References (54)

  • Brown, G., Kuncheva, L.I., 2010. Good and bad diversity in majority vote ensembles. In: MCS, pp....
  • Clocksin, W.F., 2003. Automatic segmentation of overlapping nuclei with high background variation using robust...
  • Cloppet, F., Boucher, A., 2008. Segmentation of overlapping/aggregating nuclei cells in biological images. In: Proc....
  • Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R....
  • Dong, G., et al., 2005. Color clustering and learning for image segmentation based on neural networks. IEEE Trans. Neural Networks.
  • Duin, R., 2002. The combining classifier: to train or not to train?, In: Proceedings of the 16th International...
  • Ferlay, J., Shin, H., Bray, F., Forman, D., Mathers, C., Parkin, D., 2010. Globocan 2008 v2.0, cancer incidence and...
  • Filipczuk, P., Kowal, M., Obuchowicz, A., 2011a. Automatic breast cancer diagnosis based on k-means clustering and...
  • Filipczuk, P., Kowal, M., Obuchowicz, A., 2011b. Fuzzy clustering and adaptive thresholding based segmentation method...
  • Gonzalez, R.C., et al., 2001. Digital Image Processing.
  • Gurcan, M.N., et al., 2009. Histopathological image analysis: a review. IEEE Rev. Biomed. Eng.
  • Ho, T.K., 1998. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell.
  • Hrebień, M., et al., 2010. Segmentation of breast cancer fine needle biopsy cytological images. Int. J. Appl. Math. Comput. Sci.
  • Jackowski, K., Krawczyk, B., Woźniak, M., 2012. Cost-sensitive splitting and selection method for medical decision...
  • Jacobs, R.A., 1995. Methods for combining experts’ probability assessments. Neural Comput.
  • Jain, A., et al., 2000. Statistical pattern recognition: a review. IEEE Trans. Pattern Anal. Mach. Intell.
  • Jeleń, L., et al., 2010. Classification of breast cancer malignancy using cytological images of fine needle aspiration biopsies. Int. J. Appl. Math. Comput. Sci.