Classifier ensemble for an effective cytological image analysis
Introduction
According to the International Agency for Research on Cancer, breast cancer is the most common cancer among women. In 2008, there were 1,384,155 diagnosed cases of breast cancer and 458,503 deaths caused by the disease worldwide (Ferlay et al., 2010, Bray et al., 2012). In 2010, there were 15,784 diagnosed cases of breast cancer among Polish women, with 5226 resulting in death (National Cancer Registry, 2012). There has also been an increase in the incidence of breast cancer by 3–4% a year since the 1980s. The effectiveness of treatment largely depends on the timely detection of the disease. An important and often used diagnostic method is the so-called triple-test, which is based on three medical examinations, and is used because of its effectiveness in diagnosing breast cancer (Britton et al., 2009). The triple-test includes self examination (palpation), mammography or ultrasonography imaging, and fine needle biopsy (FNB) (Underwood, 1987). FNB is an examination that consists of obtaining material directly from the tumor. The collected material is then examined under a microscope to determine the prevalence of cancer cells. This approach requires extensive knowledge and experience of the cytologist responsible for the diagnosis. Automatic morphometric diagnosis can help make the results objective and assist inexperienced specialists. Along with the development of advanced vision systems and computer science, quantitative cytopathology has become a useful method for the detection of diseases, infections, and many other disorders (Gurcan et al., 2009, Śmietański and Tadeusiewicz, 2010, Hassan et al., 2010).
Recently, there has been an increase in interest in computer-aided cytology. Several researchers have studied the segmentation of cytological images of breast tumors or proposed new features or classification algorithms. However, only a few of these researchers have tested the efficiency of their methodology in a comprehensive computerized cancer classification system. Jeleń et al. (2010) presented an approach based on the level sets segmentation method. Classification efficiency was tested on 110 (44 malignant, 66 benign) images with results reaching 82.6%. Niwas et al. (2010) presented a method based on the analysis of nuclei texture using a wavelet transform. Classification efficiency with the k-nearest neighbor algorithm on 45 (20 malignant, 25 benign) images reached 93.33%. Another approach was presented by Malek et al. (2009). They used active contours to segment nuclei and classified 200 (80 malignant, 120 benign) images using the fuzzy C-means algorithm, achieving 95% efficiency. Breast cancer diagnosis was also discussed by Xiong et al. (2005). Partial least squares regression was used to classify 699 (241 malignant, 458 benign) images, yielding 96.57% efficiency. However, the authors did not describe the segmentation method used to extract nuclei.
This paper presents recent progress in the development of a comprehensive, fully automatic breast cancer diagnostic system based on the analysis of cytological images of FNB material. The task at hand is to classify a case as benign or malignant. Recently in our research, we have introduced a third class: fibroadenoma. Fibroadenoma is a benign tumor of the breast that often occurs in women. Although it is not cancerous, it may share some morphometric features with malignant neoplasms. This may confuse the system and cause an incorrect diagnosis. The diagnosis is made using the morphometric and topological features of nuclei isolated from microscopic images of the tumor. The segmentation is based on a set of four hybrid segmentation methods combining adaptive thresholding and clustering (see Section 4).
Classifier ensembles, also known as multiple classifier systems (MCS) or combined classifiers (Kuncheva, 2004), are currently the focus of intense research (Jain et al., 2000), because more than one classifier dedicated to a given problem is usually at our disposal. Since each classifier has its own domain of competence (Wolpert, 2001), designing a method that can exploit the strengths of individual predictors while preventing us from choosing the worst model seems to be an attractive proposition (Kurzynski and Woźniak, 2012).
The process of building such a compound classifier is not trivial, and several problems must be considered carefully, such as classifier selection and the choice of an appropriate method of classifier combination. The first task is known as classifier selection (Ruta and Gabrys, 2005) or ensemble pruning (Martinez-Munoz et al., 2009); we would like to ensure that the chosen classifiers are characterized by high accuracy and diversity, because we expect them to be complementary. A key issue is how to measure classifier diversity (Brown and Kuncheva, 2010) and how to select classifiers using the proposed measure. The second problem is how to combine the outputs of individual classifiers. The first group of methods has its origin in voting algorithms (Biggio et al., 2007). For many years, standard majority voting was the most widespread approach. In recent years, more advanced approaches have been proposed that take into account that not all decisions coming from particular committee members should influence the voting procedure in an identical way (van Erp et al., 2002). Here, we should mention those works that propose training the weights, which seems to be an attractive alternative, often outperforming static weight assignment (Woods et al., 1997, Woźniak and Jackowski, 2009). More advanced propositions use the support functions of individual classifiers. Their main form is the posterior probability typically associated with probabilistic pattern recognition models, although outputs of neural networks or other functions whose values are used to establish the decision of the classifier could be considered as well. Aggregation methods that do not require learning use simple operators, like minimum, maximum, product, or mean. However, they are typically subject to very restrictive conditions (Duin, 2002), which limit their practical use.
Therefore, the design of new fusion classification models, especially trained fusers, is currently the focus of intense research (Woźniak and Krawczyk, 2012).
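The untrained combination rules mentioned above can be illustrated with a minimal sketch. It assumes that each classifier returns one support value per class; the function names and the numeric supports are hypothetical and chosen only to show that the rules may disagree.

```python
from collections import Counter
from math import prod
from statistics import mean


def majority_vote(labels):
    """Return the label chosen by most classifiers (ties: first label seen)."""
    return Counter(labels).most_common(1)[0][0]


def fuse_supports(supports, operator):
    """Fuse supports class by class with an untrained operator.

    supports[k][j] is the support of classifier k for class j.
    operator reduces the list of supports for one class to a single value.
    Returns (index of the winning class, fused supports).
    """
    n_classes = len(supports[0])
    fused = [operator([s[j] for s in supports]) for j in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__), fused


# Hypothetical supports of three classifiers for the classes
# 0 = benign, 1 = fibroadenoma, 2 = malignant.
supports = [
    [0.5, 0.4, 0.1],
    [0.5, 0.4, 0.1],
    [0.0, 0.3, 0.7],
]

# Crisp decision of each classifier (its own best class): [0, 0, 2].
crisp = [max(range(3), key=s.__getitem__) for s in supports]

print("majority vote:", majority_vote(crisp))
for name, op in [("mean", mean), ("min", min), ("max", max), ("product", prod)]:
    print(name, fuse_supports(supports, op)[0])
```

On this toy input, majority voting selects class 0, the mean, minimum, and product rules select class 1, and the maximum rule selects class 2, which illustrates why the choice of an untrained operator matters.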
In this work we focus on the weighted combination of support functions, where the weights depend on a given classifier and class number. As shown in the authors' previous work (Woźniak and Zmyslony, 2010), this type of fuser achieves fairly good quality and can be easily trained by solving a simple optimization task. Classifier ensembles are nowadays widely used in medical decision support (Jackowski et al., 2012). This paper continues the authors' work on early breast cancer detection (Krawczyk et al., 2012a, Krawczyk et al., 2012b). For the presented task of fine needle biopsy image classification, we employ an MCS built on the basis of the Random Subspace approach (Ho, 1998) to ensure initial diversity among the base classifiers. As it is common knowledge that not all predictors built on the basis of this method should take an identical part in the final decision, we propose to combine their outputs with a novel trained fuser based on the discriminants. This way we achieve better combination results than with traditional fusers. Exhaustive computational tests prove that our ensemble outperforms canonical classifier committees.
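A weighted combination of support functions with weights that depend on both the classifier and the class can be sketched as below. This shows only the decision rule; the weights are hypothetical fixed values, and the optimization task used to train them is not reproduced here.

```python
def weighted_fusion(supports, weights):
    """Fuse supports with classifier- and class-dependent weights.

    supports[k][j]: support of classifier k for class j.
    weights[k][j]:  weight of classifier k when voting for class j.
    The fused support of class j is sum_k weights[k][j] * supports[k][j].
    Returns (index of the winning class, fused supports).
    """
    n_classes = len(supports[0])
    fused = [
        sum(w[j] * s[j] for s, w in zip(supports, weights))
        for j in range(n_classes)
    ]
    return max(range(n_classes), key=fused.__getitem__), fused


# Hypothetical supports for the classes 0 = benign, 1 = fibroadenoma, 2 = malignant.
supports = [
    [0.5, 0.4, 0.1],
    [0.5, 0.4, 0.1],
    [0.0, 0.3, 0.7],
]

# Equal weights reduce the rule to the plain mean of supports.
uniform = [[1 / 3] * 3 for _ in supports]

# Assumed weights: the third classifier is trusted more for class 2.
# For each class, the weights of the three classifiers sum to 1.
trained_like = [
    [0.4, 0.4, 0.2],
    [0.4, 0.4, 0.2],
    [0.2, 0.2, 0.6],
]

print(weighted_fusion(supports, uniform)[0])       # class 1
print(weighted_fusion(supports, trained_like)[0])  # class 2
```

With equal weights the ensemble picks class 1, while the class-dependent weights shift the decision to class 2, because the classifier assumed to be competent for that class contributes more to its fused support.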
The proposed ensemble is used to classify independently all nine images representing the examined patient. Then, a final decision about the state of the patient is made on the basis of these individual ones. Two heuristic approaches are proposed for this problem. The results presented in this paper demonstrate that a computerized medical diagnosis system based on our method would be effective and can provide valuable and accurate diagnostic information.
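The two heuristics are not specified in this section, so the following is only a sketch of two plausible patient-level rules, both of which are assumptions: a majority vote over the nine image-level labels, and a sum of the image-level supports. The class names follow the text; the function names and numbers are hypothetical.

```python
from collections import Counter

CLASSES = ("benign", "fibroadenoma", "malignant")


def patient_by_majority(image_labels):
    """Assumed rule 1: the most frequent image-level label wins."""
    return Counter(image_labels).most_common(1)[0][0]


def patient_by_summed_support(image_supports):
    """Assumed rule 2: sum the supports over all images, pick the best class."""
    totals = [sum(column) for column in zip(*image_supports)]
    return CLASSES[max(range(len(CLASSES)), key=totals.__getitem__)]


def label_of(support):
    """Crisp label of one image: the class with the highest support."""
    return CLASSES[max(range(len(CLASSES)), key=support.__getitem__)]


# Hypothetical supports for the nine images of one patient:
# four images point strongly to malignant, five weakly to fibroadenoma.
image_supports = [[0.1, 0.2, 0.7]] * 4 + [[0.3, 0.5, 0.2]] * 5
image_labels = [label_of(s) for s in image_supports]

print(patient_by_majority(image_labels))          # fibroadenoma
print(patient_by_summed_support(image_supports))  # malignant
```

The example shows that the two rules can disagree: counting labels ignores how confident each image-level decision is, whereas summing supports lets a few confident images outweigh several uncertain ones.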
The paper is divided into six sections. Section 1 presents an introduction into breast cancer diagnosis. Section 2 describes the acquisition process of the medical images used for testing. A framework for the proposed medical decision support system is introduced in Section 3. Segmentation and feature extraction are described in Section 4, while Section 5 presents the classification algorithm. Section 6 shows the experimental results obtained by the proposed method. The paper ends with our conclusions.
Section snippets
Medical images database
All methods presented in this work were tested on real medical data. For this purpose, 675 images were collected from 75 patients (25 benign, 25 malignant and 25 fibroadenoma). Each patient is represented by 9 images selected by a pathologist arbitrarily. The number of images was recommended by the specialists from the hospital and allows for their correct diagnosis.
The cytological material was obtained by FNB from patients of the Regional Hospital in Zielona Góra, Poland. Biopsies without
Framework for patient classification
In the proposed diagnostic procedure, each patient is characterized by nine separate cytological images. The physician examines all of these images and, on the basis of her/his experience, gives a final diagnosis. One should note that these images are not correlated with each other, because they are chosen according to an arbitrary decision of the cytologist, i.e., the first image from the ith patient may describe a completely different area of the biopsy sample than the first image from the jth
Nuclei segmentation and feature extraction
To determine the type of tumor, nuclei need to be isolated from other objects (e.g., red blood cells) and the background. Certain features can then be extracted from the nuclei (Nguyen et al., 2012), and the case can be classified as benign, fibroadenoma, or malignant. In the literature, many different approaches have been proposed to extract cells or nuclei from microscope images (Al-Kofahi et al., 2010, Clocksin, 2003, Cloppet and Boucher, 2008, Hrebień et al., 2010,
Classification
This section describes in detail the proposed classification scheme, which is employed as part of the presented medical decision support system.
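As an illustration of the Random Subspace idea (Ho, 1998) used to diversify the base classifiers, the sketch below trains each ensemble member on a random subset of the features. The nearest-centroid base classifier, the function names, and the toy data are assumptions made for brevity and are not the base models used in the paper.

```python
import random


def draw_subspaces(n_features, n_classifiers, subspace_size, seed=0):
    """Draw one random feature subset (without repetition) per classifier."""
    rng = random.Random(seed)
    return [
        sorted(rng.sample(range(n_features), subspace_size))
        for _ in range(n_classifiers)
    ]


class NearestCentroid:
    """Stand-in base classifier that only sees its own feature subset."""

    def __init__(self, features):
        self.features = features
        self.centroids = {}

    def _project(self, x):
        return [x[i] for i in self.features]

    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(self._project(x))
        # Centroid of a class = feature-wise mean of its projected samples.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, x):
        p = self._project(x)
        return min(
            self.centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, self.centroids[c])),
        )


def train_random_subspace_ensemble(X, y, n_classifiers=5, subspace_size=2, seed=0):
    """Train one base classifier per randomly drawn feature subspace."""
    subspaces = draw_subspaces(len(X[0]), n_classifiers, subspace_size, seed)
    return [NearestCentroid(features).fit(X, y) for features in subspaces]


# Toy data: four features, two well-separated classes.
X = [[0, 0, 0, 0], [1, 1, 1, 1], [10, 10, 10, 10], [11, 11, 11, 11]]
y = [0, 0, 1, 1]
ensemble = train_random_subspace_ensemble(X, y)
votes = [clf.predict([10.5, 10.5, 10.5, 10.5]) for clf in ensemble]
```

Each member produces its own decision, so the resulting list of votes (or the corresponding support values) is what a fuser would subsequently combine.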
Aims of the experiment
The aim of the experiment was to examine the usefulness of the proposed hybrid segmentation methods and of the ensemble with a trained combination method for cytological image analysis. The main goals of the investigations are listed below:
- To examine the usefulness of the segmentation methods for the process of feature extraction from biopsy images by checking their discriminant abilities.
- To check the quality of the proposed MCS based on Random Subspaces and compare it with several models popular in
Conclusions
This paper proposes a novel framework for a medical decision support system dedicated to breast cancer recognition from biopsy images. We have discussed an approach that examines the patient on the basis of nine individual images, and then makes the final decision by aggregating the individual outputs of the classification module. For evaluation purposes, we use a wide collection of real-life images gathered by the authors over recent years.
We have proposed four different methods dedicated to
Acknowledgements
The authors thank Dr. Roman Monczak from the Regional Hospital in Zielona Góra, Poland for his great help and interesting discussions and Tracy Murcheski for her linguistic assistance in preparing this article.
Bartosz Krawczyk and Michał Woźniak are supported by the Polish National Science Centre under grant NN519650440, which is being realized in the years 2011–2014.
Paweł Filipczuk is a scholar within Sub-measure 8.2.2 Regional Innovation Strategies, Measure 8.2 Transfer of knowledge, Priority
References (54)
- et al. Breast-cancer identification using HMM-fuzzy approach. Comput. Biol. Med. (2010)
- et al. Prostate cancer grading: gland segmentation and structural features. Pattern Recognition Lett. (2012)
- et al. Combining shape, texture and intensity features for cell nuclei extraction in pap smear images. Pattern Recognition Lett. (2011)
- et al. Classifier selection for majority voting. Inform. Fusion (2005)
- et al. Improved automatic detection and segmentation of cell nuclei in histopathology images. IEEE Trans. Biomed. Eng. (2010)
- Alpaydin, E., 2010. Introduction to Machine Learning, second ed. The MIT...
- Pattern Recognition with Fuzzy Objective Function Algorithms (1981)
- Biggio, B., Fumera, G., Roli, F., 2007. Bayesian analysis of linear combiners. In: Proceedings of the Seventh...
- et al. Estimates of global cancer prevalence for 27 sites in the adult population in 2008. Int. J. Cancer (2008)
- Britton, P., Duffy, S., Sinnatamby, R., Wallis, M., Barter, S., Gaskarth, M., O’Neill, A., Caldas, C., Brenton, J.,...
- Color clustering and learning for image segmentation based on neural networks. IEEE Trans. Neural Networks
- Digital Image Processing
- Histopathological image analysis: a review. IEEE Rev. Biomed. Eng.
- The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell.
- Segmentation of breast cancer fine needle biopsy cytological images. Int. J. Appl. Math. Comput. Sci.
- Methods for combining experts’ probability assessments. Neural Comput.
- Statistical pattern recognition: a review. IEEE Trans. Pattern Anal. Mach. Intell.
- Classification of breast cancer malignancy using cytological images of fine needle aspiration biopsies. Int. J. Appl. Math. Comput. Sci.