Ensemble classification of colon biopsy images based on information rich hybrid features

https://doi.org/10.1016/j.compbiomed.2013.12.010

Abstract

In recent years, classification of colon biopsy images has become an active research area. Traditionally, colon cancer is diagnosed by microscopic analysis. However, the process is subjective and leads to considerable inter- and intra-observer variation. Therefore, reliable computer-aided colon cancer detection techniques are in high demand. In this paper, we propose a colon biopsy image classification system, called CBIC, which benefits from the discriminatory capabilities of information-rich hybrid feature spaces and from the performance enhancement offered by ensemble classification. Normal and malignant colon biopsy images differ from each other in the color distribution of their biological constituents: the colors of different constituents are sharply separated in normal images, whereas they diffuse into each other in malignant images. To exploit this variation, two feature types have been proposed, namely color components based statistical moments (CCSM) and color components based Haralick features, which are variants of their traditional counterparts. Moreover, in normal colon biopsy images, epithelial cells possess sharp and well-defined edges. Histogram of oriented gradients (HOG) based features have been employed to exploit this information. Different combinations of hybrid features have been constructed from HOG, CCSM, and Haralick features. The minimum Redundancy Maximum Relevance (mRMR) feature selection method has been employed to select meaningful features from the individual and hybrid feature sets. Finally, an ensemble classifier based on majority voting has been proposed, which classifies colon biopsy images using the selected features. Linear, RBF, and sigmoid SVMs have been employed as base classifiers. The proposed system has been tested on 174 colon biopsy images, and an improved classification accuracy of 98.85% has been observed compared to previously reported studies. Additionally, the use of the mRMR method has been justified by comparing the performance of CBIC on the original and reduced feature sets.

Introduction

Colon cancer has become a major cause of death in the modern, industrialized world. The death toll has risen to 0.5 million deaths per year worldwide [1]. Colon cancer is usually associated with chain smoking, family history, increasing age, and an unbalanced consumption of meat relative to fruits and vegetables [2].

The traditional method of colon cancer diagnosis is microscopic analysis of colon biopsy samples. In such an examination, histopathologists analyze the biopsy samples under a microscope and diagnose the tissue as normal or malignant based on its morphology, which differs markedly between the two classes. Normal colon tissues have a well-defined structure. Fig. 1(a) presents a microscopic image of a normal colon biopsy sample, wherein all the tissues possess a regular structure. The detailed regular structure of a normal colon tissue is shown in Fig. 1(b), which shows that a normal colon tissue has three constituents, namely epithelial cells, non-epithelial cells, and lumen. Epithelial cells usually surround the lumen and form glandular structures, whereas non-epithelial cells, called stroma, lie in between these structures. Cancer, however, heavily disturbs the structure of colon tissues and makes it almost amorphous. The deformation introduced by cancer is clearly visible in the microscopic image of a malignant colon biopsy sample shown in Fig. 1(c). Normal and malignant colon tissues have similar colors, but the distribution of those colors varies considerably. Further, normal tissues have well-defined structures, such as elliptical epithelial cells and lumen with sharp boundaries. Malignant colon tissues, on the other hand, have no such edges: all the constituents of the tissue mix with each other, thereby blurring the boundaries.

Histopathologists assign two quantitative measures to malignant samples, namely stage and grade. Stage is the extent to which cancer has spread in the colon or to other body parts. There are five stages of colon cancer (0, A–D) according to Dukes' scale [3]. Stage 0 is the earliest stage, in which cancer has just started to develop and is still restricted to the innermost lining of the colon. In stage A, cancer has reached the middle layer of the colon. In stage B, cancer has spread beyond the middle layer. Cancer is at stage C if it has reached the lymph nodes and is found in at least three of them. Stage D is the final stage, wherein cancer has reached other body parts such as the lungs and liver. The grade of cancer, on the other hand, is the degree of differentiation of malignant cells. There are three grades of colon cancer. The lowest grade is ‘well differentiated’, in which malignant cells are almost similar to normal ones and cancer progresses at the lowest speed. The second grade is ‘moderately differentiated’, wherein malignant cells can be told apart from normal cells and cancer progresses at moderate speed. The third grade is ‘poorly differentiated’, in which malignant cells are totally different from normal ones and are easily distinguishable. In this grade, cancer cells spread at a very high rate. Fig. 2 presents microscopic images of malignant colon biopsy samples with different grades of cancer.

The determination of the grade and stage of colon cancer is a manual process. To determine the grade, histopathologists analyze the biopsy samples under a microscope and assign a quantitative cancer grade depending upon the morphology of the malignant tissues. The stage, on the other hand, is determined by microscopic analysis of separate biopsy samples taken from different layers of the colon and from lymph nodes. This manual process has several limitations. It consumes the valuable time of histopathologists, who have to analyze many images per day. Moreover, the process is subjective, and opinions can be biased by the workload and experience level of the histopathologist. Further, the process leads to inter- and intra-observer variability [4], [5]. Therefore, an accurate computational system for automatic colon cancer detection is highly desirable.

In the past two decades, a few computer-aided diagnostic systems have been proposed for automatic detection of colon cancer. However, research on colon cancer still lags behind other areas of computer-aided diagnosis. Typical approaches for computer-aided diagnosis of colon cancer include analysis of human genes using microarrays [6], [7], study of the variation in composition between normal and cancerous blood serum [8], [9], and exploitation of textural changes between cancerous and normal colon images. These techniques have been summarized in a recent survey by Rathore et al. [10].

Textural variations in colon biopsy images are the emphasis of this research work. Texture analysis of colon biopsy images consists of extracting discriminative features from the observed texture of these images. The extracted features are then used as input to different classifiers for distinguishing normal from malignant images. For example, Esgiar et al. calculated six texture features (contrast, entropy, angular second moment, dissimilarity, inverse difference moment, and correlation) from the gray-level co-occurrence matrix (GLCM) of the input colon biopsy images. They employed linear discriminant analysis (LDA) and K-nearest neighbor (KNN) classifiers, and obtained 90.2% classification accuracy [11]. They found correlation and entropy to be the two most distinctive features. Esgiar et al. [12] further extended their work by combining the entropy and correlation features with image fractal dimensions, and obtained 94.10% classification accuracy with the same set of classifiers. Masood et al. [13] employed morphological and GLCM based texture features to obtain classification accuracies of 84% and 90%, respectively. The morphological features comprise shape, size, and orientation, whereas the GLCM based features encompass energy, inertia, and local homogeneity. Both types of features were obtained from a single spectral band of colon biopsy images, and a support vector machine (SVM) with a polynomial kernel of degree 3 was employed as the classifier. Masood and Rajpoot [14] further extended their work and employed circular local binary patterns from a single spectral band to classify colon biopsy images. They employed a Gaussian SVM for classification, and obtained an accuracy of 90%.
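To make the GLCM based features mentioned above concrete, the following is a minimal sketch in pure Python of a symmetric, normalized co-occurrence matrix and the six texture features listed for Esgiar et al. It is not the implementation used in any of the cited studies. The function names `glcm` and `glcm_features`, the horizontal offset of one pixel, the base-2 logarithm in the entropy, and the tiny 4-level test image are all assumptions made for illustration.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix.

    image  : 2-D list of integer gray levels in [0, levels)
    dx, dy : pixel offset (assumed here: one pixel to the right)
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("image too small for the requested offset")
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Six texture features computed from a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mean_i = sum(i * v for i, _, v in cells)
    mean_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mean_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mean_j) ** 2 * v for _, j, v in cells)
    cov = sum((i - mean_i) * (j - mean_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "dissimilarity": sum(abs(i - j) * v for i, j, v in cells),
        "angular_second_moment": sum(v * v for _, _, v in cells),
        # Base-2 logarithm is an assumption; other bases only rescale the value.
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "inverse_difference_moment": sum(
            v / (1 + (i - j) ** 2) for i, j, v in cells
        ),
        # A constant image has zero variance; report 0.0 in that case.
        "correlation": (
            cov / math.sqrt(var_i * var_j) if var_i > 0 and var_j > 0 else 0.0
        ),
    }


# Hypothetical 4x4 image with four gray levels, for illustration only.
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
toy_features = glcm_features(glcm(toy_image, levels=4))
```

In practice the image would first be quantized to a small number of gray levels, and features from several offsets and directions are often averaged; both choices are left out of this sketch.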

In 2010, Altunbay et al. [15] proposed a textural features based technique for classifying colon samples into normal and malignant categories. They constructed a graph on different objects obtained by applying a circle fit algorithm [16] to the white, pink, and purple clusters of the image. A few structural features, such as degree, average clustering coefficient, and diameter, were computed from the color graphs and used to classify the given samples with a linear-kernel SVM. Moreover, Ozdemir et al. [17] presented an interesting method of colon cancer detection. In their work, reference graphs of a few images of normal colon tissues are generated using a previously proposed method of graph creation [15], [18], [19], [20], and are stored for future reference. Query graphs are then generated from the test images and searched for in the reference graphs by placing the nucleus node of a query graph on each node of the reference graphs. The three most similar graphs are found in the reference graphs, and the normal or malignant class is assigned to the test sample based on the degree of similarity.

The schemes mentioned above suffer from a few drawbacks. For instance, graph based colon image classification schemes [15], [17] are computationally expensive, and consume considerable CPU time in the feature extraction and classification stages. Further, previous techniques have each exploited only one aspect of colon biopsy images, i.e., they have utilized features of only one type: texture features, morphological features, or object texture based features. Multiple feature types have not been investigated simultaneously to obtain a more robust and discerning feature set. Therefore, an automatic colon biopsy image classification scheme that is both computationally tractable and rich in discerning features is highly desirable.

In this paper, we propose a colon biopsy image classification (CBIC) system, which performs ensemble classification of samples based on the discriminatory capabilities of hybrid feature spaces. To exploit the color information present in colon biopsy images, variants of traditional statistical moments and Haralick features have been proposed. Further, traditional histogram of oriented gradients (HOG) based features have been used. These features have been combined to form various hybrid feature sets. The minimum Redundancy Maximum Relevance (mRMR) method has been employed to select discerning features from the individual as well as the hybrid feature sets. The selected features have then been used to classify samples into normal and malignant classes by ensemble classification through majority voting.
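The idea behind mRMR selection can be illustrated with a small greedy sketch. It is not the authors' implementation: it assumes features that are already discretized, it uses the difference form of the criterion (relevance to the class minus mean redundancy with the features already selected), and the names `mutual_information`, `mrmr_score`, and `mrmr_select`, as well as the toy data, are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_score(idx, selected, features, relevance):
    """Relevance of feature idx minus its mean redundancy with `selected`."""
    if not selected:
        return relevance[idx]
    redundancy = sum(
        mutual_information(features[idx], features[s]) for s in selected
    ) / len(selected)
    return relevance[idx] - redundancy


def mrmr_select(features, labels, k):
    """Greedily pick k feature indices.

    features : list of columns, one list of discrete values per feature
    labels   : list of class labels, same length as each column
    """
    relevance = [mutual_information(col, labels) for col in features]
    selected, remaining = [], list(range(len(features)))
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda i: mrmr_score(i, selected, features, relevance),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: feature 1 duplicates feature 0, feature 2 is
# less relevant on its own but carries different information.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_columns = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
]
chosen = mrmr_select(toy_columns, toy_labels, k=2)
```

On this toy data a ranking by relevance alone would keep the two identical columns, whereas the redundancy term makes the greedy search prefer the third column as its second pick. Continuous features such as moments or HOG values would need to be discretized (or a different mutual information estimator used) before this sketch applies.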

The experimental evaluation in this work covers several aspects. First, the performance of individual as well as hybrid feature types has been investigated. Second, the performance of the original feature sets has been compared with that of the feature sets selected by the mRMR method. Third, the performance of the individual classifiers as well as the ensemble classifier has been studied. The experimental results verify that the proposed system is well suited to the classification of colon biopsy images. Further, an analysis of the computational efficiency of the feature extraction and classification stages has been presented in order to validate the suitability of the proposed CBIC system for real-time scenarios where histopathologists receive many images per day.
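The difference between individual and ensemble decisions can be sketched with a plain majority vote over the labels predicted by the base classifiers. The sketch below is only an illustration of the voting step: the function name `majority_vote` is hypothetical, and the three prediction lists are made-up stand-ins for the outputs of the linear, RBF, and sigmoid SVMs rather than results from the paper.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine label predictions from several classifiers, sample by sample.

    predictions_per_classifier : list of equal-length label lists,
    one list per base classifier.
    """
    n_samples = len(predictions_per_classifier[0])
    if any(len(p) != n_samples for p in predictions_per_classifier):
        raise ValueError("all classifiers must predict the same samples")
    combined = []
    for votes in zip(*predictions_per_classifier):
        # With three voters and two classes a tie cannot occur.
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions for four samples (not results from the paper).
linear_svm = ["normal", "malignant", "malignant", "normal"]
rbf_svm = ["normal", "malignant", "normal", "malignant"]
sigmoid_svm = ["malignant", "malignant", "normal", "normal"]

ensemble = majority_vote([linear_svm, rbf_svm, sigmoid_svm])
```

Using an odd number of base classifiers for a two-class problem guarantees a strict majority for every sample, which is one practical reason for voting over three kernels.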

The remainder of this paper is organized as follows. Section 2 describes the proposed system in detail. Section 3 describes the performance measures. Section 4 presents the experimental results, and Section 5 concludes the paper.

Section snippets

Proposed system

The proposed CBIC colon classification system utilizes hybrid features, selected by mRMR, for decision making through ensemble classification. In this paper, we have experimentally validated the proposed CBIC system by evaluating the discerning capability of reduced individual and hybrid feature sets using base and ensemble classifiers. The proposed system comprises four main stages, namely (1) feature extraction, (2) feature selection, (3) training and testing data formulation, and finally,

Performance measures

The proposed CBIC system has been quantitatively evaluated using well-known performance measures such as accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), F-score, Kappa statistic, and the receiver operating characteristic (ROC) curve. Generally, a particular measure of accuracy takes into account a certain factor underlying the yielded classification results. However, we use multiple classification measures in order to obtain a more reliable comparison. Normal and
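As a rough illustration of how the scalar measures listed above relate to one another, the sketch below computes them from the four cells of a binary confusion matrix using their standard textbook definitions. Treating malignant as the positive class is an assumption, the function name `binary_metrics` is hypothetical, and the counts are invented for the example and are not taken from the paper's experiments.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard measures from a 2x2 confusion matrix.

    Assumes every denominator below is non-zero, i.e. both classes
    occur in the data and both classes are predicted at least once.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: agreement beyond what chance alone would give.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts, with malignant assumed to be the positive class.
example = binary_metrics(tp=45, fn=5, tn=40, fp=10)
```

Accuracy alone can look high on imbalanced data, whereas MCC and kappa account for all four cells of the confusion matrix, which is why reporting several measures together gives a more reliable comparison.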

Results and discussions

In this section, we present the results of using the proposed system for identifying normal and malignant colon biopsy images from the dataset presented in Section 4.1. Individual features as described in Section 2.1 have been extracted from colon biopsy images, and multiple hybrid feature sets have been constructed from the individual feature sets. Individual as well as hybrid features have been reduced using mRMR method (see Section 4.2). Majority voting based ensemble classifier has been

Conclusion

In this research study, a classification system (CBIC) has been proposed for predicting cancer in colon tissues. In the proposed system, hybrid feature set comprising CCSM, Haralick-HSV, and HOG is constructed. The mRMR method is employed to select discerning features from the hybrid feature set. The discerning features are then used in different SVM kernels based ensemble classification. Working with colon biopsy images, highest classification accuracy of 98.85% and 96.68% has been observed

Conflict of interest statement

This paper is the authors' original work and has not been published nor has it been submitted simultaneously elsewhere. All authors have checked the paper and have agreed to the submission. We do not have any financial/personal relationships with other people/organizations which could inappropriately influence this work.

Acknowledgment

This work is supported by the Higher Education Commission of Pakistan (HEC) under the indigenous PhD scholarship program as per Award no. 117-7931-Eg7-037. We are thankful to Mr. Imtiaz Ahmad Qureshi (Assistant Professor, Histopathology Department, Rawalpindi Medical College) for providing data and relevant expert opinion. We also thank the Histopathology Department of PAEC General Hospital, Islamabad, Pakistan, for providing imaging equipment.

References (60)

  • Cancer Facts and Figures, American Cancer Society, 〈http://www.cancer.org/research/cancerfactsstatistics〉, October...
  • Colon Cancer Risk Factors, C.C. Alliance, 〈http://www.ccalliance.org/colorectal_cancer/riskfactors.html〉, October...
  • D. Myers, Colon Cancer Stages: Basics of Each Colon Cancer Stage,...
  • G.D. Thomas et al., Observer variation in the histological grading of rectal carcinoma, J. Clin. Pathol. (1983)
  • A. Andrion et al., Malignant mesothelioma of the pleura: inter observer variability, J. Clin. Pathol. (1995)
  • E.T. Venkatesh et al., An improved neural approach for malignant and normal colon tissue classification from oligonucleotide arrays, Eur. J. Sci. Res. (2011)
  • H.S. Shon, G. Sohn, K.S. Jung, S.Y. Kim, E.J. Cha, K.H. Ryu, Gene expression data classification using discrete wavelet...
  • X. Li, X. Li, M. Lei, D. Wang, J. Lin, Detection of colon cancer by laser induced fluorescence and raman spectroscopy,...
  • S. Rathore et al., A recent survey on colon cancer detection techniques, IEEE/ACM Trans. Comput. Biol. Bioinf. (2013)
  • A.N. Esgiar et al., Microscopic image analysis for quantitative measurement and feature identification of normal and cancerous colonic mucosa, IEEE Trans. Inf. Technol. Biomed. (1998)
  • A.N. Esgiar et al., Fractal analysis in the detection of colonic cancer images, IEEE Trans. Inf. Technol. Biomed. (2002)
  • K. Masood, N. Rajpoot, H. Qureshi, K. Rajpoot, Co-occurrence and morphological analysis for colon tissue biopsy...
  • K. Masood, N. Rajpoot, Texture based classification of hyperspectral colon biopsy samples using CLBP, in: Proceedings...
  • D. Altunbay et al., Color graphs for automated cancer diagnosis and grading, IEEE Trans. Biomed. Eng. (2010)
  • E. Ozdemir et al., A hybrid classification model for digital pathology using structural and statistical pattern recognition, IEEE Trans. Med. Imaging (2013)
  • D. Altunbay et al., Color graphs for automated cancer diagnosis and grading, IEEE Trans. Biomed. Eng. (2010)
  • C.G. Demir et al., Automatic segmentation of colon glands using object-graphs, Med. Image Anal. (2010)
  • A.B. Tosun et al., Graph run-length matrices for histopathological image segmentation, IEEE Trans. Med. Imaging (2011)
  • N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: Proceedings of the International...
  • L. Meng, L. Li, S. Mei, W. Wu, Directional entropy feature for human detection, in: Proceedings of the 19th...