Elsevier

Expert Systems with Applications

Volume 40, Issue 18, 15 December 2013, Pages 7457-7467
A comparison of methods for extracting information from the co-occurrence matrix for subcellular classification

https://doi.org/10.1016/j.eswa.2013.07.047

Highlights

  • Cell phenotype image classification by ensemble of descriptors.

  • Compare some recently proposed methods that are based on the co-occurrence matrix.

  • Investigate the correlation among the features that can be extracted from the co-occurrence matrix.

  • Determine the best way to combine co-occurrence matrix based feature sets.

Abstract

In this paper we focus on cell phenotype image classification, a bioimaging problem that is concerned with finding the location of protein expressions within a cell. Protein localization is becoming increasingly critical in the diagnosis and prognosis of many diseases. In recent years several new approaches for describing a given image have been proposed. Some of the most significant developments have been based on binary encodings, such as local binary patterns and local phase quantization. In this paper we reexamine one of the oldest methods for representing an image, which Haralick famously proposed in 1979 and which uses the co-occurrence matrix to calculate a set of image statistics. Since then, few methods have been proposed that extract new features from the co-occurrence matrix. In this work we compare some recently proposed methods that are based on the co-occurrence matrix (CM) to classify cell phenotype images. We investigate the correlation among the different sets of features that can be extracted from the CM and then determine the best way to combine these different feature sets for optimizing system performance. Moreover, we combine our novel approach with state-of-the-art descriptors to optimize performance. We validate our approach on various types of biological microscope images using five image databases for subcellular classification. We use these image features for training a stand-alone support vector machine and a random subspace of support vector machines to separate the classes in each dataset.

The Matlab code for some of the approaches tested in this paper will be available at <http://www.dei.unipd.it/wdyn/?IDsezione=3314&IDgruppo_pass=124&preview=>.

Introduction

The development of new and improved tools for automatic analysis and classification has already proven beneficial in clinical practice and in medical and biological research (Hamilton et al., 2009, Murphy, 2006). In Karkanis, Iakovidis, Maroulis, Karras, and Tzivras (2003), for example, linear discriminant analysis of wavelet features, used in applications that range from traffic incident detection (Samant & Adeli, 2000) to face identification and verification (Shen, Bai, & Fairhurst, 2007), proved effective in the detection of tumors in endoscopic images. In Ameling, Wirth, Paulus, Lacey, and Vilarino (2009), image texture information successfully discriminated polyps in colonoscopy images. The Local Binary Pattern (LBP) operator (Ojala, Pietikainen, & Maeenpaa, 2002), which has distinguished itself from other image texture operators by its simplicity, effectiveness, and robustness, has also proven very good at detecting a variety of tumors and masses. In Vécsei, Amann, Hegenbart, Liedlgruber, and Uhl (2011), for example, LBP was used to assign a Marsh-like score to endoscopic images of pediatric celiac disease, thus providing concrete help for pathologists. In Oliver, Lladó, Freixenet, and Martí (2007) a Support Vector Machine (SVM) was coupled with the LBP operator to distinguish real masses from normal parenchyma in mammographic images, thus reducing the incidence of false positive samples. In Unay and Ekin (2008) LBP was used to explore brain magnetic resonance data, and in Nanni and Lumini (2008) the authors demonstrated that a combination of LBP with other texture descriptors is effective in classifying different cell phenotypes using SVM.

In this paper we focus on cell phenotype image classification, a bioimaging problem that is concerned with finding the location of protein expressions within a cell. Understanding the function of proteins at the cellular level is a major goal in biology (Chebira et al., 2007) since the knowledge of the subcellular locations of a protein is useful in understanding its specific function and in describing a cell’s behavior under different conditions. Not only is protein localization an important topic in biology, but it is also critical in the design of drug screening systems, drug discovery, and recently in the diagnosis and prognosis of many diseases. In fact, it has been suggested that the old paradigm of one protein, one biomarker, one clinical decision no longer holds and is being replaced by multiparametric analysis of genes and proteins, with protein patterns currently considered to offer better diagnostic possibilities than a single biomarker (Rosenblatt et al., 2008).

This change in view has been motivated in part by the abundance of recent research showing that aberrant subcellular localizations are associated with many diseases: cancer (Fernandes et al., 2009, Geerts et al., 2007, Knostman et al., 2007, Mezzanzanica and D., 0000, Perrone et al., 2007, Ralhan et al., 2010, Tang et al., 2010), heart disease (Bedard et al., 2011, Obrenovich et al., 2006), kidney disease (Nishibori et al., 2004), pulmonary fibrosis (Thomas et al., 2002), and Alzheimer disease. In Nishibori et al. (2004), for instance, the subcellular localization of three missense mutants located in the proximal C-terminal part of the podocin protein (NPHS2) was found to be greatly altered in patients with steroid-resistant nephrotic syndrome. In Knostman et al. (2007) sodium/iodide symporter (NIS) expression, subcellular localization, and function were analyzed in MCF-7 human breast cancer cells; the authors found that NIS and its intracellular localization are associated with pAkt expression in human breast cancer tissues. In Perrone et al. (2007) COX-2 expression and its subcellular localization in lobular in situ neoplasia (LIN) of the breast were assessed as a candidate biomarker for breast cancer. In Ralhan et al. (2010) the subcellular localization of EpEx and Ep-ICD in the human colon adenocarcinoma cell line CX-1 was examined using immunofluorescence. Nuclear and cytoplasmic Ep-ICD expression was increased in cancers of the breast, prostate, head, neck, and esophagus compared to the corresponding normal tissues, which showed membrane localization of the protein. In Bachmann, Straume, Puntervoll, Kalvenes, and Akslen (2005) alterations in the expression and subcellular localization of cell adhesion markers were found to be important in the development and progression of melanocytic tumors, and it is well known that activated leukocyte cell adhesion molecule (ALCAM) is expressed at the cell surface of epithelial ovarian cancer cell lines (Piazza et al., 2005).

Moreover, in Bedard et al. (2011) it was discovered that mutations in the gene encoding zinc finger of the cerebellum protein 3 (ZIC3) can cause congenital heart defects. Cytosolic G protein-coupled receptor kinases (GRKs), such as GRK2, are well characterized in the heart, and G protein-coupled receptor desensitization is emerging as a key feature in several cardiovascular diseases. GRKs are also found in cerebral tissues, and a significant increase in GRK2 immunoreactivity in endothelial cells has been found in patients with Alzheimer disease. In Obrenovich et al. (2006), the authors explored cellular and subcellular localization by immunoreactivity of GRK2 and demonstrated that the ultrastructural localization and overexpression of GRK2 occur during the early stages of damage in aged humans and in Alzheimer disease cases.

As indicated above, protein subcellular localization is also becoming increasingly important as a prognostic indicator (Bachmann et al., 2005, Moreira et al., 2010, Surowiak et al., 2006). There is considerable evidence, for instance, that the loss of BLCAP expression is associated with tumor progression. In Moreira et al. (2010) the authors were able to classify urothelial carcinomas into four groups based on levels of expression and subcellular localization of BLCAP protein. Their findings suggested that BLCAP may have prognostic value in bladder cancer. Subcellular localization may even be useful in predicting the response to chemotherapy in ovarian cancer (Surowiak et al., 2006).

In the last two decades several research groups have investigated automated cell phenotype image classification (Boland et al., 1998, Boland and Murphy, 2001, Conrad et al., 2004, Danckaert et al., 2002, Lin et al., 2007, Nanni and Lumini, 2008, Perner et al., 2002). Work that has focused on training generic classifiers with image descriptors includes Chen et al. (2006), Glory and Murphy (2007), and Glory et al. (2008). The most common image recognition methods used for this problem are Haralick texture measures (Huang & Murphy, 2004), Zernike moments and threshold adjacency statistics (Hamilton, Pantelic, Hanson, & Teasdale, 2007), Gabor filters, and a number of other ad hoc measures (Conrad et al., 2004). More recently, new methods have been developed that are based on fusion at the feature level and at the score level. At the feature level, vectors are created by concatenating several descriptors (Chen et al., 2005, Hamilton et al., 2007, Huang and Murphy, 2004). A multi-resolution approach, proposed in Chebira et al. (2007), trained classifiers using descriptors extracted from different resolution spaces. An example of fusion at the score level is the method of Lin et al. (2007).

The aim of this work is to assess the discriminant power, on this problem, of a recent method for extracting features from the co-occurrence matrix (CM), in which new features are extracted by treating the co-occurrence matrix as a 3D shape (SHAPE). On its own, this set of features performs rather poorly compared with the standard Haralick feature set (HAR), but we have discovered that SHAPE can be coupled with HAR to boost performance to levels similar to those obtained by recent texture descriptor approaches (Nanni et al., 2013a, Nanni et al., 2013b). Moreover, when this new set of features is coupled with state-of-the-art global descriptors, further improvements are obtained. This is demonstrated by combining our proposed set of features based on the co-occurrence matrix (SHAPE) with descriptors proposed in Paci et al. (2013). We were motivated to investigate the fusion of HAR and SHAPE because the Q-statistic between them is quite low, which confirms the low correlation between the information extracted by the two feature sets. As expected, fusion of HAR and SHAPE produced better results. Our experiments were validated across five databases: (1) the 2D HeLa dataset (Boland & Murphy, 2001), (2) LOCATE endogenous mouse subcellular organelles, (3) LOCATE transfected mouse subcellular organelles (Fink et al., 2006, Hamilton et al., 2007), (4) Chinese Hamster Ovary (Boland et al., 1998), and (5) the RNAi dataset (Zhang & Pham, 2011).
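The Q-statistic mentioned above is not defined in this excerpt. The sketch below assumes it is the commonly used Yule's Q computed from the per-sample correctness of two classifiers, here imagined as one classifier trained on HAR and one on SHAPE. The function name and the choice to return NaN when the denominator is zero are illustrative assumptions, not the authors' Matlab implementation.

```python
def q_statistic(y_true, pred_a, pred_b):
    """Yule's Q between two classifiers evaluated on the same samples.

    Q lies in [-1, 1]. Values near 1 mean the classifiers tend to be right
    and wrong on the same samples; values near 0 or below suggest their
    errors are largely unrelated, which is when fusion is most promising.
    """
    n11 = n00 = n10 = n01 = 0
    for truth, a, b in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (a == truth), (b == truth)
        if a_ok and b_ok:
            n11 += 1          # both correct
        elif not a_ok and not b_ok:
            n00 += 1          # both wrong
        elif a_ok:
            n10 += 1          # only the first classifier correct
        else:
            n01 += 1          # only the second classifier correct
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        # Q is undefined here; returning NaN is an assumption of this sketch.
        return float("nan")
    return (n11 * n00 - n01 * n10) / denominator
```

With predicted labels from two classifiers on a shared test set, `q_statistic(labels, har_predictions, shape_predictions)` gives a single number that can be compared across descriptor pairs; the argument names in this call are hypothetical.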

The rest of this paper is organized as follows. In Section 2 we describe our proposed approach using descriptors based on CM along with all the other descriptors combined with our system or used for comparison purposes. In Section 3 we describe the five benchmark databases used to validate our approach. In Section 4, we present our experimental results, and in Section 5 we conclude our paper with a few suggestions for further research.

Proposed approach

In this paper we compare recently proposed descriptors based on the co-occurrence matrix (CM). Our goal is to enhance the performance of the standard Haralick descriptor with a recently proposed set of features that can also be extracted from the co-occurrence matrix (Nanni et al., 2013a). Our tests using the Q-statistic show that the different approaches for extracting features from the co-occurrence matrix provide different information. This indicates that combining these descriptors should

Datasets

In this section we describe the datasets used in our experimental section. Each of the datasets includes subgroups of either subcellular structures, such as organelles, or cell classes, such as different cell lines, that were used as classes to both train and test the classifiers.

Experimental results

For the evaluation protocol, we use 5-fold cross-validation for testing each of the texture descriptors. The performance indicator is accuracy since that is the indicator widely used in papers published using these datasets.
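The protocol above can be sketched as follows. This is a minimal illustration under stated assumptions: the folds are formed by a seeded random shuffle, accuracy is averaged over folds, and a nearest-centroid classifier stands in for the support vector machine used in the paper. All function names are hypothetical, and the sketch expects at least as many samples as folds.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(features, labels, fit, predict, k=5, seed=0):
    """Mean held-out accuracy over k folds.

    fit(train_x, train_y) returns a model; predict(model, x) returns a label.
    """
    accuracies = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def fit_nearest_centroid(xs, ys):
    """Stand-in classifier (not an SVM): one mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_nearest_centroid(model, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))
```

Passing `fit` and `predict` as arguments keeps the fold logic independent of the classifier, so the stand-in can be swapped for an SVM wrapper without touching the evaluation loop.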

The first experiment was aimed at establishing which version of the co-occurrence matrix works best to classify the images in the five datasets. Table 1 reports different approaches (see Section 2.1 for a description of these features):

  • HAR: the standard Haralick method.

Discussion and conclusion

In this paper we focused on the problem of cell phenotype image classification by studying a range of features that can be extracted from the co-occurrence matrix (CM). Starting from the analysis of the best methods and descriptors proposed in the literature, we compared a number of different approaches and demonstrated the power of combining standard Haralick features with a set of novel features derived from the CM using five different benchmark datasets.
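For readers unfamiliar with the co-occurrence matrix, the sketch below builds a normalized, symmetric CM for a single pixel offset and computes three classic Haralick-style statistics from it. The choice of offset, the symmetric counting, and the three statistics are assumptions made for illustration; the excerpt does not state which settings or statistics the paper uses, and the SHAPE features are not reproduced here.

```python
import math


def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """Normalized symmetric gray-level co-occurrence matrix for one offset.

    `image` is a list of rows of integer gray levels in [0, levels).
    Assumes the offset yields at least one in-bounds pixel pair.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]


def haralick_statistics(p):
    """Contrast, angular second moment, and entropy of a normalized CM."""
    n = len(p)
    contrast = sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
    asm = sum(value * value for row in p for value in row)
    entropy = -sum(value * math.log(value) for row in p for value in row if value > 0)
    return {"contrast": contrast, "angular_second_moment": asm, "entropy": entropy}
```

In practice a descriptor of this kind is usually computed for several offsets and the resulting statistics are concatenated into one feature vector per image.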

We believe the descriptors described

Acknowledgment

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 284607.

References (59)

  • S. Ameling et al. Texture-based polyp detection in colonoscopy.

  • I.M. Bachmann et al. Importance of p-cadherin, b-catenin, and wnt5a/frizzled for progression of melanocytic tumors and prognosis in cutaneous melanoma. Clinical Cancer Research (2005).

  • J.E. Bedard et al. Identification of a novel ZIC3 isoform and mutation screening in patients with heterotaxy and congenital heart disease. Plos One (2011).

  • M.V. Boland et al. Automated recognition of patterns characteristic of subcellular structures in fluorescence microscopy images. Cytopathology (1998).

  • M.V. Boland et al. A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells. Bioinformatics (2001).

  • C.-H. Chan et al. Multiscale local phase quantization for robust component-based face recognition using kernel fusion of multiple descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence (2013).

  • A. Chebira et al. A multiresolution approach to automated classification of protein subcellular location images. BMC Bioinformatics (2007).

  • X. Chen et al. Automated interpretation of subcellular patterns in fluorescence microscope images for location proteomics. Cytometry (2006).

  • L. Chen et al. VFDB: A reference database for bacterial virulence factors. Nucleic Acids Research (2005).

  • C. Conrad et al. Automatic identification of subcellular phenotypes on human cell arrays. Genome Research (2004).

  • A. Danckaert et al. Automated recognition of intracellular organelles in confocal microscope images. Traffic (2002).

  • A.P. Fernandes et al. Expression profiles of thioredoxin family proteins in human lung cancer tissue: Correlation with proliferation and differentiation. Histopathology (2009).

  • J.L. Fink et al. LOCATE: A protein subcellular localization database. Nucleic Acids Research (2006).

  • D. Geerts et al. Expression of prenylated rab acceptor 1 domain family, member 2 (praf2) in neuroblastoma: Correlation with clinical features, cellular localization, and cerulenin-mediated apoptosis regulation. Human Cancer Biology (2007).

  • Ghidoni, S., Cielniak, G., & Menegatti, E. (2012). Texture-based crowd detection and localisation. In The 12th...

  • E. Glory et al. Automated comparison of protein subcellular location patterns between images of normal and cancerous tissues. PMC (2008).

  • N. Hamilton et al. Fast automated cell phenotype classification. BMC Bioinformatics (2007).

  • N.A. Hamilton et al. Statistical and visual differentiation of subcellular imaging. BMC Bioinformatics (2009).

  • R.M. Haralick. Statistical and structural approaches to texture. Proceedings of the IEEE (1979).