A comparison of methods for extracting information from the co-occurrence matrix for subcellular classification
Introduction
The development of new and improved tools for automatic analysis and classification has already proven beneficial in clinical practice and in medical and biological research (Hamilton et al., 2009, Murphy, 2006). In Karkanis, Iakovidis, Maroulis, Karras, and Tzivras (2003), for example, linear discriminant analysis of wavelet features, used in applications that range from traffic incident detection (Samant & Adeli, 2000) to face identification and verification (Shen, Bai, & Fairhurst, 2007), proved effective in the detection of tumors in endoscopic images. In Ameling, Wirth, Paulus, Lacey, and Vilarino (2009), image texture information successfully discriminated polyps in colonoscopy images. The Local Binary Pattern (LBP) operator (Ojala, Pietikainen, & Maeenpaa, 2002), which has distinguished itself from other image texture operators by its simplicity, effectiveness, and robustness, has also proven very good at detecting a variety of tumors and masses. In Vécsei, Amann, Hegenbart, Liedlgruber, and Uhl (2011), for example, LBP was used to assign a Marsh-like score to endoscopic images of pediatric celiac disease, thus providing concrete help for pathologists. In Oliver, Lladó, Freixenet, and Martí (2007) a Support Vector Machine (SVM) was coupled with the LBP operator to distinguish real masses from normal parenchyma in mammographic images, thus reducing the incidence of false positive samples. In Unay and Ekin (2008) LBP was used to explore brain magnetic resonance data, and in Nanni and Lumini (2008) the authors demonstrated that a combination of LBP with other texture descriptors is effective in classifying different cell phenotypes using SVM.
In this paper we focus on cell phenotype image classification, a bioimaging problem that is concerned with finding the location of protein expressions within a cell. Understanding the function of proteins at the cellular level is a major goal in biology (Chebira et al., 2007) since the knowledge of the subcellular locations of a protein is useful in understanding its specific function and in describing a cell’s behavior under different conditions. Not only is protein localization an important topic in biology, but it is also critical in the design of drug screening systems, drug discovery, and recently in the diagnosis and prognosis of many diseases. In fact, it has been suggested that the old paradigm of one protein, one biomarker, one clinical decision no longer holds and is being replaced by multiparametric analysis of genes and proteins, with protein patterns currently considered to offer better diagnostic possibilities than a single biomarker (Rosenblatt et al., 2008).
This change in view has been motivated in part by the abundance of recent research showing that aberrant subcellular localizations are associated with many diseases: cancer (Fernandes et al., 2009, Geerts et al., 2007, Knostman et al., 2007, Mezzanzanica and D., 0000, Perrone et al., 2007, Ralhan et al., 2010, Tang et al., 2010), heart disease (Bedard et al., 2011, Obrenovich et al., 2006), kidney disease (Nishibori et al., 2004), pulmonary fibrosis (Thomas et al., 2002), and Alzheimer disease, among others. In Nishibori et al. (2004), for instance, the subcellular localization of three missense mutants located in the proximal C-terminus part of the podocin protein (NPHS2) was found to be greatly altered in patients with steroid-resistant nephrotic syndrome. In Knostman et al. (2007) sodium/iodide symporter (NIS) expression, subcellular localization, and function were analyzed in MCF-7 human breast cancer cells; the authors found that NIS and intercellular localization are associated with pAkt expression in human breast cancer tissues. In Perrone et al. (2007) COX-2 expression and its subcellular localization in lobular in situ neoplasia (LIN) of the breast were assessed as candidate biomarkers for breast cancer. In Ralhan et al. (2010) the subcellular localization of EpEx and Ep-ICD in the human colon adenocarcinoma cell line CX-1 was examined using immunofluorescence. Nuclear and cytoplasmic Ep-ICD expression was increased in cancers of the breast, prostate, head, neck, and esophagus compared with the corresponding normal tissues, which showed membrane localization of the protein. In Bachmann, Straume, Puntervoll, Kalvenes, and Akslen (2005) alterations in the expression and subcellular localization of cell adhesion markers were found to be important in the development and progression of melanocytic tumors, and it is well known that activated leukocyte cell adhesion molecule (ALCAM) is expressed at the cell surface of epithelial ovarian cancer cell lines (Piazza et al., 2005).
Moreover, in Bedard et al. (2011) it was discovered that mutations in the gene encoding zinc finger of the cerebellum protein 3 (ZIC3) can cause congenital heart defects. Cytosolic proteins such as the G protein-coupled receptor kinases (GRKs), GRK2 among them, are well characterized in the heart, and G protein-coupled receptor desensitization is emerging as a key feature in several cardiovascular diseases. GRKs are also found in cerebral tissues, and a significant increase in GRK2 immunoreactivity in endothelial cells has been found in patients with Alzheimer disease. In Obrenovich et al. (2006), the authors explored the cellular and subcellular localization of GRK2 by immunoreactivity and demonstrated that the ultrastructural localization and overexpression of GRK2 occur during the early stages of damage in aged humans and in Alzheimer disease cases.
As indicated above, protein subcellular localization is also becoming increasingly important as a prognostic indicator (Bachmann et al., 2005, Moreira et al., 2010, Surowiak et al., 2006). There is considerable evidence, for instance, that the loss of BLCAP expression is associated with tumor progression. In Moreira et al. (2010) the authors were able to classify urothelial carcinomas into four groups based on levels of expression and subcellular localization of BLCAP protein. Their findings suggested that BLCAP may have prognostic value in bladder cancer. Subcellular localization may even be useful in predicting the response to chemotherapy in ovarian cancer (Surowiak et al., 2006).
In the last two decades several research groups have investigated automated cell phenotype image classification (Boland et al., 1998, Boland and Murphy, 2001, Conrad et al., 2004, Danckaert et al., 2002, Lin et al., 2007, Nanni and Lumini, 2008, Perner et al., 2002). Work that has focused on training generic classifiers with image descriptors includes Chen et al. (2006), Glory and Murphy (2007), and Glory et al. (2008). The most common image recognition methods used for this problem are Haralick texture measures (Huang & Murphy, 2004), Zernike moments and threshold adjacency statistics (Hamilton, Pantelic, Hanson, & Teasdale, 2007), Gabor filters, and a number of other ad hoc measures (Conrad et al., 2004). More recently, new methods have been developed that are based on fusion at the feature level and at the score level. At the feature level, vectors are created by concatenating several descriptors (Chen et al., 2005, Hamilton et al., 2007, Huang and Murphy, 2004). A multi-resolution approach, proposed in Chebira et al. (2007), trained classifiers using descriptors extracted from different resolution spaces. An example of fusion at the score level is Lin et al. (2007).
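To make the distinction between the two fusion levels concrete, the following minimal Python sketch contrasts them. It is an illustration only: the function names `fuse_features` and `fuse_scores`, the toy values, and the choice of the sum rule (averaging per-class scores) for score-level fusion are assumptions, not details taken from the cited works.

```python
def fuse_features(*descriptors):
    """Feature-level fusion: concatenate descriptor vectors into one vector."""
    fused = []
    for descriptor in descriptors:
        fused.extend(descriptor)
    return fused


def fuse_scores(*score_lists):
    """Score-level fusion with the sum rule (assumed here): average the
    per-class scores produced by several classifiers."""
    n_classes = len(score_lists[0])
    if any(len(scores) != n_classes for scores in score_lists):
        raise ValueError("all classifiers must score the same set of classes")
    return [sum(scores[c] for scores in score_lists) / len(score_lists)
            for c in range(n_classes)]


# Hypothetical toy values, not results from any dataset.
texture = [0.2, 0.7]          # e.g. a 2-dimensional texture descriptor
moments = [1.5, 0.1, 0.4]     # e.g. a 3-dimensional moment descriptor
fused_vector = fuse_features(texture, moments)

scores_a = [0.6, 0.3, 0.1]    # per-class scores from classifier A
scores_b = [0.1, 0.6, 0.3]    # per-class scores from classifier B
fused_scores = fuse_scores(scores_a, scores_b)
predicted_class = max(range(len(fused_scores)), key=fused_scores.__getitem__)
```

In practice the descriptors are usually normalized before concatenation and the scores are brought to a common range before averaging; both steps are omitted here for brevity.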
The aim of this work is to assess, for this problem, the discriminant power of a recent method for extracting features from the co-occurrence matrix (CM), in which new features are extracted by treating the co-occurrence matrix as a 3D shape (SHAPE). On its own, this set of features performs rather poorly compared with the standard Haralick feature set (HAR), but we have discovered that SHAPE can be coupled with HAR to boost performance to levels similar to those obtained by recent texture descriptor approaches (Nanni et al., 2013a, Nanni et al., 2013b). Moreover, when this new set of features is coupled with state-of-the-art global descriptors, further improvements are obtained. This is demonstrated by combining our proposed set of features based on the co-occurrence matrix (SHAPE) with descriptors proposed in Paci et al. (2013). The Q-statistic between HAR and SHAPE is quite low, confirming the low correlation between the information extracted by the two sets, and this motivated us to investigate their fusion. As expected, fusion of HAR and SHAPE produced better results. Our experiments were validated across five databases: (1) the 2D HeLa dataset (Boland & Murphy, 2001), (2) the LOCATE endogenous mouse subcellular organelles dataset, (3) the LOCATE transfected mouse subcellular organelles dataset (Fink et al., 2006, Hamilton et al., 2007), (4) the Chinese Hamster Ovary dataset (Boland et al., 1998), and (5) the RNAi dataset (Zhang & Pham, 2011).
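For readers unfamiliar with the Q-statistic, the sketch below shows how Yule's Q is commonly computed for a pair of classifiers from their per-sample correctness. With N11 the number of samples both classify correctly, N00 the number both misclassify, and N10 and N01 the numbers on which only the first or only the second is correct, Q = (N11 N00 - N01 N10) / (N11 N00 + N01 N10). Values near 1 indicate that the two classifiers tend to err on the same samples, while values near 0 or below indicate complementary errors. The function name and the toy correctness vectors are hypothetical and are not results from the paper.

```python
def q_statistic(correct_a, correct_b):
    """Yule's Q for two classifiers, given per-sample correctness flags."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both classifiers must be scored on the same samples")
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1          # both correct
        elif not a and not b:
            n00 += 1          # both wrong
        elif a:
            n10 += 1          # only the first classifier correct
        else:
            n01 += 1          # only the second classifier correct
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        raise ValueError("Q is undefined for these counts")
    return (n11 * n00 - n01 * n10) / denominator


# Hypothetical toy outcomes (1 = correct, 0 = wrong) on eight samples.
har_correct = [1, 1, 1, 0, 0, 1, 0, 1]
shape_correct = [1, 0, 1, 1, 0, 0, 1, 1]
q = q_statistic(har_correct, shape_correct)   # counts: N11=3, N00=1, N10=2, N01=2
```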
The rest of this paper is organized as follows. In Section 2 we describe our proposed approach using descriptors based on CM along with all the other descriptors combined with our system or used for comparison purposes. In Section 3 we describe the five benchmark databases used to validate our approach. In Section 4, we present our experimental results, and in Section 5 we conclude our paper with a few suggestions for further research.
Proposed approach
In this paper we compare recently proposed descriptors based on the co-occurrence matrix (CM). Our goal is to enhance the performance of the standard Haralick descriptor with a recent set of proposed features that can also be extracted from the co-occurrence matrix (Nanni et al., 2013a). Our tests using the Q-statistic show that the different approaches for extracting features from the co-occurrence matrix provide different information. This indicates that combining these descriptors should
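As background for the descriptors discussed here, the following minimal sketch builds a normalized gray-level co-occurrence matrix for one pixel offset and computes four widely used Haralick-style statistics from it (contrast, energy, homogeneity, and entropy). It covers only the standard Haralick side; the SHAPE features are not reproduced because their definition is not given in this section. The function names, the horizontal offset, the symmetric counting, and the 4-level toy image are illustrative assumptions.

```python
import math


def cooccurrence_matrix(image, levels, dr=0, dc=1, symmetric=True):
    """Normalized co-occurrence matrix of gray levels for the offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1   # also count the pair in reverse order
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no pixel pairs exist for this offset")
    return [[value / total for value in row] for row in counts]


def haralick_subset(p):
    """A small subset of Haralick-style statistics of a normalized matrix p."""
    contrast = energy = homogeneity = entropy = 0.0
    for i, row in enumerate(p):
        for j, value in enumerate(row):
            contrast += (i - j) ** 2 * value
            energy += value * value
            homogeneity += value / (1 + (i - j) ** 2)
            if value > 0:
                entropy -= value * math.log2(value)
    return {"contrast": contrast, "energy": energy,
            "homogeneity": homogeneity, "entropy": entropy}


# Hypothetical 4x4 image quantized to four gray levels (0..3).
toy_image = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [0, 2, 2, 2],
             [2, 2, 3, 3]]
cm = cooccurrence_matrix(toy_image, levels=4)
features = haralick_subset(cm)
```

A full descriptor would typically repeat this for several offsets and directions and concatenate or average the resulting statistics.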
Datasets
In this section we describe the datasets used in our experimental section. Each of the datasets includes subgroups of either subcellular structures, such as organelles, or cell classes, such as different cell lines, that were used as classes to both train and test the classifiers.
Experimental results
For the evaluation protocol, we use 5-fold cross-validation for testing each of the texture descriptors. The performance indicator is accuracy since that is the indicator widely used in papers published using these datasets.
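The protocol can be sketched as follows, as an illustration rather than the exact experimental code. The samples are shuffled and split into five folds; each fold is held out once while a classifier is trained on the remaining four, and the per-fold accuracies are averaged. A 1-nearest-neighbour rule is used purely as a stand-in classifier, the folds are not stratified by class, and the function names and toy data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_neighbour_predict(train_x, train_y, x):
    """Stand-in classifier: label of the closest training sample."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]


def cross_val_accuracy(features, labels, k=5, seed=0):
    """Mean accuracy over k held-out folds."""
    fold_accuracies = []
    for test_idx in k_fold_indices(len(features), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(features)) if i not in held_out]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        hits = sum(
            nearest_neighbour_predict(train_x, train_y, features[i]) == labels[i]
            for i in test_idx)
        fold_accuracies.append(hits / len(test_idx))
    return sum(fold_accuracies) / k


# Hypothetical toy data: two well-separated classes of ten samples each.
toy_features = ([[0.01 * i, 0.0] for i in range(10)]
                + [[5.0 + 0.01 * i, 5.0] for i in range(10)])
toy_labels = [0] * 10 + [1] * 10
folds = k_fold_indices(len(toy_features))
accuracy = cross_val_accuracy(toy_features, toy_labels)
```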
The first experiment was aimed at establishing which version of the co-occurrence matrix works best to classify the images in the five datasets. Table 1 reports different approaches (see Section 2.1 for a description of these features):
- HAR: the standard Haralick method.
Discussion and conclusion
In this paper we focused on the problem of cell phenotype image classification by studying a range of features that can be extracted from the co-occurrence matrix (CM). Starting from the analysis of the best methods and descriptors proposed in the literature, we compared a number of different approaches and demonstrated the power of combining standard Haralick features with a set of novel features derived from the CM using five different benchmark datasets.
We believe the descriptors described
Acknowledgment
The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 284607.
References (59)
- et al. (2007). Automated subcellular location determination and high throughput microscopy. Developmental Cell.
- et al. (2010). Bladder cancer-associated protein, a potential prognostic biomarker in human bladder cancer. Molecular & Cellular Proteomics.
- et al. (2008). A reliable method for cell phenotype image classification. Artificial Intelligence in Medicine.
- et al. (2010). Local binary patterns variants as texture descriptors for medical image analysis. Artificial Intelligence in Medicine.
- et al. (2004). Disease-causing missense mutations in NPHS2 gene alter normal nephrin trafficking to the plasma membrane. Kidney International.
- et al. (2002). Mining knowledge for hep-2 cell image classification. Artificial Intelligence in Medicine.
- et al. (2007). Gabor wavelets and general discriminant analysis for face identification and verification. Image and Vision Computing.
- et al. (2011). Histogram modified local contrast enhancement for mammogram images. Applications of Soft Computing.
- et al. (2011). Automated marsh-like classification of celiac disease in children using local texture operators. Computers in Biology and Medicine.
- Akhloufi, M., & Bendada, A. (2010). Locally adaptive texture features for multispectral face recognition. In IEEE...