Elsevier

Neurocomputing

Volume 134, 25 June 2014, Pages 254-261

Cell phenotyping in multi-tag fluorescent bioimages

https://doi.org/10.1016/j.neucom.2013.08.043

Abstract

Multi-tag bioimaging systems have recently emerged as powerful tools that provide spatiotemporal localization of several different proteins in the same tissue specimen. Analyzing such multivariate bioimages requires sophisticated analytical methods that extract molecular signatures of various cell types and help analyze the interaction behavior of functional protein complexes. Previous studies have mainly focused on pixel-level analysis, which essentially ignores cells as structural units even though these can be crucial when analyzing cancerous cells. In this paper, we present a framework that overcomes this limitation by incorporating cell-level analysis. We use this framework to identify cell phenotypes based on their high-dimensional co-expression profiles, contained in images generated by the robotically controlled TIS microscope installed at Warwick. The proposed paradigm employs a refined cell segmentation algorithm followed by a locality-preserving nonlinear embedding algorithm, which is shown to produce significantly better cell classification and phenotype distribution results than its linear counterpart.

Introduction

Bioimage computing is rapidly emerging as a new branch of computational biology. It deals with the processing and analysis of bioimages, and with the mining and exploration of useful information in the vast amounts of image data that biology labs around the world generate regularly. Image-based systems biology promises to provide functional localization in space and time [1]. Recent advances in single-molecule detection using fluorescence microscopy allow image analysis to extract invisible yet reproducible information from bioimages [2]. Highly multiplexed fluorescence imaging techniques, such as multi-epitope ligand cartography (MELC) or the toponome imaging system (TIS) [3], generate massive amounts of multi-channel image data. Through its corresponding tag, each channel can report the abundance of a specific protein localized within individual cells. This high-dimensional representation of multiple co-localized protein expression levels calls for sophisticated analytical methods that extract molecular signatures of diseases such as cancer. Such signatures can help us understand the biological processes behind cancer development and can also aid early diagnosis and appropriate treatment.

To visualize such high-dimensional data in a low-dimensional subspace, one can draw on an extensive literature on manifold learning [4], [5], [6], in which n-dimensional data vectors are mapped to two or three dimensions. The motivation is that the data may lie on a low-dimensional manifold embedded in the high-dimensional space. The main objective of dimensionality reduction (DR) methods is to minimize the reconstruction error of the underlying low-dimensional sub-manifold. Visual discrimination and the discovery of structures in the data are not part of that objective, so DR methods are often sub-optimal for discovering clusters and for classification [7].
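For reference, the sketch below implements PCA and reports the reconstruction error it minimizes. The paper later uses PCA as its linear baseline. The sketch assumes NumPy and a data matrix with one row per sample. The function name pca_embed and the toy data are hypothetical. The toy points lie on a circle, which is a one-dimensional manifold, yet no single line reconstructs them well. This shows how a linear projection can miss nonlinear structure.

```python
import numpy as np

def pca_embed(X, k=2):
    """Project the rows of X (N samples x n features) onto the top-k principal axes.

    Returns the N x k embedding and the mean squared reconstruction error,
    the quantity a linear DR method such as PCA minimizes.
    """
    mu = X.mean(axis=0)
    Xc = X - mu                                   # centre the data
    # the right singular vectors of the centred data are the principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                                  # n x k orthonormal basis
    Z = Xc @ W                                    # k-dimensional coordinates
    X_hat = Z @ W.T + mu                          # back-projection into n dimensions
    err = float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))
    return Z, err


# Hypothetical toy data: 100 points on a unit circle (a 1-D manifold),
# isometrically embedded in a 26-dimensional space.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((26, 2)))   # 26 x 2, orthonormal columns
theta = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)]) @ basis.T

Z1, err_1d = pca_embed(X, k=1)
Z2, err_2d = pca_embed(X, k=2)
```

With k = 1 the mean squared reconstruction error is 0.5, which is the variance left in the discarded in-plane direction. With k = 2 the error is essentially zero, because the circle spans a two-dimensional linear subspace even though it is intrinsically one-dimensional.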

A novel approach for visualizing datasets that lie on several different underlying manifolds is t-SNE [8]. It is a variant of Stochastic Neighbor Embedding (SNE) [9] whose cost function is easier to optimize and uses a Student t-distribution to compute similarities in the low-dimensional space. Although the t-SNE cost function is non-convex, its representation of local similarities separates data clusters visually better than other nonlinear DR methods. However, a major limitation of DR methods, including t-SNE, is that their computational and memory costs grow at least quadratically with the number of data points. The original t-SNE, for example, compares all N × N pairs of points. This makes such methods infeasible for large datasets. In contrast, prototype-based methods such as the Self-Organizing Map (SOM) [10] visualize the data through a small set of representatives (prototypes), and this often reveals informative structure in the data.
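The sketch below makes the ingredients of t-SNE concrete. It computes Gaussian similarities P in the input space, Student-t similarities Q in the map, the Kullback-Leibler cost and its gradient, and then runs plain gradient descent. It is a simplified illustration under stated assumptions, not the authors' implementation. It uses one fixed Gaussian bandwidth sigma instead of the per-point perplexity calibration of [8], and it omits momentum and early exaggeration. All function names are hypothetical. The N × N matrices the sketch builds show why time and memory grow with the square of the number of points.

```python
import numpy as np

def sq_dists(X):
    """Pairwise squared Euclidean distances; an N x N matrix, hence O(N^2) memory."""
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

def joint_p(X, sigma=1.0):
    """Symmetrized input-space similarities p_ij = (p_j|i + p_i|j) / (2N).

    Assumption: one fixed Gaussian bandwidth sigma for all points; t-SNE
    proper calibrates a separate sigma_i per point to a target perplexity.
    """
    P = np.exp(-sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)        # conditional p_j|i (rows sum to 1)
    return (P + P.T) / (2.0 * X.shape[0])    # joint p_ij (entries sum to 1)

def joint_q(Y):
    """Map-space similarities under a Student t-kernel with one degree of freedom."""
    num = 1.0 / (1.0 + sq_dists(Y))
    np.fill_diagonal(num, 0.0)
    return num / num.sum(), num

def kl_cost(P, Q, eps=1e-12):
    """The t-SNE objective KL(P || Q); eps avoids 0 * log(0/0) on the zero diagonal."""
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))

def kl_grad(P, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) * (1 + ||y_i - y_j||^2)^-1 * (y_i - y_j)."""
    Q, num = joint_q(Y)
    W = (P - Q) * num
    return 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y

def tsne(X, dim=2, iters=500, lr=1.0, sigma=1.0, seed=0):
    """Plain gradient descent from a tiny random start.

    The small fixed step suits toy inputs; practical implementations add
    momentum and early exaggeration and tune the step size.
    """
    rng = np.random.default_rng(seed)
    P = joint_p(X, sigma)
    Y = 1e-4 * rng.standard_normal((X.shape[0], dim))
    for _ in range(iters):
        Y -= lr * kl_grad(P, Y)
    return Y
```

For instance, tsne(X) on an (N, n) array returns (N, 2) map coordinates that can be plotted directly. The heavy tail of the Student t-kernel lets moderately dissimilar points sit far apart in the map, which is what produces the clearer cluster separation described above.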

A traditional approach to analyzing TIS images is to binarize them by applying a fixed threshold to each image of the stack [11]. This simplification is lossy, biased, and subjective. To avoid these limitations, several algorithms have recently been proposed [12], [13], [14]. These algorithms exploit pixel-level relationships to identify molecular co-expression patterns (MCEPs). Pixel-level analysis of such high-dimensional data is effective in identifying MCEPs. However, such low-level processing has two main limitations: (1) it does not take into account cellular structures, which are very important in identifying cancer stem cells; and (2) it is intensity dependent and hence may vary substantially between stacks. In this paper, we propose to overcome these limitations by moving from pixel-level to cell-level analysis. We first segment the stacks to restrict our analysis to cellular content only. Then, for each cell, we calculate a fixed-size feature vector. Finally, we use these feature vectors to mine functional relationships between cells.
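A few lines of code are enough to sketch the contrast between the two representations. The example assumes NumPy and a stack stored as a (C, H, W) array with one channel per tag. It also assumes that a segmentation is already available as an integer label image, with 0 for background and 1..K for cells. The per-cell feature used here is the mean intensity of each channel inside the cell. This is an illustrative assumption, because this section does not specify the authors' cell-level feature vector. The function names are hypothetical.

```python
import numpy as np

def binarize_stack(stack, threshold):
    """Pixel-level baseline: reduce every pixel of a (C, H, W) stack to an
    on/off pattern across the C tags using one fixed threshold."""
    return stack > threshold

def cell_features(stack, labels):
    """Cell-level alternative: one fixed-length vector per segmented cell.

    stack  : (C, H, W) float array, one channel per tag
    labels : (H, W) non-negative int array, 0 = background, 1..K = cells
    returns: (K, C) array; row k-1 holds the mean intensity of each channel
             inside cell k (an illustrative choice of feature)
    """
    flat = labels.ravel()
    K = int(flat.max())
    counts = np.bincount(flat, minlength=K + 1)[1:]   # pixels per cell
    feats = np.empty((K, stack.shape[0]))
    for c in range(stack.shape[0]):
        sums = np.bincount(flat, weights=stack[c].ravel(), minlength=K + 1)[1:]
        feats[:, c] = sums / np.maximum(counts, 1)    # guard against empty labels
    return feats

# Hypothetical 2-tag, 4 x 4 example with two square "cells"
stack = np.zeros((2, 4, 4))
labels = np.zeros((4, 4), dtype=int)
labels[:2, :2] = 1
labels[2:, 2:] = 2
stack[0, :2, :2] = 1.0
stack[0, 2:, 2:] = 3.0
stack[1, :2, :2] = [[0.0, 2.0], [2.0, 0.0]]
stack[1, 2:, 2:] = 5.0

pixel_patterns = binarize_stack(stack, threshold=1.5)  # (2, 4, 4) boolean array
features = cell_features(stack, labels)                # [[1.0, 1.0], [3.0, 5.0]]
```

Each row of features has the same length, whatever the size of the cell. Cells can therefore be compared, embedded, and clustered downstream. In contrast, the binarized patterns change whenever the threshold or the overall intensity level changes.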

In this paper, we extend our previous work [15] to mine cell phenotypes based on their high-dimensional protein co-expression profiles. We make the following contributions: (1) we perform our analysis at the cell level, departing from existing intensity-dependent approaches that employ pixel-level analysis [3], [13], [14]; (2) we show that the raw protein co-expression vectors have a nonlinear high-dimensional structure that can be effectively visualized using a symmetric neighborhood embedding approach; and (3) we demonstrate the effectiveness of the nonlinear embedding coordinates for (a) classifying tissue type at the cellular level, compared with its linear embedding counterpart, principal component analysis (PCA), and (b) mining cell phenotypes in an exploratory clustering setup using Affinity Propagation Clustering (APC) [16], Agglomerative Hierarchical Clustering (AHC) [17], and SOM [10].
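Of the three clustering methods listed, Affinity Propagation is the least familiar, so a compact sketch of its message passing may help. The sketch follows the standard damped responsibility/availability updates and assumes NumPy. It uses negative squared Euclidean distance as the similarity and places the median similarity on the diagonal as every point's preference. That preference is a common choice for obtaining a moderate number of clusters, and it is assumed here rather than taken from the paper. In the setting described above, the input rows would presumably be per-cell embedding coordinates. The toy points below are made up, and the function names are hypothetical.

```python
import numpy as np

def negative_sq_similarity(X):
    """s(i,k) = -||x_i - x_k||^2, with the median off-diagonal similarity
    placed on the diagonal as every point's preference (an assumed default)."""
    S = -np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    off_diag = ~np.eye(len(X), dtype=bool)
    np.fill_diagonal(S, np.median(S[off_diag]))
    return S

def affinity_propagation(S, damping=0.5, iters=200):
    """Damped responsibility/availability message passing.
    Returns (exemplar indices, cluster label for every point)."""
    N = S.shape[0]
    rows = np.arange(N)
    R = np.zeros((N, N))
    A = np.zeros((N, N))
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1.0 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0.0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new[rows, rows]
        A_new = np.minimum(A_new, 0.0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1.0 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:                       # no convergence: leave unlabeled
        return exemplars, np.full(N, -1)
    labels = np.argmax(S[:, exemplars], axis=1)   # nearest exemplar by similarity
    labels[exemplars] = np.arange(exemplars.size) # exemplars label themselves
    return exemplars, labels

# Hypothetical toy data: two small, well-separated groups of 2-D points
base = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4], [0.5, 0.2], [0.2, 0.2]])
X = np.vstack([base, base + 8.0])
exemplars, labels = affinity_propagation(negative_sq_similarity(X))
```

Unlike k-means, the number of clusters is not fixed in advance: it emerges from the preference values. This suits an exploratory search for phenotypes whose number is unknown.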


The mining framework

The framework presented in this work consists of three stages: preprocessing (stack alignment and cell segmentation), nonlinear low-dimensional embedding, and unsupervised clustering.
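This snippet does not detail the alignment step. One standard way to estimate the translational offset between two channels of a stack is phase correlation, sketched below. The sketch assumes that any misalignment is a pure integer translation and that images wrap around at the borders, which the FFT implies. Phase correlation is a swapped-in illustrative technique, not necessarily the one the authors used, and the name phase_correlation_shift is hypothetical.

```python
import numpy as np

def phase_correlation_shift(ref, moving):
    """Estimate the integer (dy, dx) such that
    np.roll(moving, (dy, dx), axis=(0, 1)) best matches ref.

    Assumes a pure translation and periodic borders (implied by the FFT).
    """
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(moving))
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep the phase only
    corr = np.fft.ifft2(cross).real             # sharp peak at the offset
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    H, W = ref.shape
    # peaks past the midpoint correspond to negative shifts
    if dy > H // 2:
        dy -= H
    if dx > W // 2:
        dx -= W
    return int(dy), int(dx)

# Hypothetical usage: a random "reference channel" and a copy displaced by (5, -3)
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
moving = np.roll(ref, (5, -3), axis=(0, 1))
dy, dx = phase_correlation_shift(ref, moving)
aligned = np.roll(moving, (dy, dx), axis=(0, 1))
```

Normalizing the cross-power spectrum discards intensity magnitudes. The estimate is therefore relatively insensitive to channels with very different overall brightness, which is a useful property when every channel images a different tag.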

Experimental results and discussion

In this study, we used three TIS image stacks obtained from two human colon tissue specimens: one stack from the cancerous sample and the other two from two different visual fields of the normal sample. Independent expert pathologists verified each tissue sample as normal or cancerous. A library of 26 tags was used to generate the stacks of multi-tag microscopic bioimages with TIS [3]. Each image is 1056 × 1026 pixels, with a pixel size of 206 nm × 206 nm.
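To give a sense of scale, the acquisition parameters above imply the following field of view and data volume per stack. The arithmetic uses only the stated numbers (1056 × 1026 pixels, a 206 nm pixel side, 26 tags). It assumes one intensity value per pixel per tag and takes the two image dimensions in the order written.

```python
width_px, height_px = 1056, 1026     # image size in pixels, as stated
pixel_nm = 206                       # side length of one pixel in nanometres
n_tags = 26                          # one image (channel) per tag

# physical field of view in micrometres
fov_um = (width_px * pixel_nm / 1000.0, height_px * pixel_nm / 1000.0)

# each pixel position carries one 26-dimensional co-expression vector
pixels_per_image = width_px * height_px
values_per_stack = n_tags * pixels_per_image

print(f"field of view: {fov_um[0]:.1f} um x {fov_um[1]:.1f} um")
print(f"{pixels_per_image:,} pixel vectors, {values_per_stack:,} intensity values per stack")
```

This gives a field of view of roughly 217.5 µm × 211.4 µm, or 1,083,456 pixel vectors of dimension 26 and 28,169,856 intensity values per stack.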


Conclusions

We presented a paradigm for cell-level mining of molecular signatures in multi-tag bioimages using a nonlinear embedding approach. In contrast to traditional pixel-level approaches, we performed the analysis at the cell level. We showed that the symmetric neighborhood embedding outperforms the original high-dimensional raw protein expression vectors in discriminating between normal and cancer tissue samples on the basis of their phenotypic distributions. Our future work will …

Acknowledgments

Research reported in this publication was partially supported by the Qatar National Research Fund (QNRF) under award number NPRP 5-1345-1-228. The authors would like to thank Ahmad Humayun for his collaboration and useful discussions. A.M. Khan acknowledges the financial support provided by the Warwick Postgraduate Research Scholarship (WPRS) program and the Department of Computer Science at the University of Warwick, UK. S.A. Raza acknowledges the financial support provided by the …

Adnan Mujahid Khan received a bachelor's degree in computer science from the National University of Computer and Emerging Sciences, Pakistan, in 2005 and a master's degree in computer systems engineering from the Ghulam Ishaq Khan Institute, Pakistan, in 2007. He is currently a postgraduate student in the Department of Computer Science, University of Warwick, UK. His research interests include biological image processing and analysis, computer vision, and machine learning.

References (23)

  • S. Megason et al., Imaging in systems biology, Cell (2007)
  • G. Danuser, Computer vision in cell biology, Cell (2011)
  • W. Schubert et al., Analyzing proteome topology and function by automated multidimensional fluorescence microscopy, Nat. Biotechnol. (2006)
  • L.K. Saul et al., Think globally, fit locally: unsupervised learning of low dimensional manifolds, J. Mach. Learn. Res. (2003)
  • J. Tenenbaum et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000)
  • T.F. Cox et al. (2000)
  • M.-H. Yang, Face recognition using extended isomap, in: Proceedings of the 2002 International Conference on Image...
  • L. Van der Maaten et al., Visualizing data using t-SNE, J. Mach. Learn. Res. (2008)
  • G. Hinton et al., Stochastic neighbor embedding, Adv. Neural Inf. Process. Syst. (2002)
  • T. Kohonen, Self-Organizing Maps (1995)
  • S. Bhattacharya et al., Toponome imaging system: in situ protein network mapping in normal and cancerous colon from the same patient reveals more than five-thousand cancer specific protein clusters and their subcellular annotation by using a three symbol code, J. Proteome Res. (2010)

Shan-e-Ahmed Raza graduated in Electrical Engineering from the University of Engineering and Technology, Taxila, in 2008, and received an MS in Systems Engineering from the Pakistan Institute of Engineering and Applied Sciences, Islamabad. He joined the Computer Science Department of the University of Warwick, UK, in 2011 as a Ph.D. student and is currently in the third year of his Ph.D. …

Mike Khan, Ph.D., FRCP, is an Associate Professor of Medicine at the University Hospitals of Coventry and Warwickshire and former Head of Molecular Medicine at the University of Warwick. He was elected a fellow of the Royal College of Physicians in 2002 and a member of the Association of Physicians in 2004. His main research interests have been the regulation of tissue growth and plasticity during development and in adult tissue homeostasis. He currently collaborates with mathematicians and others on a systems biology approach to define the key functional gene and protein networks involved in beta cell mass regulation and in the origin of new beta cells in the adult. To this end, the group has collaborated in developing powerful new systems for conditional gene expression in vivo and for imaging combinatorial in situ protein expression using TIS microscopy.

Nasir Rajpoot (Senior Member, IEEE) received his Ph.D. in Computer Science from the University of Warwick, UK, in 2001. He was a postgraduate research fellow in the Applied Mathematics program at Yale University, USA, during 1998–2000. Prior to his Ph.D., he obtained his first degree in Computer Science from Zakariya University, Pakistan, in 1994 and his MSc in Systems Engineering from Quaid-e-Azam University, Pakistan, in 1996, both with the highest distinction. His group at Warwick has been internationally recognized for its research in digital pathology image analysis and computational biology. The group has developed novel algorithms for analyzing and modeling nuclei and morphological patterns in histopathology images. These include classification of cells via unsupervised learning of shape manifolds, segmentation of tumor regions, and detection of mitotic cells in breast histopathology images. A recent research focus of his lab has been algorithms for the computerized analysis and modeling of sub-cellular objects in multi-channel fluorescence microscopy images. Dr. Rajpoot has recently chaired international meetings in histopathology image analysis (for example, CHiP@ISBI'2008, OPTIMHisE'2009, MIUA'2010, PRinHIMA'2010, HIMA@MICCAI'2011, and HIMA@MICCAI'2012). He was a guest co-editor of a 2012 special issue of Machine Vision and Applications on Microscopy Image Analysis and its Applications in Biology.
