Group sparsity model for stain unmixing in brightfield multiplex immunohistochemistry images

https://doi.org/10.1016/j.compmedimag.2015.04.001

Highlights

  • Unmixing an RGB image into more than three colors has hardly been studied in the literature.

  • A novel IHC image unmixing algorithm is proposed based on a group sparsity model.

  • Biological biomarker co-localization information is used as a prior in this model.

  • It unmixes more than three dyes while preserving the biological constraints.

  • The new algorithm demonstrates better results than the existing strategies.

Abstract

Multiplex immunohistochemistry (IHC) staining is an emerging technique for detecting multiple biomarkers within a single tissue section. The technique has become popular because of its efficiency and the rich diagnostic information it provides. The key first step in multiplex IHC image analysis in digital pathology is to unmix the IHC image accurately and differentiate each of the stains, a step of tremendous clinical importance. Unmixing an RGB image acquired with a three-channel CCD color camera into more than three colors is very challenging and, to the best of our knowledge, has hardly been studied in the academic literature.

This paper presents a novel stain unmixing algorithm for brightfield multiplex IHC images based on a group sparsity model. The proposed framework achieves robust unmixing for more than three chromogenic dyes while preserving the biological constraints of the biomarkers. Typically, it is known a priori that a number of biomarkers co-localize in the same cell parts. Given this biological information, the number of stains at one pixel has a fixed upper bound, namely the number of co-localized biomarkers. By leveraging the group sparsity model, the fractions of stain contributions from the co-localized biomarkers are explicitly modeled as one group, yielding the least squares solution within the group. A sparse solution is obtained among the groups, since ideally only one group of biomarkers is present at each pixel. The algorithm is evaluated on both synthetic and clinical data sets and demonstrates better unmixing results than the existing strategies.

Introduction

As one of the most life-threatening groups of diseases, cancer causes millions of deaths each year. The traditional TNM staging system is often used to provide prognostic information; however, it relies solely on the tumor cells and leads to significant variation of outcomes within the same tumor stage. Therefore, it is of great clinical importance to have a reliable, reproducible, clinically relevant and biologically meaningful system for cancer identification and staging, in contrast to TNM [1]. Recently, the study of immune regulation within the tumor microenvironment has gained tremendous attention in cancer research [1], [2], [3], and there is evidence that the immune cells are associated with the clinical outcome of certain cancer types [2]. A quantitative and objective evaluation of the different types of immune cells within the tumor microenvironment is hence needed in both research and clinical studies, wherein digital pathology plays an important role.

While slides with the popular primary stain, Hematoxylin and Eosin (H&E), are widely investigated in digital pathology to study tissue morphology, classify cancer types, or grade the cancer [4], [5], [6], [7], [8], [9], special staining techniques such as immunohistochemistry staining also convey important information. A multiplex immunohistochemistry (IHC) slide has the potential advantage of simultaneously identifying multiple biomarkers in one tissue section, as opposed to single-biomarker labeling in multiple slides (see Fig. 1 for an example). Therefore, multiplex IHC staining is often used for simultaneous assessment of multiple biomarkers in cancerous tissue. For example, tumors often contain infiltrates of immune cells, which may prevent the development of tumors or favor their outgrowth [2]. In this scenario, multiple biomarkers are used to target different types of immune cells, and the population distribution of each immune cell type is then used to study the clinical outcome of the patients. The biomarkers of the immune cells are stained using different chromogenic dyes. Correctly unmixing the IHC digital image into its individual constituent dyes for each biomarker, and obtaining the proportion of each dye in the color mixture, are prerequisites for accurate detection and classification of immune cells in multiplex IHC image analysis.

A typical digital pathology workflow for multiplex staining and stain unmixing is shown in Fig. 2. A tissue slide is stained with the multiplex assay. The stained slide is then imaged using a CCD color camera mounted on a microscope, or a scanner. The acquired RGB color image is a mixture of the underlying co-localized biomarker expressions. Several techniques have been proposed in the literature to decompose each pixel of the RGB image into a collection of constituent stains and the fractions of the contributions from each of them, that is, to convert the RGB image into biomarker-specific image channels. Stain unmixing is therefore a prerequisite for the subsequent image analysis algorithms: cell detection, segmentation and classification for each biomarker. Ruifrok et al. developed an unmixing method called color deconvolution [10] to unmix an RGB image with up to three stains in the converted optical density space. Given the reference color vectors $x_i \in \mathbb{R}^3$ of the pure stains, the method assumes that each pixel of the color mixture $y \in \mathbb{R}^3$ is a linear combination of the pure stain colors and solves a linear system to obtain the combination weights $b \in \mathbb{R}^M$. The linear system is denoted as $y = Xb$, where $X = [x_1, \ldots, x_M]$, $M \le 3$, is the matrix of reference colors. This technique is currently the most widely used in digital pathology. However, the maximum number of stains that can be resolved is limited to three, because with more than three stains the linear system is underdetermined: it has only three equations for more than three unknowns. A multilayer perceptron learning based technique has been proposed in [11] for three-color brightfield image unmixing. In [12], Rabinovich et al. formulated the color unmixing problem as a non-negative matrix factorization and proposed a system capable of performing the color decomposition in a fully automated manner, wherein no reference stain color selection is required.
Again, these methods share the same limitation when dealing with larger numbers of stains, because they also rely on solving $y = Xb$. To the best of our knowledge, no method for unmixing brightfield IHC images with more than three stains is available in the literature. In order to compare with Ruifrok's method, we divide the color space into several systems with up to three colors each, and assign each pixel to one of the systems by nearest color matching. Ruifrok's method can then be used to solve each individual system. Because each pixel is assigned to a system independently, spatial continuity is lost in the unmixed images and artifacts such as holes are observed. Nevertheless, this is the most straightforward modification of Ruifrok's method that makes it applicable to multiplex brightfield images with more than three colors for comparison purposes.
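The linear model y = Xb described above can be illustrated with a minimal sketch. It assumes the usual Beer-Lambert conversion to optical density and uses made-up reference color vectors (`X3`, `x4`) chosen only for illustration; it is not the implementation of [10]. The second half shows why a fourth stain cannot be resolved from three channels: two different weight vectors reproduce the same pixel.

```python
import numpy as np

def rgb_to_od(rgb, background=255.0):
    """Convert RGB intensities to optical density (Beer-Lambert relation)."""
    rgb = np.asarray(rgb, dtype=float)
    # Clip to [1, background] to avoid log(0) on black pixels.
    return -np.log(np.clip(rgb, 1.0, background) / background)

def unmix(od_pixel, X):
    """Least-squares weights b such that od_pixel is approximately X @ b."""
    b, *_ = np.linalg.lstsq(X, od_pixel, rcond=None)
    return b

# Hypothetical reference OD vectors, one stain per column (illustrative values).
X3 = np.array([[0.65, 0.07, 0.27],
               [0.70, 0.99, 0.57],
               [0.29, 0.11, 0.78]])

# Three stains: the 3x3 system is invertible, so the weights are recovered.
b_true = np.array([0.8, 0.0, 0.5])
b3 = unmix(X3 @ b_true, X3)

# Four stains: X4 is 3x4, so its rank is at most 3 and a null direction exists.
x4 = np.array([0.46, 0.83, 0.31])            # hypothetical fourth stain
X4 = np.column_stack([X3, x4])
null_vec = np.append(-np.linalg.solve(X3, x4), 1.0)   # X4 @ null_vec == 0

b_a = np.array([0.8, 0.0, 0.0, 0.5])
b_b = b_a + 0.1 * null_vec                   # different weights, same pixel
same_pixel = np.allclose(X4 @ b_a, X4 @ b_b)
```

Since `b_a` and `b_b` explain the pixel equally well, some additional constraint or regularization is needed to choose between them once more than three stains are involved.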

Alternatively, there exists another class of methods, designed for multi-spectral image unmixing, that works for a larger number of stain colors [13], [14], [15], [16], [17]. The multi-spectral image differs from the RGB image in terms of image acquisition. A multi-spectral imaging system captures the image using a set of spectral narrow-band filters, rather than a CCD color camera. The number of filters $K$ can range from about a dozen to as many as a hundred, ultimately yielding a multi-channel image that provides much richer information than the brightfield RGB image. The linear system constructed from it is always an over-determined system, with $X$ being a $K \times M$ ($K > M$) matrix, which leads to a unique solution. However, the scanning process in a multi-spectral imaging system is time consuming and provides only a single field of view, manually selected by a trained technician, rather than a whole slide scan. As an example of multi-spectral unmixing, the two-stage methods [14], [15] were developed in the remote sensing domain to first learn the reference colors from the image context and then use them to unmix the image. Sparse models have been widely used in radiology image analysis for image registration, segmentation, shape modeling, low dose CT analysis, etc. [18], [19], [20], [21], [22], [23], [24], [25], [26], and demonstrate improved performance with respect to the classical models. More recently, a sparse model was proposed by Greer in [17] for high dimensional multi-spectral image unmixing. It adopts the $\ell_0$ norm to regularize the combination weights $b$ of the reference colors, which leads to a solution in which only a small number of reference colors contribute to the stain color mixture. These works serve as valuable sources of inspiration for selecting regularization terms for the linear system.
However, the method proposed in [17] is also designed for multi-spectral images, and no prior biological information about the biomarkers is used in that framework, which may lead to undesired solutions for real data.
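To make the contrast with the RGB case concrete, the following sketch builds a synthetic over-determined system (K = 16 bands, M = 5 reference colors, both chosen arbitrarily) and solves it in two ways: plain least squares, and an exhaustive best-subset search that allows at most a fixed number of non-zero weights. The subset search is only a brute-force illustration of an L0-style constraint; it is an assumption made here and not the algorithm of [17]. All spectra are random numbers, not measured data.

```python
import itertools
import numpy as np

def l0_constrained_unmix(y, X, max_active):
    """Best-subset least squares with at most `max_active` non-zero weights."""
    n_refs = X.shape[1]
    best_b, best_err = np.zeros(n_refs), float(np.sum(y ** 2))
    for size in range(1, max_active + 1):
        for support in itertools.combinations(range(n_refs), size):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            err = float(np.sum((y - X[:, cols] @ coef) ** 2))
            if err < best_err - 1e-12:       # keep only strict improvements
                best_b = np.zeros(n_refs)
                best_b[cols] = coef
                best_err = err
    return best_b

rng = np.random.default_rng(0)
K, M = 16, 5
X = rng.uniform(0.1, 1.0, size=(K, M))       # synthetic reference spectra
b_true = np.array([0.0, 0.7, 0.0, 0.3, 0.0]) # only two references present
y = X @ b_true

# Over-determined case: full column rank gives a unique least-squares solution.
b_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Sparse variant: restrict the solution to at most two active references.
b_sparse = l0_constrained_unmix(y, X, max_active=2)
```

The exhaustive search grows combinatorially with the number of reference colors, which is one reason practical sparse solvers use relaxations or greedy strategies instead.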

In this paper, we propose a novel color unmixing algorithm for multiplex IHC images (scanned using a CCD color camera) that can handle more than three stain colors and maintains the biological properties of the biomarkers. Intuitively, the unmixing algorithm for a multiplex IHC image should work as follows. (1) Only one group of stains has a non-zero contribution to the color mixture at each pixel. (2) Within that group, the fractions of the contributions from each constituent stain should be correctly estimated. These conditions motivate us to model the unmixing problem within the group sparsity [27] framework, so as to ensure sparsity among the groups but non-sparsity within each group.
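One common way to encode "sparse among groups, dense within a group" is the group-lasso objective, min over b of 0.5 * ||y - Xb||^2 + lam * sum_g ||b_g||_2, where b_g collects the weights of group g. The sketch below minimizes it with a basic proximal-gradient loop. It is an assumed, generic instance of the group sparsity framework and not necessarily the formulation or solver used in this paper; the reference colors, the non-overlapping group assignment and `lam` are hypothetical, and non-negativity of the weights is not enforced.

```python
import numpy as np

def group_soft_threshold(v, groups, t):
    """Proximal operator of t * sum_g ||v_g||_2 (block soft-thresholding)."""
    out = np.zeros_like(v, dtype=float)
    for idx in groups:
        norm = np.linalg.norm(v[idx])
        if norm > t:                          # otherwise the whole group is zeroed
            out[idx] = (1.0 - t / norm) * v[idx]
    return out

def objective(y, X, b, groups, lam):
    """Group-lasso objective: data fit plus sum of per-group L2 norms."""
    penalty = sum(np.linalg.norm(b[idx]) for idx in groups)
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * penalty

def group_lasso_unmix(y, X, groups, lam, n_iter=500):
    """Proximal-gradient (ISTA) iterations for the objective above."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2    # 1 / Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = group_soft_threshold(b - step * grad, groups, step * lam)
    return b

# Hypothetical setup: four stains in two co-localization groups.
X = np.array([[0.65, 0.07, 0.27, 0.46],
              [0.70, 0.99, 0.57, 0.83],
              [0.29, 0.11, 0.78, 0.31]])
groups = [[0, 1], [2, 3]]
y = X[:, [0, 1]] @ np.array([0.6, 0.4])       # pixel formed by the first group only
lam = 0.01

b_hat = group_lasso_unmix(y, X, groups, lam)
```

The block soft-thresholding step shrinks each group as a unit, so a group is either dropped entirely or kept with all of its members free to take non-zero values, which matches conditions (1) and (2) in spirit.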

Methodology

In this section, we present the methodology of our algorithm. We begin by illustrating the basic framework in Fig. 3 using the following example. In the analysis of cancerous tissues, different biomarkers are specific to one or more types of immune cells. For instance, CD3 is a known universal marker for all T-cells, and CD8 only stains the membranes of the cytotoxic T-cells. FoxP3 marks only the regulatory T-cells in the nuclei, and hematoxylin (HTX) stains all the nuclei. A summary of the

Experiments

In this section, we empirically validate our unmixing algorithm and compare it against the existing techniques.

Conclusion

In this paper, we introduced a novel color unmixing strategy for multiplexed brightfield histopathology images based on a group sparsity model. The biological co-localization information of the biomarkers is explicitly defined in the regularization term to produce biologically meaningful unmixing results. Experiments on both synthetic and clinical data demonstrate the efficacy of the proposed algorithm in terms of accuracy and stability when compared to the existing techniques. A promising

References (30)

  • X. Zhang et al. Towards large-scale histopathological image analysis: Hashing-based image retrieval. IEEE Trans Med Imaging (2015)

  • K. Nguyen et al. Structure and context in prostatic gland segmentation and classification (2012)

  • X. Zhang et al. Mining histopathological images via composite hashing and online learning (2014)

  • F. Xing et al. Robust selection-based sparse shape model for lung cancer image segmentation (2013)

  • M. Gurcan et al. Histopathological image analysis: a review. IEEE Rev Biomed Eng (2009)