Weakly supervised histopathology cancer image segmentation and classification

doi:10.1016/j.media.2014.01.010

Medical Image Analysis

Volume 18, Issue 3, April 2014, Pages 591-604

https://doi.org/10.1016/j.media.2014.01.010

Highlights

• We propose a new learning method, multiple clustered instance learning (MCIL), along the line of weakly supervised learning.
• The proposed MCIL simultaneously performs image classification (cancer vs. non-cancer image), segmentation, and clustering.
• We embed clustering into MIL and derive a principled solution to performing the three tasks in an integrated framework.
• We introduce contextual constraints as a prior for MCIL, which significantly reduces the ambiguity in multiple instance learning.

Abstract

Labeling a histopathology image as having cancerous regions or not is a critical task in cancer diagnosis; it is also clinically important to segment the cancer tissues and cluster them into various classes. Existing supervised approaches for image classification and segmentation require detailed manual annotations for the cancer pixels, which are time-consuming to obtain. In this paper, we propose a new weakly supervised learning method, multiple clustered instance learning (MCIL), for histopathology image segmentation. The proposed MCIL method simultaneously performs image-level classification (cancer vs. non-cancer image), medical image segmentation (cancer vs. non-cancer tissue), and patch-level clustering (different classes). We embed the clustering concept into the multiple instance learning (MIL) setting and derive a principled solution for performing the above three tasks in an integrated framework. In addition, we introduce contextual constraints as a prior for MCIL, which further reduces the ambiguity in MIL. Experimental results on histopathology colon cancer images and cytology images demonstrate the advantage of MCIL over competing methods.

Introduction

Histopathology image analysis is a vital technology for cancer recognition and diagnosis (Tabesh et al., 2007, Park et al., 2011, Esgiar et al., 2002, Madabhushi, 2009). High resolution histopathology images provide reliable information for differentiating abnormal tissues from normal ones. In this paper, we use tissue microarrays (TMAs), which are referred to as histopathology images here. Fig. 1 shows a typical histopathology colon cancer image, together with a non-cancer image. Recent developments in specialized digital microscope scanners make digitization of histopathology readily accessible. Automatic cancer recognition from histopathology images has thus become an increasingly important task in the medical imaging field (Esgiar et al., 2002, Madabhushi, 2009). Some clinical tasks (Yang et al., 2008) for histopathology image analysis include: (1) detecting the presence of cancer (image classification); (2) segmenting images into cancer and non-cancer regions (medical image segmentation); (3) clustering the tissue regions into various classes. In this paper, we aim to develop an integrated framework to perform classification, segmentation, and clustering together.

Several practical systems for classifying and grading cancer histopathology images have been developed recently. These methods focus mostly on feature design, including fractal features (Huang and Lee, 2009), texture features (Kong et al., 2009), object-level features (Boucheron, 2008), and color graph features (Altunbay et al., 2010, Ta et al., 2009). Various classifiers (Bayesian, KNN and SVM) have also been investigated for pathological prostate cancer image analysis (Huang and Lee, 2009).

From a different angle, there is a rich body of literature on supervised approaches for image detection and segmentation (Viola and Jones, 2004, Shotton et al., 2008, Felzenszwalb et al., 2010, Tu and Bai, 2010). However, supervised approaches require a large amount of high quality annotated data, which are labor-intensive and time-consuming to obtain. In addition, there is intrinsic ambiguity in the data delineation process. In practice, obtaining the very detailed annotation of cancerous regions from a histopathology image could be a challenging task, even for expert pathologists.

Unsupervised learning methods (Duda et al., 2001, Loeff et al., 2005, Tuytelaars et al., 2009), on the other hand, ease the burden of having manual annotations, but often at the cost of inferior results.

In the middle of the spectrum is the weakly supervised learning scenario. The idea is to use coarse-grained annotations to aid automatic exploration of fine-grained information. The weakly supervised learning direction is closely related to semi-supervised learning in machine learning (Zhu, 2008). One particular form of weakly supervised learning is multiple instance learning (MIL) (Dietterich et al., 1997), in which a training set consists of a number of bags and each bag includes many instances; the goal is to learn to predict both bag-level and instance-level labels while only bag-level labels are given in training. In our case, we aim at automatically learning image models to recognize cancers from weakly supervised histopathology images. In this scenario, only image-level annotations are required. It is relatively easier for a pathologist to label a histopathology image than to delineate detailed cancer regions in each image.
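The bag/instance relationship described above can be illustrated with a minimal Python sketch. It assumes the standard MIL convention (a bag is positive if at least one of its instances is positive) and the +1/-1 label encoding used in the Methods snippet; the function names and toy data are hypothetical and are not taken from the paper.

```python
def bag_label(instance_labels):
    """Standard MIL assumption: a bag is positive (+1) if at least one
    instance is positive, otherwise it is negative (-1)."""
    return 1 if any(label == 1 for label in instance_labels) else -1


def bag_score(instance_scores):
    """Bag-level score as the hard maximum over instance scores.
    This is one common choice, used here only for illustration."""
    return max(instance_scores)


# Hypothetical toy data: each bag is an image, each instance is a patch.
cancer_image_patches = [-1, -1, 1, -1]   # one cancerous patch
normal_image_patches = [-1, -1, -1]      # no cancerous patch

print(bag_label(cancer_image_patches))
print(bag_label(normal_image_patches))
print(bag_score([0.1, 0.8, 0.3]))
```

In training, only the output of `bag_label` would be observed for each image; the per-patch labels are exactly what a MIL method has to infer.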

In this paper, we develop an integrated framework to classify histopathology images as having cancerous regions or not, segment cancer tissues from a cancer image, and cluster them into different types. This system automatically learns the models from weakly supervised histopathology images using multiple clustered instance learning (MCIL), derived from MIL. Many previous MIL-based approaches have achieved encouraging results in the medical domain such as major adverse cardiac event (MACE) prediction (Liu et al., 2010), polyp detection (Dundar et al., 2008, Fung et al., 2006, Lu et al., 2011), pulmonary emboli validation (Raykar et al., 2008), and pathology slide classification (Dundar et al., 2010). However, none of the above methods aim to perform medical image segmentation. They also have not provided an integrated framework for the task of simultaneous classification, segmentation, and clustering.

We propose to embed the clustering concept into the MIL setting. The current literature in MIL assumes a single cluster/model/classifier for the target of interest (Viola et al., 2005), a single cluster within each bag (Babenko et al., 2008, Zhang and Zhou, 2009, Zhang et al., 2009), or multiple components of one object (Dollár et al., 2008). Since cancer tissue clustering is not always available, it is desirable to discover/identify the classes of various cancer tissue types; this results in patch-level clustering of cancer tissues. The incorporation of the clustering concept leads to an integrated system that is able to simultaneously perform image segmentation, image-level classification, and patch-level clustering.

In addition, we introduce contextual constraints as a prior for MCIL, which reduces the ambiguity in MIL; we refer to the resulting method as cMCIL. Most previous MIL methods assume that instances are distributed independently, without considering the correlations among instances. Explicitly modeling the instance interdependencies (structures) can effectively improve the quality of segmentation. In our experiments, we show that while obtaining comparable results in classification, cMCIL improves the segmentation significantly (by over 20%) compared with MCIL. Thus, it is beneficial to explore the structural information in histopathology images.

Section snippets

Related work

Related work can be roughly divided into two broad categories: (1) approaches for histopathology image classification and segmentation and (2) MIL methods in machine learning and computer vision. After discussing the previous work, we show the contributions of our method.

Methods

We follow the general definition of bags and instances in the multiple instance learning (MIL) formulation (Dietterich et al., 1997).

In this paper, the ith histopathology image is considered as a bag $x_{i}$ ; the jth image patch densely sampled from an image corresponds to an instance $x_{ij}$ . A patch of cancer tissue is treated as a positive instance ( $y_{ij} = 1$ ) and a patch without any cancer tissues is a negative instance ( $y_{ij} = - 1$ ). The ith bag is labeled as positive (cancer image), namely $y_{i} = 1$ , if this

Experiments

To illustrate the advantages of MCIL, we conduct experiments on two medical image datasets. In the first experiment, without loss of generality, we use colon tissue microarrays to perform joint classification, segmentation and clustering. For convenience, tissue microarrays are called histopathology images. In the second experiment, cytology images (Lezoray and Cardot, 2002) are used to further validate the effectiveness of MCIL. All the methods in the following experiments, unless particularly

Conclusion

In this paper, we have presented an integrated formulation, multiple clustered instance learning (MCIL), for classifying, segmenting, and clustering medical images along the line of weakly supervised learning. The advantages of MCIL over state-of-the-art methods that perform the individual tasks are evident: it eases the burden of manual annotation, since only image-level labels are required, and it performs image-level classification, pixel-level segmentation and patch-level

Acknowledgments

This work was supported by Microsoft Research Asia (MSR Asia). The work was also supported by NSF CAREER award IIS-0844566 (IIS-1360568), NSF IIS-1216528 (IIS-1360566), and ONR N000140910099. It was also supported by MSRA eHealth grant, and Grant 61073077 from National Science Foundation of China and Grant SKLSDE-2011ZX-13 from State Key Laboratory of Software Development Environment in Beihang University in China. We would like to thank Department of Pathology, Zhejiang University in China for

References (66)

T. Dietterich et al., Solving the multiple instance problem with axis-parallel rectangles, Artif. Intell. (1997)
J. Kong et al., Computer-aided evaluation of neuroblastoma on whole-slide histology images: classifying grade of neuroblastic differentiation, Pattern Recogn. (2009)
J.P. Monaco et al., High-throughput detection of prostate cancer in histological sections using probabilistic pairwise Markov models, Med. Image Anal. (2010)
O. Sertel et al., Computer-aided prognosis of neuroblastoma on whole-slide images: classification of stromal development, Pattern Recogn. (2009)
V.T. Ta et al., Graph-based tools for microscopic cellular image segmentation, Pattern Recogn. (2009)
Ahonen, T., Matas, J., He, C., Pietikäinen, M., 2009. Rotation invariant image description with local binary pattern...
D. Altunbay et al., Color graphs for automated cancer diagnosis and grading, IEEE Trans. Biomed. Eng. (2010)
Andrews, S., Tsochantaridis, I., Hofmann, T., 2003. Support vector machines for multiple-instance learning. In:...
Y. Artan et al., Prostate cancer localization with multispectral MRI using cost-sensitive support vector machines and conditional random fields, IEEE Trans. Image Process. (2010)
Y. Artan et al., A boosted bayesian multiresolution classifier for prostate cancer detection from digitized needle biopsies, IEEE Trans. Biomed. Eng. (2012)
Babenko, B., Dollár, P., Tu, Z., Belongie, S., 2008. Simultaneous learning and alignment: multi-instance and multi-pose...
B. Babenko et al., Robust object tracking with online multiple instance learning, IEEE Trans. Pattern Anal. Mach. Intell. (2011)
D.P. Bertsekas et al., Nonlinear Programming (1999)
Boucheron, L.E., 2008. Object- and Spatial-Level Quantitative Analysis of Multispectral Histopathology Images for...
Dollár, P., Babenko, B., Belongie, S., Perona, P., Tu, Z., 2008. Multiple component learning for object detection. In:...
R.O. Duda et al., Pattern Classification (2001)
M. Dundar et al., Multiple instance learning algorithms for computer aided diagnosis, IEEE Trans. Biomed. Eng. (2008)
Dundar, M., Badve, S., Raykar, V., Jain, R., Sertel, O., Gurcan, M., 2010. A multiple instance learning approach toward...
A. Esgiar et al., Fractal analysis in the detection of colonic cancer images, IEEE Trans. Inform. Technol. Biomed. (2002)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A., 2009. The PASCAL Visual Object Classes...
P.F. Felzenszwalb et al., Object detection with discriminatively trained part-based models, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
Fung, G., Dundar, M., Krishnapuram, B., Rao, B., 2006. Multiple instance algorithms for computer aided diagnosis. In:...
Fung, G., Dundar, M., Krishnapuram, B., Rao, R., 2007. Multiple instance learning for computer aided diagnosis. In:...
Galleguillos, C., Babenko, B., Rabinovich, A., Belongie, S., 2008. Weakly supervised object recognition and...
Gärtner, T., Flach, P.A., Kowalczyk, A., Smola, A.J., 2002. Multi-instance kernels. In: International Conference on...
P.W. Huang et al., Automatic classification for pathological prostate images based on fractal analysis, IEEE Trans. Med. Imag. (2009)
Jin, R., Wang, S., Zhou, Z.H., 2009. Learning a distance metric from multi-instance multi-label data. In: IEEE...
Keeler, J.D., Rumelhart, D.E., Leow, W.K., 1990. Integrated segmentation and recognition of hand-printed numerals. In:...
H. Kong et al., Partitioning histopathological images: an integrated framework for supervised color-texture segmentation and cell splitting, IEEE Trans. Med. Imag. (2011)
Lafferty, J.D., McCallum, A., Pereira, F.C.N., 2001. Conditional random fields: Probabilistic models for segmenting and...
O. Lezoray et al., Cooperation of color pixel classification schemes and color watershed: a study for microscopic images, IEEE Trans. Image Process. (2002)
Liang, J., Bi, J., 2007. Computer aided detection of pulmonary embolism with tobogganing and multiple instance...
Liu, Q., Qian, Z., Marvasty, I., Rinehart, S., Voros, S., Metaxas, D., 2010. Lesion-specific coronary artery calcium...
