Weakly supervised histopathology cancer image segmentation and classification
Introduction
Histopathology image analysis is a vital technology for cancer recognition and diagnosis (Tabesh et al., 2007, Park et al., 2011, Esgiar et al., 2002, Madabhushi, 2009). High-resolution histopathology images provide reliable information for differentiating abnormal tissues from normal ones. In this paper, we use tissue microarrays (TMAs), which are referred to as histopathology images here. Fig. 1 shows a typical histopathology colon cancer image, together with a non-cancer image. Recent developments in specialized digital microscope scanners make digitization of histopathology readily accessible. Automatic cancer recognition from histopathology images has thus become an increasingly important task in the medical imaging field (Esgiar et al., 2002, Madabhushi, 2009). Some clinical tasks (Yang et al., 2008) for histopathology image analysis include: (1) detecting the presence of cancer (image classification); (2) segmenting images into cancer and non-cancer regions (medical image segmentation); (3) clustering the tissue region into various classes. In this paper, we aim to develop an integrated framework that performs classification, segmentation, and clustering together.
Several practical systems for classifying and grading cancer histopathology images have recently been developed. These methods mostly focus on feature design, including fractal features (Huang and Lee, 2009), texture features (Kong et al., 2009), object-level features (Boucheron, 2008), and color graph features (Altunbay et al., 2010, Ta et al., 2009). Various classifiers (Bayesian, KNN and SVM) have also been investigated for pathological prostate cancer image analysis (Huang and Lee, 2009).
From a different angle, there is a rich body of literature on supervised approaches for image detection and segmentation (Viola and Jones, 2004, Shotton et al., 2008, Felzenszwalb et al., 2010, Tu and Bai, 2010). However, supervised approaches require a large amount of high-quality annotated data, which are labor-intensive and time-consuming to obtain. In addition, there is intrinsic ambiguity in the data delineation process. In practice, obtaining very detailed annotations of cancerous regions in a histopathology image can be a challenging task, even for expert pathologists.
Unsupervised learning methods (Duda et al., 2001, Loeff et al., 2005, Tuytelaars et al., 2009), on the other hand, ease the burden of having manual annotations, but often at the cost of inferior results.
In the middle of the spectrum is the weakly supervised learning scenario. The idea is to use coarse-grained annotations to aid automatic exploration of fine-grained information. The weakly supervised learning direction is closely related to semi-supervised learning in machine learning (Zhu, 2008). One particular form of weakly supervised learning is multiple instance learning (MIL) (Dietterich et al., 1997), in which a training set consists of a number of bags and each bag includes many instances; the goal is to learn to predict both bag-level and instance-level labels while only bag-level labels are given in training. In our case, we aim to automatically learn image models that recognize cancers from weakly supervised histopathology images. In this scenario, only image-level annotations are required. It is easier for a pathologist to label a histopathology image than to delineate detailed cancer regions in each image.
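The bag and instance structure described above can be sketched in a few lines of Python. This is only an illustration of the standard MIL assumption (a bag is positive if and only if at least one of its instances is positive), not the model developed in this paper. The names `Bag`, `bag_label_from_instances`, and `bag_prediction`, the 0/1 label encoding, and the 0.5 threshold are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Bag:
    """One training bag: many instances, but only one bag-level label."""
    instances: List[Sequence[float]]  # hypothetical feature vectors, one per instance
    label: int                        # bag-level label (assumed encoding: 1 positive, 0 negative)


def bag_label_from_instances(instance_labels: Sequence[int]) -> int:
    """Standard MIL assumption: a non-empty bag is positive iff any instance is positive."""
    return int(max(instance_labels))


def bag_prediction(instance_scores: Sequence[float],
                   threshold: float = 0.5) -> Tuple[List[int], int]:
    """Threshold instance scores, then derive the bag-level prediction from them."""
    instance_preds = [int(score >= threshold) for score in instance_scores]
    return instance_preds, bag_label_from_instances(instance_preds)
```

During training only `Bag.label` would be observed; the instance-level labels are exactly what a MIL method has to infer.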
In this paper, we develop an integrated framework to classify histopathology images as having cancerous regions or not, segment cancer tissues from a cancer image, and cluster them into different types. This system automatically learns the models from weakly supervised histopathology images using multiple clustered instance learning (MCIL), derived from MIL. Many previous MIL-based approaches have achieved encouraging results in the medical domain, such as major adverse cardiac event (MACE) prediction (Liu et al., 2010), polyp detection (Dundar et al., 2008, Fung et al., 2006, Lu et al., 2011), pulmonary emboli validation (Raykar et al., 2008), and pathology slide classification (Dundar et al., 2010). However, none of the above methods aims to perform medical image segmentation, nor do they provide an integrated framework for simultaneous classification, segmentation, and clustering.
We propose to embed the clustering concept into the MIL setting. The current literature in MIL assumes a single cluster/model/classifier for the target of interest (Viola et al., 2005), a single cluster within each bag (Babenko et al., 2008, Zhang and Zhou, 2009, Zhang et al., 2009), or multiple components of one object (Dollár et al., 2008). Since cancer tissue clustering is not always available, it is desirable to discover and identify the classes of various cancer tissue types; this results in patch-level clustering of cancer tissues. Incorporating the clustering concept leads to an integrated system that can simultaneously perform image segmentation, image-level classification, and patch-level clustering.
In addition, we introduce contextual constraints as a prior, yielding cMCIL (MCIL with contextual constraints), which reduces the ambiguity in MIL. Most previous MIL methods assume that instances are distributed independently, without considering the correlations among instances. Explicitly modeling the instance interdependencies (structures) can effectively improve the quality of segmentation. In our experiments, we show that, while obtaining comparable results in classification, cMCIL improves the segmentation significantly (by over 20%) compared with MCIL. Thus, it is beneficial to explore the structural information in histopathology images.
Related work
Related work can be roughly divided into two broad categories: (1) approaches for histopathology image classification and segmentation and (2) MIL methods in machine learning and computer vision. After discussing previous work, we present the contributions of our method.
Methods
We follow the general definition of bags and instances in the multiple instance learning (MIL) formulation (Dietterich et al., 1997).
In this paper, the $i$th histopathology image is considered as a bag $x_i$; the $j$th image patch densely sampled from an image corresponds to an instance $x_{ij}$. A patch of cancer tissue is treated as a positive instance and a patch without any cancer tissues is a negative instance, as recorded by its instance-level label $y_{ij}$. The $i$th bag is labeled as positive (cancer image), as recorded by its bag-level label $y_i$, if this
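To make the image-as-bag, patch-as-instance view concrete, the following is a minimal sketch of dense patch sampling on a regular grid. It is an assumption-laden illustration rather than the sampling procedure used in this paper: the function names `dense_patch_boxes` and `crop`, the patch size of 64 pixels, and the stride of 32 pixels are hypothetical choices made only for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def dense_patch_boxes(height: int, width: int,
                      patch_size: int = 64, stride: int = 32) -> List[Box]:
    """List the boxes of all patches on a regular grid that fit inside the image.

    Each box stands for one instance; all boxes of one image form its bag.
    """
    return [
        (top, left, top + patch_size, left + patch_size)
        for top in range(0, height - patch_size + 1, stride)
        for left in range(0, width - patch_size + 1, stride)
    ]


def crop(image: Sequence[Sequence[float]], box: Box) -> List[List[float]]:
    """Cut one patch out of an image stored as a list of pixel rows."""
    top, left, bottom, right = box
    return [list(row[left:right]) for row in image[top:bottom]]
```

With a stride smaller than the patch size, neighbouring patches overlap, so a pixel-level segmentation could be assembled from the instance-level predictions of the patches that cover each pixel; how this is done in MCIL is not specified in this excerpt.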
Experiments
To illustrate the advantages of MCIL, we conduct experiments on two medical image datasets. In the first experiment, without loss of generality, we use colon tissue microarrays to perform joint classification, segmentation and clustering. For convenience, tissue microarrays are called histopathology images. In the second experiment, cytology images (Lezoray and Cardot, 2002) are used to further validate the effectiveness of MCIL. All the methods in the following experiments, unless particularly
Conclusion
In this paper, we have presented an integrated formulation, multiple clustered instance learning (MCIL), for classifying, segmenting, and clustering medical images along the line of weakly supervised learning. The advantages of MCIL over state-of-the-art methods that perform the individual tasks are evident: it eases the burden of manual annotation, since only image-level labels are required, and it performs image-level classification, pixel-level segmentation and patch-level
Acknowledgments
This work was supported by Microsoft Research Asia (MSR Asia). The work was also supported by NSF CAREER award IIS-0844566 (IIS-1360568), NSF IIS-1216528 (IIS-1360566), and ONR N000140910099. It was also supported by the MSRA eHealth grant, Grant 61073077 from the National Science Foundation of China, and Grant SKLSDE-2011ZX-13 from the State Key Laboratory of Software Development Environment at Beihang University in China. We would like to thank the Department of Pathology, Zhejiang University in China for
References (66)
- et al. Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell. (1997)
- et al. Computer-aided evaluation of neuroblastoma on whole-slide histology images: classifying grade of neuroblastic differentiation. Pattern Recogn. (2009)
- et al. High-throughput detection of prostate cancer in histological sections using probabilistic pairwise Markov models. Med. Image Anal. (2010)
- et al. Computer-aided prognosis of neuroblastoma on whole-slide images: classification of stromal development. Pattern Recogn. (2009)
- et al. Graph-based tools for microscopic cellular image segmentation. Pattern Recogn. (2009)
- Ahonen, T., Matas, J., He, C., Pietikäinen, M., 2009. Rotation invariant image description with local binary pattern...
- et al. Color graphs for automated cancer diagnosis and grading. IEEE Trans. Biomed. Eng. (2010)
- Andrews, S., Tsochantaridis, I., Hofmann, T., 2003. Support vector machines for multiple-instance learning. In:...
- et al. Prostate cancer localization with multispectral MRI using cost-sensitive support vector machines and conditional random fields. IEEE Trans. Image Process. (2010)
- et al. A boosted Bayesian multiresolution classifier for prostate cancer detection from digitized needle biopsies. IEEE Trans. Biomed. Eng. (2012)