Computer-aided prognosis: Predicting patient and disease outcome via quantitative fusion of multi-scale, multi-modal data

https://doi.org/10.1016/j.compmedimag.2011.01.008

Abstract

Computer-aided prognosis (CAP) is a new and exciting complement to the field of computer-aided diagnosis (CAD) and involves developing and applying computerized image analysis and multi-modal data fusion algorithms to digitized patient data (e.g. imaging, tissue, genomic) to help physicians predict disease outcome and patient survival. While a number of data channels, ranging from the macro- (e.g. MRI) to the nano-scale (proteins, genes), are now being routinely acquired for disease characterization, one of the challenges in predicting patient outcome and treatment response has been our inability to quantitatively fuse these disparate, heterogeneous data sources. At the Laboratory for Computational Imaging and Bioinformatics (LCIB)1 at Rutgers University, our team has been developing computerized algorithms for high dimensional data and image analysis for predicting disease outcome from multiple modalities, including MRI, digital pathology, and protein expression. Additionally, we have been developing novel data fusion algorithms based on non-linear dimensionality reduction methods (such as Graph Embedding) to quantitatively integrate information from multiple data sources and modalities, with the overarching goal of optimizing meta-classifiers for making prognostic predictions. In this paper, we briefly describe 4 representative and ongoing CAP projects at LCIB. These projects include (1) an Image-based Risk Score (IbRiS) algorithm for predicting outcome of estrogen receptor-positive (ER+) breast cancer patients based on quantitative image analysis of digitized breast cancer biopsy specimens alone, (2) segmenting and determining the extent of lymphocytic infiltration (identified as a possible prognostic marker for outcome in human epidermal growth factor receptor 2 (HER2)-amplified breast cancers) from digitized histopathology, (3) distinguishing patients with different Gleason grades of prostate cancer (grade being known to correlate with outcome) from digitized needle biopsy specimens, and (4) integrating protein expression measurements obtained from mass spectrometry with quantitative image features derived from digitized histopathology to distinguish between prostate cancer patients at low and high risk of disease recurrence following radical prostatectomy.

Introduction

Most researchers agree that cancer is a complex disease that we do not yet fully understand. Predictive, preventive, and personalized medicine (PPP) has the potential to transform clinical practice by decreasing morbidity due to diseases such as cancer by integrating multi-scale, multi-modal, and heterogeneous data to determine the probability of an individual contracting certain diseases and/or responding to a specific treatment regimen [3]. In the clinic, two patients with diseases that look very similar often have vastly different outcomes under the same treatment [4], [5]. A part of this difference is undoubtedly patient specific, but a part must also be a result of our limited understanding of the relationship between disease progression and clinical presentation.

An understanding of the interplay among different hierarchies of biological information (proteins, tissue, metabolites, and imaging) will provide conceptual insights and practical innovations that will profoundly transform people's lives [3], [5], [6]. There is a consensus among clinicians and researchers that a more quantitative approach, using computerized imaging techniques to better understand tumor morphology, combined with the classification of disease into more meaningful molecular subtypes, will lead to better patient care and more effective therapeutics [5], [7], [8]. With the advent of digital pathology [5], [6], [9], multi-functional imaging, mass spectrometry, immunohistochemical, and fluorescence in situ hybridization (FISH) techniques, the acquisition of multiple, orthogonal sources of genomic, proteomic, multi-parametric radiological, and histological information for disease characterization is becoming routine at several institutions [10], [11]. Computerized image analysis and high dimensional data fusion methods will likely constitute an important piece of the prognostic tool-set, enabling physicians to predict which patients may be susceptible to a particular disease and to predict disease outcome and survival. These tools will also have important implications in theragnostics [12], [13], [14], the ability to predict how an individual may react to various treatments, thereby (1) providing guidance for developing customized therapeutic drugs and (2) enabling development of preventive treatments for individuals based on their potential health problems. A theragnostic profile that is a synthesis of various biomarker and imaging tests from different levels of the biological hierarchy (genomic, proteomic, metabolic) could be used to characterize an individual patient and her/his drug treatment outcome.

If multiple sensors or sources are used in the inference process, they could in principle be fused at one of 3 levels in the hierarchy: (1) raw data-level fusion, (2) feature-level fusion, or (3) decision-level fusion [15], [16]. Several classifier ensemble or multiple classifier schemes have been previously proposed to associate and correlate data at the decision level (combination of decisions (COD)) [17], [18], [19], [20], [21], [22], [23], [24]; a much easier task compared to data integration at the raw-data or feature level (combination of features (COF)). Traditional decision fusion based approaches have focused on combining either binary decisions $Y_\alpha(c) \in \{+1, -1\}$, ranks, or probabilistic classifier outputs $P_\alpha(c)$ obtained via classification of each of the $k$ individual data sources $F_\alpha(c)$, $\alpha \in \{1, 2, \ldots, k\}$, via a Bayesian framework [25], Dempster–Shafer evidence theory [26], fuzzy set theory, or classical decision ensemble schemes, e.g. AdaBoost [19], Support Vector Machines (SVM) [18], or Bagging [17]. At a given data scale (e.g. radiological images such as MRI and CT), several researchers [27], [28], [29], [30], [31], [32], [33], [34], [35] have developed techniques for combining imaging data sources (assuming the registration problem has been solved) by simply concatenating the individual image modality attributes $F_{MRI}(c)$ and $F_{CT}(c)$ at every spatial location $c$ to create a combined feature vector $[F_{MRI}(c), F_{CT}(c)]$ which can be input to a classifier. However, when the individual modalities are heterogeneous (image and non-image based) and of different dimensions, e.g. a 256-dimensional vectorial spectral signal $F_{MRS}(c)$ and a scalar image intensity value $F_{MRI}(c)$, a simple concatenation $[F_{MRI}(c), F_{MRS}(c)]$ will not provide a meaningful data fusion solution. Thus, a significant challenge in integrating heterogeneous imaging and non-imaging biological data has been the lack of a quantifiable knowledge representation framework to reconcile cross-modal, cross-dimensional differences in feature values.
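
To make the dimensionality mismatch concrete, the following minimal Python sketch (illustrative only; the values are synthetic stand-ins, not data from this study) concatenates a scalar MRI attribute with a 256-dimensional MRS spectrum and shows how the fused vector is dominated by the higher dimensional channel:

```python
import numpy as np

# Hypothetical co-registered measurements at one spatial location c:
# a scalar MRI intensity and a 256-dimensional MRS spectral signature.
f_mri = np.array([0.73])                    # F_MRI(c), scalar intensity
f_mrs = np.random.rand(256)                 # F_MRS(c), vectorial spectrum

# Naive feature-level fusion by concatenation, as is done for
# homogeneous imaging channels such as [F_MRI(c), F_CT(c)]:
fused = np.concatenate([f_mri, f_mrs])      # 257-dimensional vector

# The single MRI attribute contributes 1 of 257 dimensions, so any
# Euclidean-style classifier distance is dominated by the MRS channel.
print(fused.shape)                          # (257,)
print(np.sum(f_mri**2) / np.sum(fused**2))  # tiny fraction of the norm
```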

While no general theory yet exists for domain data fusion, most researchers agree that heterogeneous data needs to be represented in a way that allows the different channels to be confronted with one another, an important prerequisite to fusion or classification. Bruno et al. [36] recently designed a multimodal dissimilarity space for the retrieval of video documents. Lanckriet et al. [37] and Lewis et al. [38] both presented kernel based frameworks for representing heterogeneous data relating to protein sequences and then used the data representation in conjunction with an SVM classifier [18] for protein structure prediction. Mandic et al. [39] recently proposed a sequential data fusion approach for combining wind measurements via the representation of directional signals within the field of complex numbers. Coppock and Mazlack [40] extended Gower's metric [41] for nominal and ordinal data integration within an agglomerative hierarchical clustering algorithm to cluster mixed data.

In spite of these challenges, data fusion at the feature level aims to retain the informative characteristics of the phenomenon being studied [39]. Kernel-based formulations have been used to combine multiple related datasets (such as gene expression, protein sequence, and protein–protein interaction data) for function prediction in yeast [37] as well as for heterogeneous data fusion in the study of Alzheimer's disease [42]. However, the selection and tuning of the kernels used in multi-kernel learning (MKL) play an important role in the performance of the approach. This selection proves to be non-trivial when considering completely heterogeneous, multi-scale data such as molecular protein- and gene-expression signatures and imaging and metabolic phenotypes. Additionally, these methods typically employ the same kernel or metric across modalities for estimating object similarity. Thus, while the Euclidean kernel might be appropriate for image intensities, it might not be appropriate for all feature spaces (e.g. time series spectra or gene expression vectors) [43].
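
As an illustration of the kernel selection and weighting issue raised above, the sketch below (a hypothetical example with synthetic data, not the cited authors' implementations) builds one kernel per modality and combines them with fixed weights before training a kernel SVM:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel
from sklearn.svm import SVC

# Hypothetical cohort: 40 patients, an imaging feature space and a
# protein-expression feature space of very different dimensionality.
rng = np.random.default_rng(0)
X_img, X_prot = rng.random((40, 10)), rng.random((40, 300))
y = rng.integers(0, 2, 40)                 # synthetic outcome labels

# One kernel per modality; the kernel choice (RBF vs. linear) and the
# mixing weights are precisely the non-trivial tuning step discussed.
K_img = rbf_kernel(X_img, gamma=0.1)
K_prot = linear_kernel(X_prot)
K_prot /= K_prot.max()                     # crude rescaling across modalities
K = 0.5 * K_img + 0.5 * K_prot             # fixed-weight kernel combination

# Train an SVM directly on the combined Gram matrix.
clf = SVC(kernel="precomputed").fit(K, y)
```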

Recently, approaches involving the use of dimensionality reduction (DR) methods for representing high dimensional data in terms of embedding vectors in a reduced dimensional space have been proposed. Applications have included the fusion of data of heterogeneous dimensionality (e.g. scalar MRI intensities and vectorial magnetic resonance spectroscopy (MRS) signals) [44], [45], [46] by attempting to reduce the dimensionality of the higher dimensional data source to that of the lower dimensional modality via principal component analysis (PCA), independent component analysis (ICA), or a linear combination model (LCM) [47]. However, these strategies often lead to non-optimal fusion solutions due to (a) the use of linear DR schemes, (b) dimensionality reduction of only the non-imaging data channel, and (c) large scaling differences between the different modalities. Yu and Tresp proposed a generalized PCA model for representing real-world image painting data [48]. Recently, manifold learning (ML) methods such as isometric mapping (Isomap) [49] and locally linear embedding (LLE) [50] have become popular for mapping high dimensional information into a low dimensional representation for the purpose of visualization or classification. While these non-linear DR (NLDR) methods enjoy advantages over traditional linear DR methods such as PCA [51] and LCM [52] in that they are able to discover non-linear relationships in the data [53], [54], they are notoriously susceptible to the choice of embedding parameters [49], [50].
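
The sensitivity of NLDR to its embedding parameters can be illustrated on a toy manifold. The following sketch (illustrative only, using a synthetic swiss-roll dataset rather than biomedical data) contrasts linear PCA with Isomap run at several neighborhood sizes:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# A non-linear manifold (swiss roll) stands in for high dimensional
# biomedical feature data with non-linear structure.
X, _ = make_swiss_roll(n_samples=500, random_state=0)

# Linear DR: PCA projects onto a flat subspace and cannot "unroll"
# the manifold.
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear DR: Isomap can recover the intrinsic structure, but its
# embedding depends strongly on the neighborhood size parameter.
for k in (5, 10, 50):
    X_iso = Isomap(n_neighbors=k, n_components=2).fit_transform(X)
    print(k, X_iso.shape)
```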

Researchers have since been developing novel methods for overcoming the difficulties in obtaining an appropriate manifold representation of the data. Samko et al. [55] developed an estimator of the optimal neighborhood size for Isomap. However, in cases of varying neighborhood densities, a single globally optimal neighborhood size may not exist. Others have developed adaptive methods that select neighbors based on additional constraints such as local tangents [56], [57], intrinsic dimensionality [58], or geodesic distances estimated within a neighborhood [59]. These additional constraints aim to create a graph that does not contain spurious neighbors, but they also introduce an additional degree of freedom that the user must specify when constructing the manifold.

Along with other groups [60], [61], [62], the Rutgers Laboratory for Computational Imaging and Bioinformatics (LCIB) group has been developing NLDR schemes that have been shown to be more resistant to some of the failings of LLE [50] and Isomap [49]. C-Embed [54], [63], [64], [65] is a consensus NLDR scheme that combines multiple low dimensional projections of the data to obtain a more robust low dimensional representation, one that, unlike LLE and Isomap, is not sensitive to the careful selection of the neighborhood parameter (κ). These schemes [11], [63], [65], [66], [67], [68], [69] allow each of the k individual high dimensional heterogeneous modalities to be non-linearly transformed into the common format of low dimensional embedding vectors, thereby enabling direct, data-level fusion of structural, functional, metabolic, architectural, genomic, and proteomic information while overcoming the differences in scale, size, and dimensionality of the individual feature spaces. This integrated representation of multiple modalities in the transformed space can be used to train meta-classifiers for studying and predicting biological activity.
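
A minimal sketch of this embedding-based fusion strategy is given below. It is a simplified stand-in for C-Embed (which additionally builds a consensus over multiple embeddings), assuming synthetic histology- and proteomics-derived feature matrices: each modality is embedded independently into the same low dimensional space, and the embedding vectors are then fused to train a meta-classifier.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Hypothetical heterogeneous modalities for the same 60 patients:
X_histo = rng.random((60, 500))    # image-derived architectural features
X_prot = rng.random((60, 2000))    # mass-spectrometry protein expression
y = rng.integers(0, 2, 60)         # synthetic outcome labels

# Embed each modality independently into the same low dimensional
# space, reconciling differences in scale, size, and dimensionality.
d = 3
E = [Isomap(n_neighbors=7, n_components=d).fit_transform(X)
     for X in (X_histo, X_prot)]

# Data-level fusion: concatenate the embedding vectors (now of equal,
# comparable dimensionality) and train a meta-classifier on them.
X_fused = np.hstack(E)                              # 60 x (2*d)
meta = RandomForestClassifier(random_state=0).fit(X_fused, y)
```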

While a diagnostic marker distinguishes diseased from normal tissue, a prognostic marker identifies subgroups of patients associated with different disease outcomes. With the increasing early detection of diseases via improved diagnostic imaging methodologies [21], [64], [65], [69], [70], [71], [72], [73], it has become important to predict biologic behavior and disease “aggressiveness”. Clinically applicable prognostic markers are urgently needed to assist in the selection of optimal therapy. In the context of prostate cancer (PCa), well established prognostic markers include histologic grade, prostate specific antigen (PSA), margin positivity, pathologic stage, intra-glandular tumor extent, and DNA ploidy [74], [75], [76]. Other recently proposed prognostic indicators include the tumor suppressor gene p53, the cell proliferation marker Ki-67, Oncoantigen 519, microsatellite instability, angiogenesis and tumor vascularity (TVC), vascular endothelial growth factor (VEGF), and E-cadherin [76], [77]. None of these factors, however, has individually proven accurate enough to serve routinely as a prognostic marker [77], [78]. The problem is that 50% [79], and in some series 80% [80], of men with early detected PCa present a homogeneous pattern with respect to most standard prognostic variables (PSA < 10, stage T1c, Gleason score < 7). In this growing group of patients, the traditional markers seem to lose their efficacy and the subsequent therapy decision is complicated. Gao et al. [81] suggest that only a combination of multiple prognostic markers will prove superior to any individual marker. Graefen et al. [82] and Stephenson et al. [83], [84], [85] have suggested that better prognostic accuracy can be obtained by combining the individual markers via a machine classifier such as an artificial neural network.
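
As a toy illustration of such marker combination (entirely synthetic values and a generic neural network, not the nomograms or ANNs of the cited studies), a classifier can be trained on a vector of per-patient markers:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical per-patient prognostic markers: PSA, Gleason score,
# pathologic stage (ordinal encoding), and Ki-67 fraction. All values
# below are synthetic, for illustration only.
rng = np.random.default_rng(3)
X = np.column_stack([
    rng.uniform(1, 20, 200),     # PSA (ng/mL)
    rng.integers(4, 10, 200),    # Gleason score
    rng.integers(1, 4, 200),     # pathologic stage, ordinal encoding
    rng.uniform(0, 0.5, 200),    # Ki-67 positive fraction
])
y = rng.integers(0, 2, 200)      # outcome label (e.g. recurrence yes/no)

# Combine the individual markers in a small neural network, in the
# spirit of the ANN-based marker combination cited above.
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0))
clf.fit(X, y)
```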

Graphs are an effective means of representing the spatial arrangement of structures via a large set of topological features, which can be quantified through computable metrics. The use of spatial-relation features for quantifying cellular arrangement was proposed in the early 1990s [86], [87], but did not find application to biomedical imagery until recently [88], [89], [90], [91], [92], [93], [94]. However, with recent evidence demonstrating that, for certain classes of tumors, tumor–host interactions correlate with clinical outcome [95], graph algorithms clearly have a role to play in modeling the tumor–host network and hence in predicting disease outcome.

Table 1 lists common spatial, graph-based features that can be extracted from the Voronoi Diagram (VD), Delaunay Triangulation (DT), and Minimum Spanning Tree (MST) [96], [97], [98]. Additionally, a number of features based on nuclear statistics can be similarly extracted. Using the nuclear centroids in a tissue region (Fig. 1(a)) as vertices, the DT graph (Fig. 1(b)), a unique triangulation of the centroids, and the MST (Fig. 1(c)), a graph that connects all centroids with minimum total edge length, can be constructed. These features quantify important biological information, such as the proliferation and structural arrangement of the cells in the tissue, which is closely tied to cancerous activity. Our hypothesis is that the genetic descriptors that define clinically relevant classes of cancer are reflected in the visual characteristics of the cellular morphology and tissue architecture, and that these characteristics can be measured by image analysis techniques. We believe that image-based classifiers of disease, developed via comprehensive analysis of the quantitative image-based information present in tissue histology, will correlate strongly with gene-expression based prognostic classification.
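
The sketch below (an illustrative reimplementation using standard scientific Python, not the authors' code) constructs the DT, VD, and MST from synthetic nuclear centroids and computes a few representative graph features of the kind listed in Table 1:

```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

# Hypothetical nuclear centroids segmented from a tissue image.
rng = np.random.default_rng(2)
pts = rng.random((100, 2))

tri = Delaunay(pts)        # Delaunay triangulation of the centroids
vor = Voronoi(pts)         # Voronoi diagram of the centroids

# Minimum spanning tree over the complete centroid graph; edge weights
# are inter-nuclear Euclidean distances.
D = squareform(pdist(pts))
mst = minimum_spanning_tree(D)
edges = mst.data           # MST edge lengths

# A few graph features of the kind tabulated in Table 1.
features = {
    "mst_mean_edge_length": edges.mean(),
    "mst_std_edge_length": edges.std(),
    "num_delaunay_triangles": len(tri.simplices),
    "num_voronoi_regions": len(vor.regions),
}
print(features)
```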

At LCIB at Rutgers University, we have been developing an array of computerized image analysis, high dimensional data analysis, and fusion tools for quantitatively integrating the molecular features of a tumor (as measured by gene expression profiling or mass spectrometry) [54], [99], the tumor's cellular architecture and microenvironment (as captured in histological imaging) [6], [9], the tumor's 3-d tissue architecture [100], and its metabolic features (as seen by metabolic or functional imaging modalities such as Magnetic Resonance Spectroscopy (MRS)) [21], [64], [65], [69], [70], [71], [72], [73]. In this paper, we briefly describe 4 representative and ongoing projects at LCIB in the context of predicting outcome for breast and prostate cancer patients, involving computerized image and data analysis and the fusion of quantitative measurements from digitized histopathology with protein expression features obtained via mass spectrometry. Preliminary data pertaining to these projects is also presented.

Section snippets

Image-based risk score for ER+ breast cancers

The current gold standard for achieving a quantitative and reproducible prognosis in estrogen receptor-positive breast cancers (ER+ BC) is the Oncotype DX (Genomic Health, Inc.) molecular assay, which produces a Recurrence Score (RS) between 0 and 100, where a high RS corresponds to a poor outcome and vice versa. In [101], we presented the Image-based Risk Score (IbRiS), a novel CAP scheme that uses only quantitatively derived information (architectural features derived from spatial arrangement …

Lymphocytic infiltration and outcome in HER2+ breast cancers

The identification of phenotypic changes in BC histopathology with respect to corresponding molecular changes is of significant clinical importance in predicting BC outcome. One such example is the presence of lymphocytic infiltration (LI) in BC histopathology, which has been correlated with nodal metastasis and distant recurrence in human epidermal growth factor receptor 2-amplified (HER2+) breast cancers.

In [103], [104], we introduced a computerized image analysis system for detecting and grading the …

Automated Gleason grading on prostate cancer histopathology

PCa is diagnosed in over 200,000 people and causes 27,000 deaths in the US annually. However, the five-year survival rate for patients diagnosed at an early stage of tumor development is very high [106], [107]. If PCa is found on a needle biopsy, the tumor is assigned a Gleason grade (1–5) [6], [9]. Gleason grade 1 tissue is highly differentiated and non-infiltrative, while grade 5 tissue is poorly differentiated and highly infiltrative. Gleason grading is predominantly based on tissue …

Integrated proteomic, histological signatures for predicting prostate cancer recurrence

Following radical prostatectomy (RP), there remains a substantial risk of disease recurrence (estimated at 25–40%) [109]. Studies have identified infiltration beyond the surgical margin and high Gleason score as possible predictors of prostate cancer recurrence. However, owing to inter-observer variability in Gleason grade determination, cancers assigned the same Gleason grade can have significantly different outcomes [110]. Discovery of a predictive biomarker for outcome following RP …

Concluding remarks

In this paper we briefly described some of the primary challenges in the quantitative fusion of multi-scale, multi-modal data for building prognostic meta-classifiers to predict treatment response and patient outcome. We also described some of the ongoing efforts at the Laboratory for Computational Imaging and Bioinformatics (LCIB) at Rutgers University to address these computational challenges in personalized therapy and highlighted ongoing projects in computer-aided prognosis of …

Acknowledgments

This work was supported by the Wallace H. Coulter Foundation, the National Cancer Institute under Grants R01CA136535, R01CA140772, R03CA143991, the Cancer Institute of New Jersey, and the Department of Defense (W81XWH-08-1-0145).

References (112)

  • A. Madabhushi et al.

    Computer-aided prognosis: predicting patient and disease outcome via multi-modal image analysis

    IEEE Int Symp Biomed Imaging (ISBI)

    (2010)
  • A. Janowczyk et al.

    Hierarchical normalized cuts: unsupervised segmentation of vascular biomarkers from ovarian cancer tissue microarrays

    Med Image Comput Comput Assist Interv

    (2009)
  • A. Madabhushi et al.

    Integrated diagnostics: a conceptual framework with examples

    Clin Chem Lab Med

    (2010)
  • A. Madabhushi

    Digital pathology image analysis: opportunities and challenges

    Imaging Med

    (2009)
  • S. Agner et al.

    A comprehensive multi-attribute manifold learning scheme-based computer aided diagnostic system for breast MRI

  • S. Doyle et al.

    A boosted Bayesian multi-resolution classifier for prostate cancer detection from digitized needle biopsies

    IEEE Trans Biomed Eng

    (2010)
  • A. Madabhushi et al.

    Graph embedding to improve supervised classification: detecting prostate cancer

  • R.E. Lenkinski et al.

    An illustration of the potential for mapping MRI/MRS parameters with genetic over-expression profiles in human prostate cancer

    MAGMA

    (2008)
  • D. Juan et al.

    Identification of a MicroRNA panel for clear-cell kidney cancer

    Urology

    (2009)
  • G. Lexe et al.

    Towards improved cancer diagnosis and prognosis using analysis of gene expression data and computer aided imaging

    Exp Biol Med (Maywood)

    (2009)
  • F. Pene et al.

    Toward theragnostics

    Crit Care Med

    (2009)
  • G. Lippi

    Wisdom of theragnostics, other changes

    MLO Med Lab Obs

    (2008)
  • V. Ozdemir et al.

    Mapping translational research in personalized therapeutics: from molecular markers to health policy

    Pharmacogenomics

    (2007)
  • A.R. Mirza

    An architectural selection framework for data fusion in sensor platforms

  • D.L. Hall

    Perspectives on the fusion of image and non-image data

  • L. Breiman

    Bagging predictors

    Mach Learn

    (1996)
  • C.J.C. Burges

    A tutorial on support vector machines for pattern recognition

    Data Min Knowl Discov

    (1998)
  • Y. Freund et al.

    Experiments with a new boosting algorithm

    Proc Int Conf Mach Learn

    (1996)
  • A. Madabhushi et al.

    Optimally combining 3D texture features for automated segmentation of prostatic adenocarcinoma from high resolution MR images

  • A. Madabhushi et al.

    Automated detection of prostatic adenocarcinoma from high resolution ex vivo MRI

    IEEE Trans Med Imaging

    (2005)
  • A. Madabhushi et al.

    Comparing classification performance of feature ensembles: detecting prostate cancer from high resolution MRI

  • T. Twellmann et al.

    Image fusion for dynamic contrast enhanced magnetic resonance imaging

    Biomed Eng Online

    (2004)
  • J.L. Jesneck et al.

    Optimized approach to decision fusion of heterogeneous data for breast cancer diagnosis

    Med Phys

    (2006)
  • R.O. Duda et al.

    Pattern classification and scene analysis

    (1973)
  • A.W. Smeulders et al.

    An analysis of pathology knowledge and decision making for the development of artificial intelligence-based consulting systems

    Anal Quant Cytol Histol

    (1989)
  • S. Dube et al.

    Content based image retrieval for MR image studies of brain tumors

    Conf Proc IEEE Eng Med Biol Soc

    (2006)
  • R.A. Heckemann et al.

    Multiclassifier fusion in human brain MR segmentation: modelling convergence

    Med Image Comput Comput Assist Interv Int Conf

    (2006)
  • S. Hunsche et al.

    Combined X-ray and magnetic resonance imaging facility: application to image-guided stereotactic and functional neurosurgery

    Neurosurgery

    (2007)
  • T. Rohlfing et al.

    Information fusion in biomedical image analysis: combination of data vs. combination of interpretations

    Inf Process Med Imaging

    (2005)
  • T.Z. Wong et al.

    PET and brain tumor image fusion

    Cancer J

    (2004)
  • E. Bruno et al.

    Design of multimodal dissimilarity spaces for retrieval of video documents

    IEEE Trans Pattern Anal Mach Intell

    (2008)
  • G.R. Lanckriet et al.

    Kernel-based data fusion and its application to protein function prediction in yeast

    Pac Symp Biocomput

    (2004)
  • D.P. Lewis et al.

    Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure

    Struct Bioinform

    (2006)
  • D.P. Mandic et al.

    Sequential data fusion via vector spaces: fusion of heterogeneous data in the complex domain

    J VLSI Signal Process

    (2007)
  • S. Coppock et al.
  • K.J. Friston et al.

    Functional topography: multidimensional scaling and functional connectivity in the brain

    Cereb Cortex

    (1996)
  • J. Ye et al.

    Heterogeneous data fusion for Alzheimer's disease study

  • S. Rao et al.

    Evaluating distance functions for clustering tandem repeats

    Genome Inform

    (2005)
  • A.W. Simonetti et al.

    Combination of feature-reduced MR spectroscopic and MR imaging data for improved brain tumor classification

    NMR Biomed

    (2005)
  • A.W. Simonetti et al.

    A chemometric approach for brain tumor classification using magnetic resonance imaging and spectroscopy

    Anal Chem

    (2003)

    A preliminary version of this paper appeared in [1].

    1 http://lcib.rutgers.edu.
