Computer-aided prognosis: Predicting patient and disease outcome via quantitative fusion of multi-scale, multi-modal data

https://doi.org/10.1016/j.compmedimag.2011.01.008

Abstract

Computer-aided prognosis (CAP) is a new and exciting complement to the field of computer-aided diagnosis (CAD) and involves developing and applying computerized image analysis and multi-modal data fusion algorithms to digitized patient data (e.g. imaging, tissue, genomic) to help physicians predict disease outcome and patient survival. While a number of data channels, ranging from the macro- (e.g. MRI) to the nano-scale (proteins, genes), are now being routinely acquired for disease characterization, one of the challenges in predicting patient outcome and treatment response has been our inability to quantitatively fuse these disparate, heterogeneous data sources. At the Laboratory for Computational Imaging and Bioinformatics (LCIB)1 at Rutgers University, our team has been developing computerized algorithms for high dimensional data and image analysis for predicting disease outcome from multiple modalities, including MRI, digital pathology, and protein expression. Additionally, we have been developing novel data fusion algorithms based on non-linear dimensionality reduction methods (such as Graph Embedding) to quantitatively integrate information from multiple data sources and modalities, with the overarching goal of optimizing meta-classifiers for making prognostic predictions. In this paper, we briefly describe 4 representative and ongoing CAP projects at LCIB. These projects include (1) an Image-based Risk Score (IbRiS) algorithm for predicting outcome of estrogen receptor-positive (ER+) breast cancer patients based on quantitative image analysis of digitized breast cancer biopsy specimens alone, (2) segmenting and determining the extent of lymphocytic infiltration (identified as a possible prognostic marker for outcome in human epidermal growth factor receptor 2 (HER2)-amplified breast cancers) from digitized histopathology, (3) distinguishing patients with different Gleason grades of prostate cancer (grade being known to correlate with outcome) from digitized needle biopsy specimens, and (4) integrating protein expression measurements obtained from mass spectrometry with quantitative image features derived from digitized histopathology to distinguish between prostate cancer patients at low and high risk of disease recurrence following radical prostatectomy.

Introduction

Most researchers agree that cancer is a complex disease that we do not yet fully understand. Predictive, preventive, and personalized medicine (PPP) has the potential to transform clinical practice by decreasing morbidity due to diseases such as cancer by integrating multi-scale, multi-modal, and heterogeneous data to determine the probability of an individual contracting certain diseases and/or responding to a specific treatment regimen [3]. In the clinic, two patients with diseases that look very similar often have vastly different outcomes under the same treatment [4], [5]. A part of this difference is undoubtedly patient specific, but a part must also be a result of our limited understanding of the relationship between disease progression and clinical presentation.

An understanding of the interplay among different hierarchies of biological information (proteins, tissue, metabolites, and imaging) will provide conceptual insights and practical innovations that will profoundly transform people's lives [3], [5], [6]. There is a consensus among clinicians and researchers that a more quantitative approach, using computerized imaging techniques to better understand tumor morphology, combined with the classification of disease into more meaningful molecular subtypes, will lead to better patient care and more effective therapeutics [5], [7], [8]. With the advent of digital pathology [5], [6], [9], multi-functional imaging, mass spectrometry, immunohistochemical, and fluorescence in situ hybridization (FISH) techniques, the acquisition of multiple, orthogonal sources of genomic, proteomic, multi-parametric radiological, and histological information for disease characterization is becoming routine at several institutions [10], [11]. Computerized image analysis and high dimensional data fusion methods will likely constitute an important piece of the prognostic tool-set, enabling physicians to predict which patients may be susceptible to a particular disease and to predict disease outcome and survival. These tools will also have important implications in theragnostics [12], [13], [14], the ability to predict how an individual may react to various treatments, thereby (1) providing guidance for developing customized therapeutic drugs and (2) enabling development of preventive treatments for individuals based on their potential health problems. A theragnostic profile that is a synthesis of various biomarker and imaging tests from different levels of the biological hierarchy (genomic, proteomic, metabolic) could be used to characterize an individual patient and her/his drug treatment outcome.

If multiple sensors or sources are used in the inference process, they could in principle be fused at one of 3 levels in the hierarchy: (1) raw data-level fusion, (2) feature-level fusion, or (3) decision-level fusion [15], [16]. Several classifier ensemble or multiple classifier schemes have been previously proposed to associate and correlate data at the decision level (combination of decisions (COD)) [17], [18], [19], [20], [21], [22], [23], [24]; a much easier task compared to data integration at the raw-data or feature level (combination of features (COF)). Traditional decision fusion based approaches have focused on combining either binary decisions $Y_\alpha(c) \in \{+1, -1\}$, ranks, or probabilistic classifier outputs $P_\alpha(c)$ obtained via classification of each of the $k$ individual data sources $F_\alpha(c)$, $\alpha \in \{1, 2, \ldots, k\}$, via a Bayesian framework [25], Dempster–Shafer evidence theory [26], fuzzy set theory, or classical decision ensemble schemes, e.g. AdaBoost [19], Support Vector Machines (SVM) [18], or Bagging [17]. At a given data scale (e.g. radiological images such as MRI and CT), several researchers [27], [28], [29], [30], [31], [32], [33], [34], [35] have developed techniques for combining imaging data sources (assuming the registration problem has been solved) by simply concatenating the individual image modality attributes $F_{MRI}(c)$ and $F_{CT}(c)$ at every spatial location $c$ to create a combined feature vector $[F_{MRI}(c), F_{CT}(c)]$ which can be input to a classifier. However, when the individual modalities are heterogeneous (image and non-image based) and of different dimensions, e.g. a 256-dimensional vectorial spectral signal $F_{MRS}(c)$ and a scalar image intensity value $F_{MRI}(c)$, a simple concatenation $[F_{MRI}(c), F_{MRS}(c)]$ will not provide a meaningful data fusion solution. Thus, a significant challenge in integrating heterogeneous imaging and non-imaging biological data has been the lack of a quantifiable knowledge representation framework to reconcile cross-modal, cross-dimensional differences in feature values.
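
To make the dimensionality mismatch concrete, the following minimal Python sketch (illustrative only; the values are synthetic stand-ins, not data from this study) concatenates a scalar MRI attribute with a 256-dimensional MRS spectrum and shows how the fused vector is dominated by the higher dimensional channel:

```python
import numpy as np

# Hypothetical co-registered measurements at one spatial location c:
# a scalar MRI intensity and a 256-dimensional MRS spectral signature.
f_mri = np.array([0.73])                    # F_MRI(c), scalar intensity
f_mrs = np.random.rand(256)                 # F_MRS(c), vectorial spectrum

# Naive feature-level fusion by concatenation, as is done for
# homogeneous imaging channels such as [F_MRI(c), F_CT(c)]:
fused = np.concatenate([f_mri, f_mrs])      # 257-dimensional vector

# The single MRI attribute contributes 1 of 257 dimensions, so any
# Euclidean-style classifier distance is dominated by the MRS channel.
print(fused.shape)                          # (257,)
print(np.sum(f_mri**2) / np.sum(fused**2))  # tiny fraction of the norm
```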

While no general theory yet exists for domain data fusion, most researchers agree that heterogeneous data needs to be represented in a way that allows the different channels to be confronted with one another, an important prerequisite to fusion or classification. Bruno et al. [36] recently designed a multimodal dissimilarity space for the retrieval of video documents. Lanckriet et al. [37] and Lewis et al. [38] both presented kernel based frameworks for representing heterogeneous data relating to protein sequences and then used the data representation in conjunction with an SVM classifier [18] for protein structure prediction. Mandic et al. [39] recently proposed a sequential data fusion approach for combining wind measurements via the representation of directional signals within the field of complex numbers. Coppock and Mazlack [40] extended Gower's metric [41] for nominal and ordinal data integration within an agglomerative hierarchical clustering algorithm to cluster mixed data.

In spite of these challenges, data fusion at the feature level aims to retain the informative characteristics of the phenomenon being studied [39]. Kernel-based formulations have been used to combine multiple related datasets (such as gene expression, protein sequence, and protein–protein interaction data) for function prediction in yeast [37] as well as for heterogeneous data fusion in the study of Alzheimer's disease [42]. However, the selection and tuning of the kernels used in multi-kernel learning (MKL) play an important role in the performance of the approach. This selection proves to be non-trivial when considering completely heterogeneous, multi-scale data such as molecular protein- and gene-expression signatures and imaging and metabolic phenotypes. Additionally, these methods typically employ the same kernel or metric across modalities for estimating object similarity. Thus, while the Euclidean kernel might be appropriate for image intensities, it might not be appropriate for all feature spaces (e.g. time series spectra or gene expression vectors) [43].
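
As an illustration of the kernel selection and weighting issue raised above, the sketch below (a hypothetical example with synthetic data, not the cited authors' implementations) builds one kernel per modality and combines them with fixed weights before training a kernel SVM:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel
from sklearn.svm import SVC

# Hypothetical cohort: 40 patients, an imaging feature space and a
# protein-expression feature space of very different dimensionality.
rng = np.random.default_rng(0)
X_img, X_prot = rng.random((40, 10)), rng.random((40, 300))
y = rng.integers(0, 2, 40)                 # synthetic outcome labels

# One kernel per modality; the kernel choice (RBF vs. linear) and the
# mixing weights are precisely the non-trivial tuning step discussed.
K_img = rbf_kernel(X_img, gamma=0.1)
K_prot = linear_kernel(X_prot)
K_prot /= K_prot.max()                     # crude rescaling across modalities
K = 0.5 * K_img + 0.5 * K_prot             # fixed-weight kernel combination

# Train an SVM directly on the combined Gram matrix.
clf = SVC(kernel="precomputed").fit(K, y)
```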

Recently, approaches involving the use of dimensionality reduction (DR) methods for representing high dimensional data in terms of embedding vectors in a reduced dimensional space have been proposed. Applications have included the fusion of data of heterogeneous dimensionality (e.g. scalar MRI intensities and vectorial magnetic resonance spectroscopy (MRS) signals) [44], [45], [46] by attempting to reduce the dimensionality of the higher dimensional data source to that of the lower dimensional modality via principal component analysis (PCA), independent component analysis (ICA), or a linear combination model (LCM) [47]. However, these strategies often lead to non-optimal fusion solutions due to (a) the use of linear DR schemes, (b) dimensionality reduction of only the non-imaging data channel, and (c) large scaling differences between the different modalities. Yu and Tresp proposed a generalized PCA model for representing real-world image painting data [48]. Recently, manifold learning (ML) methods such as isometric mapping (Isomap) [49] and locally linear embedding (LLE) [50] have become popular for mapping high dimensional information into a low dimensional representation for the purpose of visualization or classification. While these non-linear DR (NLDR) methods enjoy advantages over traditional linear DR methods such as PCA [51] and LCM [52] in that they are able to discover non-linear relationships in the data [53], [54], they are notoriously susceptible to the choice of embedding parameters [49], [50].
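
The sensitivity of NLDR to its embedding parameters can be illustrated on a toy manifold. The following sketch (illustrative only, using a synthetic swiss-roll dataset rather than biomedical data) contrasts linear PCA with Isomap run at several neighborhood sizes:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# A non-linear manifold (swiss roll) stands in for high dimensional
# biomedical feature data with non-linear structure.
X, _ = make_swiss_roll(n_samples=500, random_state=0)

# Linear DR: PCA projects onto a flat subspace and cannot "unroll"
# the manifold.
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear DR: Isomap can recover the intrinsic structure, but its
# embedding depends strongly on the neighborhood size parameter.
for k in (5, 10, 50):
    X_iso = Isomap(n_neighbors=k, n_components=2).fit_transform(X)
    print(k, X_iso.shape)
```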

Researchers have since been developing novel methods for overcoming the difficulties in obtaining an appropriate manifold representation of the data. Samko et al. [55] developed an estimator of the optimal neighborhood size for Isomap. However, in cases of varying neighborhood densities, a single globally optimal neighborhood size may not exist. Others have developed adaptive methods that select neighbors based on additional constraints such as local tangents [56], [57], intrinsic dimensionality [58], or geodesic distances estimated within a neighborhood [59]. These additional constraints aim to create a graph that does not contain spurious neighbors, but they also introduce an additional degree of freedom that the user must specify when constructing the manifold.

Along with other groups [60], [61], [62], the Rutgers Laboratory for Computational Imaging and Bioinformatics (LCIB) group has been developing NLDR schemes that have been shown to be more resistant to some of the failings of LLE [50] and Isomap [49]. C-Embed [54], [63], [64], [65] is a consensus NLDR scheme that combines multiple low dimensional projections of the data to obtain a more robust low dimensional representation, one that, unlike LLE and Isomap, is not sensitive to the careful selection of the neighborhood parameter (κ). These schemes [11], [63], [65], [66], [67], [68], [69] allow each of the k individual high dimensional heterogeneous modalities to be non-linearly transformed into the common format of low dimensional embedding vectors, thereby enabling direct, data-level fusion of structural, functional, metabolic, architectural, genomic, and proteomic information while overcoming the differences in scale, size, and dimensionality of the individual feature spaces. This integrated representation of multiple modalities in the transformed space can be used to train meta-classifiers for studying and predicting biological activity.
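
A minimal sketch of this embedding-based fusion strategy is given below. It is a simplified stand-in for C-Embed (which additionally builds a consensus over multiple embeddings), assuming synthetic histology- and proteomics-derived feature matrices: each modality is embedded independently into the same low dimensional space, and the embedding vectors are then fused to train a meta-classifier.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Hypothetical heterogeneous modalities for the same 60 patients:
X_histo = rng.random((60, 500))    # image-derived architectural features
X_prot = rng.random((60, 2000))    # mass-spectrometry protein expression
y = rng.integers(0, 2, 60)         # synthetic outcome labels

# Embed each modality independently into the same low dimensional
# space, reconciling differences in scale, size, and dimensionality.
d = 3
E = [Isomap(n_neighbors=7, n_components=d).fit_transform(X)
     for X in (X_histo, X_prot)]

# Data-level fusion: concatenate the embedding vectors (now of equal,
# comparable dimensionality) and train a meta-classifier on them.
X_fused = np.hstack(E)                              # 60 x (2*d)
meta = RandomForestClassifier(random_state=0).fit(X_fused, y)
```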

While a diagnostic marker distinguishes diseased from normal tissue, a prognostic marker identifies subgroups of patients associated with different disease outcomes. With the increasing early detection of diseases via improved diagnostic imaging methodologies [21], [64], [65], [69], [70], [71], [72], [73], it has become important to predict biologic behavior and disease “aggressiveness”. Clinically applicable prognostic markers are urgently needed to assist in the selection of optimal therapy. In the context of prostate cancer (PCa), well established prognostic markers include histologic grade, prostate specific antigen (PSA), margin positivity, pathologic stage, intra-glandular tumor extent, and DNA ploidy [74], [75], [76]. Other recently proposed prognostic indicators include the tumor suppressor gene p53, the cell proliferation marker Ki-67, Oncoantigen 519, microsatellite instability, angiogenesis and tumor vascularity (TVC), vascular endothelial growth factor (VEGF), and E-cadherin [76], [77]. None of these factors, however, has individually proven accurate enough to serve routinely as a prognostic marker [77], [78]. The problem is that 50% [79], and in some series 80% [80], of men with early detected PCa present a homogeneous pattern with respect to most standard prognostic variables (PSA < 10, stage T1c, Gleason score < 7). In this growing group of patients, the traditional markers seem to lose their efficacy and the subsequent therapy decision is complicated. Gao et al. [81] suggest that only a combination of multiple prognostic markers will prove superior to any individual marker. Graefen et al. [82] and Stephenson et al. [83], [84], [85] have suggested that better prognostic accuracy can be obtained by combining the individual markers via a machine classifier such as an artificial neural network.
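
As a toy illustration of such marker combination (entirely synthetic values and a generic neural network, not the nomograms or ANNs of the cited studies), a classifier can be trained on a vector of per-patient markers:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical per-patient prognostic markers: PSA, Gleason score,
# pathologic stage (ordinal encoding), and Ki-67 fraction. All values
# below are synthetic, for illustration only.
rng = np.random.default_rng(3)
X = np.column_stack([
    rng.uniform(1, 20, 200),     # PSA (ng/mL)
    rng.integers(4, 10, 200),    # Gleason score
    rng.integers(1, 4, 200),     # pathologic stage, ordinal encoding
    rng.uniform(0, 0.5, 200),    # Ki-67 positive fraction
])
y = rng.integers(0, 2, 200)      # outcome label (e.g. recurrence yes/no)

# Combine the individual markers in a small neural network, in the
# spirit of the ANN-based marker combination cited above.
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0))
clf.fit(X, y)
```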

Graphs are an effective means of representing the spatial arrangement of structures via a large set of topological features, which can be quantified through computable metrics. The use of spatial-relation features for quantifying cellular arrangement was proposed in the early 1990s [86], [87], but did not find application to biomedical imagery until recently [88], [89], [90], [91], [92], [93], [94]. However, with recent evidence demonstrating that, for certain classes of tumors, tumor–host interactions correlate with clinical outcome [95], graph algorithms clearly have a role to play in modeling the tumor–host network and hence in predicting disease outcome.

Table 1 lists common spatial, graph-based features that can be extracted from the Voronoi Diagram (VD), Delaunay Triangulation (DT), and Minimum Spanning Tree (MST) [96], [97], [98]. Additionally, a number of features based on nuclear statistics can be similarly extracted. Using the nuclear centroids in a tissue region (Fig. 1(a)) as vertices, the DT graph (Fig. 1(b)), a unique triangulation of the centroids, and the MST (Fig. 1(c)), a graph that connects all centroids with minimum total edge length, can be constructed. These features quantify important biological information, such as the proliferation and structural arrangement of the cells in the tissue, which is closely tied to cancerous activity. Our hypothesis is that the genetic descriptors that define clinically relevant classes of cancer are reflected in the visual characteristics of the cellular morphology and tissue architecture, and that these characteristics can be measured by image analysis techniques. We believe that image-based classifiers of disease, developed via comprehensive analysis of the quantitative image-based information present in tissue histology, will correlate strongly with gene-expression based prognostic classification.
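
The sketch below (an illustrative reimplementation using standard scientific Python, not the authors' code) constructs the DT, VD, and MST from synthetic nuclear centroids and computes a few representative graph features of the kind listed in Table 1:

```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

# Hypothetical nuclear centroids segmented from a tissue image.
rng = np.random.default_rng(2)
pts = rng.random((100, 2))

tri = Delaunay(pts)        # Delaunay triangulation of the centroids
vor = Voronoi(pts)         # Voronoi diagram of the centroids

# Minimum spanning tree over the complete centroid graph; edge weights
# are inter-nuclear Euclidean distances.
D = squareform(pdist(pts))
mst = minimum_spanning_tree(D)
edges = mst.data           # MST edge lengths

# A few graph features of the kind tabulated in Table 1.
features = {
    "mst_mean_edge_length": edges.mean(),
    "mst_std_edge_length": edges.std(),
    "num_delaunay_triangles": len(tri.simplices),
    "num_voronoi_regions": len(vor.regions),
}
print(features)
```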

At LCIB at Rutgers University, we have been developing an array of computerized image analysis, high dimensional data analysis, and fusion tools for quantitatively integrating the molecular features of a tumor (as measured by gene expression profiling or mass spectrometry) [54], [99], the tumor's cellular architecture and microenvironment (as captured in histological imaging) [6], [9], the tumor's 3-d tissue architecture [100], and its metabolic features (as seen by metabolic or functional imaging modalities such as Magnetic Resonance Spectroscopy (MRS)) [21], [64], [65], [69], [70], [71], [72], [73]. In this paper, we briefly describe 4 representative and ongoing projects at LCIB in the context of predicting outcome for breast and prostate cancer patients, involving computerized image and data analysis and the fusion of quantitative measurements from digitized histopathology with protein expression features obtained via mass spectrometry. Preliminary data pertaining to these projects is also presented.

Section snippets

Image-based risk score for ER+ breast cancers

The current gold standard for achieving a quantitative and reproducible prognosis in estrogen receptor-positive breast cancers (ER+ BC) is the Oncotype DX (Genomic Health, Inc.) molecular assay, which produces a Recurrence Score (RS) between 0 and 100, where a high RS corresponds to a poor outcome and vice versa. In [101], we presented the Image-based Risk Score (IbRiS), a novel CAP scheme that uses only quantitatively derived information (architectural features derived from spatial arrangement …

Lymphocytic infiltration and outcome in HER2+ breast cancers

The identification of phenotypic changes in BC histopathology with respect to corresponding molecular changes is of significant clinical importance in predicting BC outcome. One such example is the presence of lymphocytic infiltration (LI) in BC histopathology, which has been correlated with nodal metastasis and distant recurrence in human epidermal growth factor receptor 2-amplified (HER2+) breast cancers.

In [103], [104], we introduced a computerized image analysis system for detecting and grading the …

Automated Gleason grading on prostate cancer histopathology

PCa is diagnosed in over 200,000 people and causes 27,000 deaths in the US annually. However, the five-year survival rate for patients diagnosed at an early stage of tumor development is very high [106], [107]. If PCa is found on a needle biopsy, the tumor is assigned a Gleason grade (1–5) [6], [9]. Gleason grade 1 tissue is highly differentiated and non-infiltrative, while grade 5 tissue is poorly differentiated and highly infiltrative. Gleason grading is predominantly based on tissue …

Integrated proteomic, histological signatures for predicting prostate cancer recurrence

Following radical prostatectomy (RP), there remains a substantial risk of disease recurrence (estimated at 25–40%) [109]. Studies have identified infiltration beyond the surgical margin and high Gleason score as possible predictors of prostate cancer recurrence. However, owing to inter-observer variability in Gleason grade determination, cancers assigned the same Gleason grade can have significantly different outcomes [110]. Discovery of a predictive biomarker for outcome following RP …

Concluding remarks

In this paper we briefly described some of the primary challenges in the quantitative fusion of multi-scale, multi-modal data for building prognostic meta-classifiers to predict treatment response and patient outcome. We also described some of the ongoing efforts at the Laboratory for Computational Imaging and Bioinformatics (LCIB) at Rutgers University to address these computational challenges in personalized therapy and highlighted ongoing projects in computer-aided prognosis of …

Acknowledgments

This work was supported by the Wallace H. Coulter Foundation, the National Cancer Institute under Grants R01CA136535, R01CA140772, R03CA143991, the Cancer Institute of New Jersey, and the Department of Defense (W81XWH-08-1-0145).

References (112)

  • A. Madabhushi et al.

    Computer-aided prognosis: predicting patient and disease outcome via multi-modal image analysis

    IEEE Int Symp Biomed Imaging (ISBI)

    (2010)
  • A. Janowczyk et al.

    Hierarchical normalized cuts: unsupervised segmentation of vascular biomarkers from ovarian cancer tissue microarrays

    Med Image Comput Comput Assist Interv

    (2009)
  • A. Madabhushi et al.

    Integrated diagnostics: a conceptual framework with examples

    Clin Chem Lab Med

    (2010)
  • A. Madabhushi

    Digital pathology image analysis: opportunities and challenges

    Imaging Med

    (2009)
  • S. Agner et al.

    A comprehensive multi-attribute manifold learning scheme-based computer aided diagnostic system for breast MRI

  • S. Doyle et al.

    A boosted Bayesian multi-resolution classifier for prostate cancer detection from digitized needle biopsies

    IEEE Trans Biomed Eng

    (2010)
  • A. Madabhushi et al.

    Graph embedding to improve supervised classification: detecting prostate cancer

  • R.E. Lenkinski et al.

    An illustration of the potential for mapping MRI/MRS parameters with genetic over-expression profiles in human prostate cancer

    MAGMA

    (2008)
  • D. Juan et al.

    Identification of a MicroRNA panel for clear-cell kidney cancer

    Urology

    (2009)
  • G. Lexe et al.

    Towards improved cancer diagnosis and prognosis using analysis of gene expression data and computer aided imaging

    Exp Biol Med (Maywood)

    (2009)
  • F. Pene et al.

    Toward theragnostics

    Crit Care Med

    (2009)
  • G. Lippi

    Wisdom of theragnostics, other changes

    MLO Med Lab Obs

    (2008)
  • V. Ozdemir et al.

    Mapping translational research in personalized therapeutics: from molecular markers to health policy

    Pharmacogenomics

    (2007)
  • A.R. Mirza

    An architectural selection framework for data fusion in sensor platforms

  • D.L. Hall

    Perspectives on the fusion of image and non-image data

  • L. Breiman

    Bagging predictors

    Mach Learn

    (1996)
  • C.J.C. Burges

    A tutorial on support vector machines for pattern recognition

    Data Min Knowl Discov

    (1998)
  • Y. Freund et al.

    Experiments with a new boosting algorithm

    Proc Int Conf Mach Learn

    (1996)
  • A. Madabhushi et al.

    Optimally combining 3D texture features for automated segmentation of prostatic adenocarcinoma from high resolution MR images

  • A. Madabhushi et al.

    Automated detection of prostatic adenocarcinoma from high resolution ex vivo MRI

    IEEE Trans Med Imaging

    (2005)
  • A. Madabhushi et al.

    Comparing classification performance of feature ensembles: detecting prostate cancer from high resolution MRI

  • T. Twellmann et al.

    Image fusion for dynamic contrast enhanced magnetic resonance imaging

    Biomed Eng Online

    (2004)
  • J.L. Jesneck et al.

    Optimized approach to decision fusion of heterogeneous data for breast cancer diagnosis

    Med Phys

    (2006)
  • R.O. Duda et al.

    Pattern classification and scene analysis

    (1973)
  • A.W. Smeulders et al.

    An analysis of pathology knowledge and decision making for the development of artificial intelligence-based consulting systems

    Anal Quant Cytol Histol

    (1989)
  • S. Dube et al.

    Content based image retrieval for MR image studies of brain tumors

    Conf Proc IEEE Eng Med Biol Soc

    (2006)
  • R.A. Heckemann et al.

    Multiclassifier fusion in human brain MR segmentation: modelling convergence

    Med Image Comput Comput Assist Interv Int Conf

    (2006)
  • S. Hunsche et al.

    Combined X-ray and magnetic resonance imaging facility: application to image-guided stereotactic and functional neurosurgery

    Neurosurgery

    (2007)
  • T. Rohlfing et al.

    Information fusion in biomedical image analysis: combination of data vs. combination of interpretations

    Inf Process Med Imaging

    (2005)
  • T.Z. Wong et al.

    PET and brain tumor image fusion

    Cancer J

    (2004)
  • E. Bruno et al.

    Design of multimodal dissimilarity spaces for retrieval of video documents

    IEEE Trans Pattern Anal Mach Intell

    (2008)
  • G.R. Lanckriet et al.

    Kernel-based data fusion and its application to protein function prediction in yeast

    Pac Symp Biocomput

    (2004)
  • D.P. Lewis et al.

    Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure

    Struct Bioinform

    (2006)
  • D.P. Mandic et al.

    Sequential data fusion via vector spaces: fusion of heterogeneous data in the complex domain

    J VLSI Signal Process

    (2007)
  • S. Coppock et al.
  • K.J. Friston et al.

    Functional topography: multidimensional scaling and functional connectivity in the brain

    Cereb Cortex

    (1996)
  • J. Ye et al.

    Heterogeneous data fusion for Alzheimer's disease study

  • S. Rao et al.

    Evaluating distance functions for clustering tandem repeats

    Genome Inform

    (2005)
  • A.W. Simonetti et al.

    Combination of feature-reduced MR spectroscopic and MR imaging data for improved brain tumor classification

    NMR Biomed

    (2005)
  • A.W. Simonetti et al.

    A chemometric approach for brain tumor classification using magnetic resonance imaging and spectroscopy

    Anal Chem

    (2003)

    A preliminary version of this paper appeared in [1].

    1 http://lcib.rutgers.edu.
