IMENSE: An e-infrastructure environment for patient specific multiscale data integration, modelling and clinical treatment

https://doi.org/10.1016/j.jocs.2011.07.001

Abstract

Secure access to patient data, and to analysis tools that run on that data, will revolutionize the treatment of a wide range of diseases by using advanced simulation techniques to underpin the clinical decision making process. To achieve these goals, suitable e-Science infrastructures are required that allow clinicians and researchers to easily access data and launch simulations. In this paper we describe the open source Individualized MEdiciNe Simulation Environment (IMENSE), which provides a platform to securely manage clinical data and to perform wide-ranging analyses on that data, ultimately with the intention of enhancing clinical decision making with direct impact on patient health care. We motivate the design decisions taken in the development of the IMENSE system by considering the needs of researchers in the ContraCancrum project, which provides a paradigmatic case in which clinicians and researchers require coordinated access to data and simulation tools. We show how the modular nature of the IMENSE system makes it applicable to a wide range of biomedical computing scenarios, from within a single hospital to major international research projects.

Highlights

► Clinicians must make decisions based on the analysis of many different types of data. ► Integrating disparate data sources can assist the clinical decision making process. ► The IMENSE system integrates different types of clinical data from multiple sources. ► Clinicians and researchers can use the system to apply complex analysis techniques.

Introduction

Patient specific medical simulation holds the promise of revolutionizing medical treatment [1], and is at the heart of the Virtual Physiological Human (VPH) initiative [2]. The ContraCancrum (Clinically Oriented Translational Cancer Multilevel Modelling) VPH project [3] aims to have an impact primarily in the provision of (a) a better understanding of the natural phenomenon of cancer at different levels of biocomplexity and (b) a disease treatment optimization procedure in the patient's individualized context by simulating the response to various therapeutic regimens. The modelling and simulation techniques used rely on multiple forms of patient derived data, including imaging, histopathological, molecular and clinical data. Fundamental biological mechanisms involved in tumour development and tumour and normal tissue treatment response such as metabolism, cell cycle, tissue mechanics and cell survival following treatment are modelled. From a theoretical point of view, the simulation techniques used exploit several discrete and continuous mathematical methods such as cellular automata, Monte Carlo techniques, finite elements, differential equations and molecular dynamics.

The ContraCancrum project is using two cancer types as exemplars with which to validate the simulation techniques: glioblastomas and lung carcinoma. To support this work, an information technology environment, the Individualized MEdiciNe Simulation Environment (IMENSE), which we describe in detail in this paper, has been developed to provide a single portal from which researchers and clinicians can access large quantities of heterogeneous patient data and use it to launch simulation workflows based on this data, in order to assess the efficacy of different courses of treatment on individual patients. While the IMENSE system has been developed to support the needs of ContraCancrum clinicians and researchers, we will demonstrate that the system is generic and flexible enough to support a wide range of clinical computing scenarios. Moreover, its standards-based, open source architecture means that it can easily be adopted by a broad range of different communities.

The difficulty of sharing clinical data is a major hurdle both to incorporating patient specific medical simulation into clinical practice and to research that uses that data. Even the data sources held by hospitals themselves represent a major resource that is not currently adequately exploited by either researchers or clinicians. Patient data collected as part of routine clinical practice, which is used as input to the simulation techniques in ContraCancrum, initially resides in information systems within the hospital where the data was obtained. This data includes medical images obtained through techniques such as magnetic resonance imaging (MRI) or computed tomography (CT), biopsy microphotographs, DNA sequence data and records of clinical treatment regimes.

The data held by clinical data systems can be used in two different ways: (1) the composition of large, (pseudo)-anonymized datasets from multiple sources, used to perform inductive reasoning or clinical trials; (2) data obtained from a single patient used to run a workflow in support of a clinical decision making process. At the heart of ContraCancrum is the need to gain access to these distributed data sources in a routine, transparent way, following appropriate anonymization and security procedures. While solutions exist to enable access to federated, distributed data sources, in many cases these are neither appropriate for nor acceptable under a hospital's Information and Communication Technologies (ICT) policies, are too complicated to achieve widespread uptake, or are not generic enough to be used in anything other than the narrow scenarios for which they were originally developed [4].
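The paper does not describe how IMENSE anonymizes records. As one possible illustration of (pseudo)-anonymization, the sketch below uses keyed hashing (HMAC-SHA256): direct identifiers are dropped and the patient ID is replaced by a stable pseudonym. The key, the set of identifying fields and the function name `pseudonymize` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held by the hospital; without it the pseudonym
# cannot be recomputed from a patient ID.
SITE_KEY = b"replace-with-a-secret-held-by-the-hospital"

# Fields treated as directly identifying in this sketch (an assumption).
IDENTIFYING_FIELDS = {"name", "address", "date_of_birth"}


def pseudonymize(record: dict, key: bytes = SITE_KEY) -> dict:
    """Return a copy of record with direct identifiers removed and the
    patient ID replaced by an HMAC-SHA256 pseudonym."""
    token = hmac.new(key, record["patient_id"].encode("utf-8"),
                     hashlib.sha256).hexdigest()
    cleaned = {k: v for k, v in record.items()
               if k not in IDENTIFYING_FIELDS and k != "patient_id"}
    cleaned["pseudonym"] = token
    return cleaned
```

Because the same key always yields the same pseudonym, records from several visits of one patient can still be linked in a composed dataset. Anyone holding the key could re-identify patients, so in this design it would stay inside the hospital.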

There are also many problems related to the disparate nature of the data sources; data on the same clinical pathologies is stored in different formats by different hospitals, with some data fields stored by some hospitals and not others. The quality of data is also a problem: in some clinical environments, certain data fields may not be stored routinely, leaving the available data incomplete. We can view these types of problems as issues related to the curation of the data.
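To make the curation problem concrete, the following sketch maps hospital-specific field names onto a common schema and reports which expected fields are missing, so that a clinical data manager could fill the gaps before sharing. The mappings, field names and the `harmonize` function are hypothetical. The paper does not specify the IMENSE schema.

```python
# Hypothetical per-hospital mappings from local field names to a common schema.
FIELD_MAPS = {
    "hospital_a": {"tumour": "tumour_type", "mri_date": "imaging_date", "age_yrs": "age"},
    "hospital_b": {"diagnosis": "tumour_type", "scan_date": "imaging_date", "age": "age"},
}

# Fields the common schema expects (an assumption for this sketch).
REQUIRED_FIELDS = ("tumour_type", "imaging_date", "age")


def harmonize(source: str, raw: dict) -> tuple[dict, list[str]]:
    """Rename fields to the common schema (unmapped fields are dropped) and
    list the required fields that the source record did not supply."""
    mapping = FIELD_MAPS[source]
    record = {mapping[k]: v for k, v in raw.items() if k in mapping}
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    return record, missing
```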

In order to address these problems in the ContraCancrum project, a data sharing system has been developed to support both ways of using clinical data; it is designed to be acceptable to hospital ICT managers, to comply with relevant data protection legislation and to deal with the problem of data curation. As ContraCancrum involves partners from across the European Union, the system must provide a means to share data across international borders, acting as a central warehouse for anonymized patient data. Since the clinical managers of the data are the best people to decide what is shared and to ensure that the data is well curated, a clinician or clinical worker decides on a set of data to be shared and ensures that the data is curated to an acceptable level. The data is then pushed from hospital systems to a central project data repository. This removes the need to open inbound holes in hospital firewalls in order to create federated databases. Users who want access to the data retrieve it from the central repository.
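The push model can be sketched from the hospital side: the hospital opens an outbound HTTPS connection to the central repository, so no inbound port has to be opened in its firewall. The endpoint URL, the JSON payload layout and the bearer-token header are assumptions for illustration only, not the IMENSE upload interface.

```python
import json
import urllib.request

# Hypothetical endpoint of the central project repository.
REPOSITORY_URL = "https://repository.example.org/api/datasets"


def build_push_request(records: list[dict], token: str) -> urllib.request.Request:
    """Prepare an outbound HTTPS POST carrying curated, anonymized records.

    The hospital initiates the connection, so only outbound traffic has to
    be allowed; nothing needs to listen inside the hospital network.
    """
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        REPOSITORY_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},  # assumed auth scheme
    )

# Sending would then be: urllib.request.urlopen(req, timeout=30)
```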

We describe the process used to define the requirements of the IMENSE system in Section 2 and then go on to describe the various components of the system in Section 3.

There are many different approaches to building electronic health record and hospital information systems. Systems such as PatientCentre from iSOFT [5] are deployed within an individual hospital to manage patient data, while the GP2GP system [6] in the UK allows GP practices to transfer records amongst themselves. Although there are few standards for integrated electronic health record systems, standards such as HL7 [7] allow data to be transferred between systems. Online ‘cloud’ based systems such as Microsoft HealthVault [8] and Google Health [9] allow individual patients to store and manage their own health records via an online service.

Clinical data management systems are designed specifically to manage data coming from clinical trials, and thus are often used to federate data from multiple administrative domains. Systems such as the IBM Cognos platform [10] provide business intelligence services to pharmaceutical and life sciences companies conducting clinical trials. Microsoft's Amalga system [11] brings historically disparate data together and makes it easy to search and gain insight from that data.

However, while these systems are designed to manage and integrate large amounts of data (potentially all of the patients treated in a hospital), they do not generally deal with sharing patient data for research purposes between multiple institutions, which could potentially be located in different countries. In addition, traditional electronic health care record systems do not go beyond data management to provide advanced analysis and decision support capabilities, which rely on high performance computing platforms out of the control of the administrators of the data management system.

A number of previous research projects, having similar requirements to ContraCancrum, have developed partial solutions to address the issues of data sharing and analysis, but none of the ones we have encountered provide as comprehensive a solution, focused on clinical needs, as IMENSE.

The @neurIST project [12] worked to develop an integrated decision support system to assess the risk of aneurysm rupture in patients and to optimize their treatments. Many of the components developed by the project, such as @neuEndo, which enables clinicians to model aneurysms, are released only under commercial licencing terms. The @neurIST system provides a simple web-based GUI that allows clinicians to evaluate the risk of an aneurysm rupturing and to assess the suitability of a particular stent for treating it. The licencing terms alone prevent it from being used by the ContraCancrum project.

ImmunoGrid [13] sought to develop a computer model of the Human Immune System implemented using grid technologies. It integrated processes at molecular, cellular and organ levels (and is therefore multi-scale), and provided a web portal interface to allow users to launch simulations. While the portal is based on the same Application Hosting Environment (AHE) component [14] as is used in IMENSE, the lack of a data storage framework meant it was not suitable as the basis for the ContraCancrum project.

ACGT [15] is an open environment for supporting clinical trials and related research through the use of grid-enabled tools and infrastructure, and to some extent is a precursor to the research conducted in ContraCancrum. The Oncosimulator [16] is an advanced information system, developed by the EU FP6 ACGT project, which aims to simulate the response of tumours and affected normal tissues to therapeutic schemes based on clinical, imaging, histopathological and molecular data of a given cancer patient, in order to optimize cancer treatment on a patient-individualized basis. While Oncosimulator itself is not a unified environment for data sharing and analysis, it will be integrated with the IMENSE system via a workflow.

To some extent, the IMENSE system has built on technologies and ideas developed in these earlier projects, to provide a generic, user oriented open source platform for computational biomedicine.

Section snippets

Investigating user requirements

The IMENSE system is designed to meet the needs of a range of different users, from clinicians to researchers. The ContraCancrum user community provides a paradigmatic case illustrating the needs of the targeted users, and hence the ContraCancrum project has been used to derive the system requirements. The initial task at the design phase of the IMENSE system was to understand who the users are, and what they need to use the system for. This analysis separated potential users into four

Meeting user needs

The requirements described in Section 2 have shaped the development of the IMENSE system, designed to provide clinicians and researchers with access to multiscale clinical data, and to act as a platform from which to launch a range of different simulation tools and workflows to analyze the data on a patient by patient basis.

Clinical data storage

A central component of the IMENSE system is the data warehouse. The data warehouse provides a central facility for all of the data generated in the project. Here we describe the design and implementation of the data warehouse.
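As a rough illustration of the kind of metadata a central warehouse might keep, the sketch below uses an in-memory SQLite database with one table of pseudonymized patients and one of data items, each pointing to where the underlying file (for example a DICOM image) is stored. The schema, table names and the `find_items` query are hypothetical and are not the IMENSE data model.

```python
import sqlite3

# Hypothetical minimal schema: one row per pseudonymized patient and one row
# per data item (image, histology slide, sequence file, treatment record).
SCHEMA = """
CREATE TABLE patient (
    pseudonym   TEXT PRIMARY KEY,
    tumour_type TEXT NOT NULL
);
CREATE TABLE data_item (
    id        INTEGER PRIMARY KEY,
    pseudonym TEXT NOT NULL REFERENCES patient(pseudonym),
    modality  TEXT NOT NULL,  -- e.g. 'MRI', 'CT', 'histology'
    uri       TEXT NOT NULL   -- where the file itself is stored
);
"""


def open_warehouse(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a warehouse database with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def find_items(conn: sqlite3.Connection, tumour_type: str, modality: str) -> list[str]:
    """Return URIs of all data items of one modality for a tumour type."""
    rows = conn.execute(
        """SELECT d.uri FROM data_item d
           JOIN patient p ON p.pseudonym = d.pseudonym
           WHERE p.tumour_type = ? AND d.modality = ?
           ORDER BY d.id""",
        (tumour_type, modality),
    )
    return [uri for (uri,) in rows]
```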

Analyzing project data

The goal of ContraCancrum is to apply novel analytical techniques to a range of patient data, in order to furnish clinicians and researchers with information by means of which to make clinical decisions. These simulation techniques are made available as Web services in one of two ways. Either the simulation technique is developed as a Web service in itself, or, in the case where some level of high performance computing is required, a simulation code is exposed as a Web service using the
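For the first route, a technique developed as a Web service in itself, a minimal sketch is a WSGI application that accepts a JSON job description and returns a JSON result. The toy exponential growth model is a stand-in and is not one of the ContraCancrum models; its parameter names are hypothetical. The sketch does not cover the second route, in which an existing high performance computing code is wrapped.

```python
import io
import json
from typing import Optional
from wsgiref.util import setup_testing_defaults


def toy_growth_model(volume_cm3: float, rate_per_day: float, days: int) -> float:
    """Hypothetical stand-in for a simulation code: exponential growth."""
    return volume_cm3 * (1.0 + rate_per_day) ** days


def simulation_app(environ, start_response):
    """Minimal WSGI application exposing toy_growth_model via HTTP POST."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed",
                       [("Content-Type", "text/plain"), ("Allow", "POST")])
        return [b"POST a JSON job description\n"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    params = json.loads(environ["wsgi.input"].read(length))
    result = toy_growth_model(params["volume_cm3"], params["rate_per_day"], params["days"])
    body = json.dumps({"predicted_volume_cm3": result}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_in_process(app, method: str, payload: Optional[dict] = None):
    """Invoke a WSGI app without opening a socket; returns (status, body)."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else b""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({"REQUEST_METHOD": method,
                    "CONTENT_LENGTH": str(len(data)),
                    "wsgi.input": io.BytesIO(data)})
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

For local testing over the network, `wsgiref.simple_server.make_server("", 8000, simulation_app).serve_forever()` would serve the same application.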

System interfaces

Data and tools in the IMENSE system are accessed in one of two ways: either through the user oriented web portal, or the application oriented programmatic interface. The ContraCancrum web portal provides a single interface for both searching and viewing the data stored in the data environment, and launching the associated tools. The programmatic interface provides simple web based services by which data can be uploaded, and by which data held in the environment can be discovered and downloaded.
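On the client side, the discovery and download services described above reduce to a handful of HTTP requests. The sketch below only builds the discovery and download URLs and parses a search response; the base URL, paths and JSON layout are hypothetical, since the paper does not document the actual endpoints.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical base URL of the programmatic interface.
BASE_URL = "https://imense.example.org/api"


def search_url(**criteria: str) -> str:
    """URL for discovering data items that match metadata criteria."""
    # Sorting the criteria makes the query string deterministic.
    return f"{BASE_URL}/search?{urlencode(sorted(criteria.items()))}"


def download_url(item_id: str) -> str:
    """URL for downloading one data item; the ID is percent-encoded."""
    return f"{BASE_URL}/items/{quote(item_id, safe='')}/content"


def item_ids(response_body: bytes) -> list[str]:
    """Extract item IDs from a search response of the assumed form
    {"items": [{"id": ...}, ...]}."""
    return [item["id"] for item in json.loads(response_body)["items"]]
```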

System security

System security is of primary concern, since the data being held is of a personal medical nature. One of the challenges that ContraCancrum addresses is how to provide seamless and secure access to patient data for the benefit of academic, scientific and translational research. As high profile security breaches and data losses are frequent headline news, the protection of medical patient data is of critical importance for the ContraCancrum project. Hence, there is a need for a security
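One common way to control who may do what with sensitive data is a default-deny, role-based access check. The roles, actions and policy below are hypothetical and do not reproduce the IMENSE user categories or security design; they only show the shape of such a check.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical roles for illustration only.
    CLINICIAN = "clinician"
    RESEARCHER = "researcher"
    DATA_MANAGER = "data_manager"


# Assumed policy: the actions each role may perform.
POLICY = {
    Role.CLINICIAN: {"search", "download", "run_workflow"},
    Role.RESEARCHER: {"search", "download"},
    Role.DATA_MANAGER: {"search", "upload"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Default deny: an action is permitted only if the policy lists it."""
    return action in POLICY.get(role, set())
```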

Hardware and software platform

The IMENSE system is modular and can be deployed in a number of configurations, but the configuration deployed in support of the ContraCancrum project runs all of the components in a single virtualized server environment hosted within the Centre for Computational Science (CCS) at University College London. A dedicated server is used to run the web portal interface, DICOM server and database server. A second server is used to run the GridSpace workflow tool and the Application Hosting

Using IMENSE

In order to illustrate the interaction between the different components of IMENSE, we consider two exemplar workflows that make use of the system.
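The deployed system includes the GridSpace workflow tool. Independently of that tool, the basic pattern of chaining data retrieval, simulation and result storage can be sketched as a list of steps that each read and extend a shared context. The step functions and their values are placeholders, not IMENSE services.

```python
from typing import Callable

# A step reads the accumulated context and returns new entries for it.
Step = Callable[[dict], dict]


def run_workflow(steps: list[Step], context: dict) -> dict:
    """Run steps in order, merging each step's output into the context.
    An exception raised by any step stops the workflow."""
    for step in steps:
        context = {**context, **step(context)}
    return context


# Hypothetical placeholder steps.
def fetch_patient_data(ctx: dict) -> dict:
    return {"volume_cm3": 2.0}


def simulate(ctx: dict) -> dict:
    return {"predicted_volume_cm3": ctx["volume_cm3"] * 0.5}


def store_result(ctx: dict) -> dict:
    return {"stored": True}
```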

Clinical impact

Optimal treatment of patients is still grounded in medical research, which fundamentally needs basic research in biomedicine. Only such research leads to new insights into the etiology and pathobiology of diseases, and the discovery and development of new diagnostic tools, new drugs and new technologies. The knowledge coming from basic research needs to be translated into the daily clinical care of patients. This process is time consuming and costly as clinical patient-oriented research

Conclusions

The ContraCancrum project has complex data storage requirements, with clinical data being produced in several different forms, and further data being derived from the clinical data. Within ContraCancrum we have sought to unify these different data types, and we have produced interface tools to allow users to quickly find relevant data on which to experiment. In addition, these tools have been combined with workflow and Web services environments to provide a rich computational platform on which

Acknowledgements

The development of IMENSE has been supported by the European Commission DG Information Society through the Seventh Framework Programme of Information and Communication Technologies, under grant number 223979. The authors thank the members of the ContraCancrum project for their many helpful contributions which have led to the design and implementation of this system.

References (55)

  • Dolin R. et al. The HL7 clinical document architecture. Journal of the American Medical Informatics Association (2001).
  • Microsoft HealthVault,...
  • Google Health...
  • IBM Cognos...
  • Microsoft Amalga...
  • Rajasekaran H. et al. @neurIST-towards a system architecture for advanced disease management through integration of heterogeneous data, computing, and complex processing services.
  • Halling-Brown M. et al. A computational grid framework for immunological applications. Philosophical Transactions of the Royal Society A (2009).
  • Brochhausen M. et al. The ACGT Master Ontology on Cancer-A New Terminology Source for Oncological Practice.
  • Stamatakos G.S. et al. The “oncosimulator”: a multilevel, clinically oriented simulation system of tumor growth and organism response to therapeutic schemes. Towards the clinical evaluation of in silico oncology.
  • Cooper J. et al. The Virtual Physiological Human ToolKit. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences (2010).
  • Mildenberger P. et al. Introduction to the DICOM standard. European Radiology (2002).
  • Ripley B.D. The R project in statistical computing. MSOR Connections. The newsletter of the LTSN Maths, Stats & OR Network (2001).
  • The UNICORE Project,...
  • The Globus Project,...
  • Distributed European Infrastructure for Supercomputing Applications,...
  • US TeraGrid,...
  • NGS, National Grid Service, UK,...

    Stefan J. Zasada is IT consultant to the Computational Life and Medical Sciences Network, providing IT support and advice. He was previously involved in developing medical data sharing solutions in the ContraCancrum and VPH Network of Excellence projects, and also lightweight grid middleware and enabling tools for e-Science. He has a first degree in Computer Science from the University of Nottingham and a Masters degree in Advanced Software Engineering from the University of Manchester. In addition, he is currently completing his PhD in Computer Science, investigating the design and development of lightweight application virtualization toolkits and market based resource allocation solutions. His research interests cover many aspects of high performance and grid computing, and their application in the medical and life sciences domain.

    Tao Wang was a research assistant at the University of Bedfordshire (UK). He has a BSc in computer science and an MSc in telecommunication and Internet systems, both from the University of Ulster (N. Ireland, UK). He has worked on two European projects. He has recently moved to industry and is working as a web developer. His research interests cover many aspects of web services and distributed applications.

    Ali Haidar received an MSc in Information Security from Royal Holloway University of London in 2003, and a PhD in Computer Science (Web Services Security and Formal Methods) from London South Bank University. From September 2008 to May 2009, he was a Research Associate at the Centre for Software Reliability at Newcastle University working on capturing user and security requirements for computational grid environments and providing formal models and analysis of these requirements to assist the design of security prototype. Currently, he is a Research Fellow at the Centre for Computational Science at UCL working on information security and assurance aspects of several EU projects and the new UCL Computational Life and Medical Sciences (CLMS) Network.

    Enjie Liu is a senior lecturer in Computing at the University of Bedfordshire (UK). She has a BSc in computer science from Northwest University in China and a PhD in telecommunications from Queen Mary, University of London. She has been involved in two previous European projects and one EPSRC project, and is currently involved in three European projects. Her research interests in web services focus on the semantic web, especially service composition and data publishing and searching on the web.

    Norbert Graf is Professor of Paediatrics and Director of the Clinic for Paediatric Oncology and Haematology, a member of the Faculty of Medicine of Saarland University in Germany. Prof. Graf is a member of the German Society of Paediatrics, the Austrian Society of Paediatrics, the Berufsverband of Paediatricians in Germany, the German Society of Paediatric Oncology and Haematology (GPOH), the German Cancer Society, the Cancer Society of the Saarland, Germany, the German TNM Committee (as representative for the German Paediatricians), the European Bone Marrow Transplantation, the International Society of Paediatric Oncology (SIOP), the Paediatric Society of Bone Marrow and Stem Cell Transplantation and the European Haematology Association, and is an Associate Member of COG (Children's Oncology Group, North America). He is a member of the editorial boards of the Cochrane Childhood Cancer Review Group and Pediatric Blood & Cancer. Prof. Graf is the chairman of the SIOP Renal Tumour Study Group (SIOPRTSG) and a member of the international study committee on SIOP Brain Tumour. He is also a member of GPOH study groups on Osteosarcoma (COSS 86, COSS 91, COSS 96, EURAMOS), Informatics in Paediatric Oncology, Non Hodgkin Lymphoma (NHLBFM 90, NHL BFM 95), Brain tumour studies (HIT 91, HIT SKK 92, HIT Rez 97, HIT LGG(SIOP)), Nephroblastoma (SIOP 93 01/GPOH and SIOP 2001/GPOH) (chairman), Acute Myelogenous Leukaemia (AML BFM 98, 2004).

    Gordon Clapworthy is Professor of Computer Graphics at the University of Bedfordshire (UK). He is the Head of Centre for Computer Graphics and Visualisation (CCGV). He has a BSc (Class 1) in Mathematics and a PhD in Aeronautical Engineering from the University of London, and an MSc (dist.) in Computer Science from City University (London). His research interests centre on computer graphics and visualisation but he becomes involved in a number of other areas on occasion. He has been involved in 27 European projects, coordinating 9 of them.

    Steven Manos is the Manager of Research Services in the Information Technology Services unit at the University of Melbourne. Previously he was a postdoctoral research fellow at the Centre for Computational Science, University College London. He completed his PhD at the University of Sydney, Australia, which explored the design of novel microstructured optical fibres using genetic algorithms and genetic representations capable of evolving designs with variable complexity. His research interests include the use of data mining, visualisation and genetic algorithms in the design of new ceramic materials, and blood flow simulation and visualisation as a clinical tool for neurosurgeons. His general research interests include automated computational design, and the use of grid computing from an applied, real world perspective.

    Peter V. Coveney holds a Chair in Physical Chemistry and is Director of the Centre for Computational Science (http://ccs.chem.ucl.ac.uk/), an Honorary Professor in Computer Science and a member of CoMPLEX at UCL. He is also Professor Adjunct within the Yale School of Medicine at Yale University, and Director of the UCL Computational Life & Medical Sciences Network (http://www.clms.ucl.ac.uk/). Coveney is active in a broad area of interdisciplinary theoretical research including condensed matter physics and chemistry, materials science, life and medical sciences including collaborations with clinicians. Coveney is PI of the Virtual Physiological Human (VPH) Network of Excellence. He is also leading the UK e-Science Minitheme which is developing the UK Strategy for a Research Computing Ecosystem. In 2010, Coveney was awarded funding for the EU FP7 p-medicine project which aims to develop a distributed data warehouse to store and exchange clinical data in heterogeneous formats, as well as the VPH-Share project which seeks to develop lightweight middleware that will simplify access to high performance computing resources for data-intensive and data-driven projects. In addition to major participation in the EU FP7 MAPPER project on multiscale modelling on EU infrastructures, he is also active in the new EU FP7 EUDAT and CRESTA projects, which aim respectively to build a persistent European data infrastructure and to evolve petascale codes to the exascale.

    1 Now at Information Technology Services, University of Melbourne, Australia.