Elsevier

Journal of Web Semantics

Volume 4, Issue 3, September 2006, Pages 207-215
A semantic web approach to biological pathway data reasoning and integration

https://doi.org/10.1016/j.websem.2006.05.005

Abstract

This paper describes the use of semantic web technology and Description Logic (DL) for facilitating the integration of molecular pathway data, which is illustrated by a Web Ontology Language (OWL)-based transformation of a more complex pathway structure (Reactome) into a simpler one (HPRD). The process starts by adding OWL axioms to BioPAX, a pathway interchange standard. The axioms are designed for mapping BioPAX-formatted Reactome interactions to “molecular binding event” interactions, which can be easily aligned with the HPRD data. Using an automated OWL reasoner, we find overlapping and non-overlapping molecular interactions between the two pathway datasets. The paper demonstrates the potential of the semantic web and its enabling technologies in biological pathway data reasoning and integration.

Introduction

According to the 2005 update on the molecular biology database collection [1], there are 719 publicly available, web-accessible life sciences databases. Many of these databases contain information about biological pathways (e.g., KEGG [2], Reactome [3], and BIND [4]). In the post-genomic era, life scientists have turned towards understanding how genes or proteins interact with each other at the systems level [5], [6], [7], [8]. As the number and diversity of these pathway databases continue to expand, there is a growing need to compare, validate, and integrate data across these databases. Cross-database pathway analyses have benefited a variety of biomedical and drug-related research, including studies that aim at understanding disease mechanisms at the molecular level [9], [10], and studies in drug discovery [11], [12]. Despite these successes, there remain major difficulties in integrating and analyzing data across different pathway databases.

As the eXtensible Markup Language (XML) has become the lingua franca for representing different types of biological data [13], there has been a proliferation of semantically overlapping XML formats that are used to represent diverse types of pathway data. Examples include the XML derivatives KGML (http://www.genome.jp/kegg/xml/), SBML [14], CellML [15], PSI MI [16], BIND XML [17], and Genome Object Net XML [18]. Efforts have been underway to translate between these formats (e.g., between PSI MI and BIND XML, and between Genome Object Net and SBML [19]). However, the complexity of such a pair-wise translation approach increases dramatically as the number of pathway data formats grows. To address this issue, a standard pathway data exchange format is needed. While the Resource Description Framework (RDF) (http://www.w3.org/RDF/) is an important first step towards the unification of XML formats in describing metadata (ontologies), it is not expressive enough to support formal knowledge representation. To address this problem, more sophisticated XML-based ontological languages such as the Web Ontology Language (OWL) [20] have been developed. An OWL-based pathway exchange standard, called BioPAX, has been released to the research community (http://www.biopax.org/).
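The growth of the pair-wise approach can be made concrete with a small count. The sketch below is illustrative only: it assumes that each translator works in one direction and that a design built around one shared exchange standard needs one importer and one exporter per format. The function names are hypothetical and do not come from the paper.

```python
def pairwise_translators(n_formats: int) -> int:
    """Directed translators needed when every format maps to every other format."""
    return n_formats * (n_formats - 1)


def hub_translators(n_formats: int) -> int:
    """Translators needed when each format maps only to and from one shared standard."""
    return 2 * n_formats


# The six XML formats named in the paragraph above.
formats = ["KGML", "SBML", "CellML", "PSI MI", "BIND XML", "Genome Object Net XML"]
n = len(formats)

print(pairwise_translators(n))  # 30
print(hub_translators(n))       # 12
```

Under these assumptions the pair-wise count grows quadratically with the number of formats, while the count for a shared standard grows linearly.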

Different pathway databases may model pathway data at different levels of detail. For example, one database might treat “valine, leucine, and isoleucine biosynthesis” as a single process, while another database might treat these as separate processes. Also, one database might consider the deactivation of AKT by PP2A to be part of apoptosis, while another database might consider this step as part of the Phosphoinositide-3-Kinase pathway. To provide different levels of granularity and detail in pathway data modelling, different pathway data representation formats (e.g., SBML and BIND XML) have been developed.

Section snippets

Semantic web use case: integrating disparate pathway datasets

Making sense of the ever-increasing research data on mechanisms of signal transduction pathways is a challenging undertaking. Exact computer models of signaling events, such as stochastic molecular simulations, or models using differential equations, are hampered by the sheer complexity of the models, as well as by data that is noisy and incomplete. A solution to these problems is the use of simplified network representations, which have been shown to be sufficient for answering essential

Methods

As a demonstration, we apply our approach to detecting molecular binding events from the apoptosis pathway of Reactome represented in the BioPAX format (http://www.reactome.org/cgi-bin/biopaxexporter?DB=gk_current&ID=109581). In general, our approach can be applied to other pathways as well as other BioPAX-formatted pathway datasets (e.g., HumanCyc [21]).
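To give a feel for what detecting binding events and comparing them with a second dataset involves, here is a minimal plain-Python analogue. It is not the paper's OWL axioms or reasoner workflow: the rule used (a reaction counts as a binding event when it has two or more inputs and produces a complex), the dictionary layout, and the placeholder names `P1` to `P4` are all assumptions made for illustration.

```python
from itertools import combinations


def binding_pairs(reactions):
    """Return unordered input pairs for reactions that form a complex.

    Assumed rule (not the paper's axioms): a reaction is a binding event when
    it has two or more inputs and at least one output flagged as a complex.
    """
    pairs = set()
    for reaction in reactions:
        inputs = reaction["left"]
        forms_complex = any(out["is_complex"] for out in reaction["right"])
        if len(inputs) >= 2 and forms_complex:
            for a, b in combinations(sorted(inputs), 2):
                pairs.add(frozenset((a, b)))
    return pairs


# Toy placeholder data, not taken from Reactome or HPRD.
reactome_like = [
    {"left": ["P1", "P2"], "right": [{"name": "P1:P2", "is_complex": True}]},
    {"left": ["P3"], "right": [{"name": "P3-phospho", "is_complex": False}]},
]
hprd_like = {frozenset(("P1", "P2")), frozenset(("P2", "P4"))}

derived = binding_pairs(reactome_like)
overlapping = derived & hprd_like    # interactions found in both datasets
only_derived = derived - hprd_like   # found only in the derived set
only_hprd = hprd_like - derived      # found only in the simpler dataset
```

Representing each interaction as an unordered pair makes the comparison a matter of set intersection and difference. In the paper this classification is carried out by an OWL reasoner over BioPAX classes rather than by hand-written rules.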

For implementation, we use Protégé 3.1 with OWL plugin 2.1 (build 284) and RacerPro v1.8.1 as the facilities for handling and reasoning over

Results

With the axioms defined, we load the Reactome apoptosis dataset into Protégé. From the instance window in the Protégé user interface, we can see that there are 66 biochemicalReaction instances in this dataset. After configuring the reasoning connection to the Racer server, we execute the task of computing the inferred types of all the individuals in the dataset, a function that can be invoked by clicking the “I>” button provided by the Protégé-OWL plugin. Here the task loops over the sub-datasets. Each

Discussion and conclusion

We have presented a novel approach for transforming a biochemical pathway representation into a simpler “molecular binding representation” using semantic web technologies. Our use case consisted of the BioPAX release of the Reactome biochemical apoptosis pathway (http://www.reactome.org/cgi-bin/biopaxexporter?DB=gk_current&ID=109581), which we successfully transformed into a molecular binding network (see Fig. 4). We believe that our approach has many advantages: first, the conversion of a

Acknowledgements

This work was supported in part by NIH grant K25 HG02378 and NSF grant DBI-0135442.

Dr. Kei-Hoi Cheung, PhD is an Assistant Professor at the Yale Center for Medical Informatics. His research interests include bioinformatics, database integration, and semantic web in life sciences.

References (34)

  • A. Cornish-Bowden et al. Systems biology may work when we learn to understand the parts in terms of the whole. Biochem. Soc. Trans. (2005)
  • L.G. Yengi. Systems biology in drug safety and metabolism: integration of microarray, real-time PCR and enzyme approaches. Pharmacogenomics (2005)
  • P. Li et al. Prostate cancer genomics. Curr. Urol. Rep. (2001)
  • V. Mootha et al. From the cover: identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics. PNAS (2003)
  • D. Ficenec et al. Computational knowledge integration in biopharmaceutical research. Brief. Bioinform. (2003)
  • F. Achard et al. XML, bioinformatics and data integration. Bioinformatics (2001)
  • M. Hucka et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics (2003)

Peishen Qi is a PhD candidate at the Department of Computer Science of Yale University. He is currently working in the field of ontology translation, semantic tuplespace coordination, and semantic integration of pathway data.

Dr. David Tuck, MD is an Assistant Professor at the Department of Pathology. His research expertise includes hematology/medical oncology, computational biology, bioinformatics, and cancer drug development/clinical trials.

Dr. Michael Krauthammer, MD, PhD is an Assistant Professor at the Department of Pathology. His research interests include exploration of text mining, computational biology and bioinformatics to analyze functional and clinical data linked to human diseases.
