Challenges in Biological Data Integration in the Post-genome Sequence Era

Subramaniam, Shankar

doi:10.1007/11530084_1

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3615)

Included in the following conference series:

International Workshop on Data Integration in the Life Sciences

Abstract

We are witnessing the emergence of the “data rich” era in biology. The myriad data in biology, ranging from sequence strings to complex phenotypic and disease-relevant data, pose a huge challenge to modern biology. The standard paradigm in biology that deals with “hypothesis to experimentation (low throughput data) to models” is being gradually replaced by “data to hypothesis to models and experimentation to more data and models”. And unlike data in the physical sciences, data in the biological sciences are almost guaranteed to be highly heterogeneous and incomplete. To make significant advances in this data-rich era, three things are essential: robust data repositories that allow interoperable navigation, query and analysis across diverse data; a plug-and-play tools environment that facilitates seamless interplay of tools and data; and versatile user interfaces that allow biologists to visualize and present the results of analysis in the most intuitive and user-friendly manner. This talk will address several of the challenges posed by the enormous need for scientific data integration in biology, with specific exemplars and strategies. The issues addressed will include:

– Architecture of Data and Knowledge Repositories

– Databases: Flat, Relational and Object-Oriented; what is most appropriate?

– The imminent need for Ontologies in biology

– The Middle Layer: How to design it?

– Applications and integration of applications into the middle layer

– Reduction and Analysis of Data: the largest challenge!

– How to integrate legacy knowledge with data?

– User Interfaces: web browser and beyond

The complex and diverse nature of biology mandates that there is no “one solution fits all” model for the above issues. While there is a need to have similar solutions across multiple disciplines within biology, the dichotomy of having to deal with the context, which is everything in some cases, poses severe design challenges. For example, can a system that describes cellular signaling also describe developmental genetics? Can the ontologies that span different areas (e.g. anatomy, gene and protein data, cellular biology) be compatible and connective? Can the detailed biological knowledge accrued painstakingly over decades be easily integrated with high throughput data? These are only a few of the questions that arise in designing and building modern data and knowledge systems in biology.

Author information

Authors and Affiliations

University of California, San Diego
Shankar Subramaniam

Editor information

Editors and Affiliations

Department of Computer Science, University of California, Davis
Bertram Ludäscher
University of Maryland, College Park, 20742, MD, USA
Louiqa Raschid

Cite this paper

Subramaniam, S. (2005). Challenges in Biological Data Integration in the Post-genome Sequence Era. In: Ludäscher, B., Raschid, L. (eds) Data Integration in the Life Sciences. DILS 2005. Lecture Notes in Computer Science, vol 3615. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11530084_1

DOI: https://doi.org/10.1007/11530084_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27967-9
Online ISBN: 978-3-540-31879-8
eBook Packages: Computer Science, Computer Science (R0)
