Abstract
We are witnessing the emergence of the “data rich” era in biology. The myriad data in biology, ranging from sequence strings to complex phenotypic and disease-relevant data, pose a huge challenge to modern biology. The standard paradigm in biology, which proceeds from “hypothesis to experimentation (low throughput data) to models”, is gradually being replaced by one that proceeds from “data to hypothesis to models and experimentation to more data and models”. Moreover, unlike data in the physical sciences, data in the biological sciences are almost guaranteed to be highly heterogeneous and incomplete. To make significant advances in this data rich era, it is essential to have three things. The first is robust data repositories that allow interoperable navigation, query and analysis across diverse data. The second is a plug-and-play tools environment that facilitates seamless interplay of tools and data. The third is versatile user interfaces that allow biologists to visualize and present the results of analysis in the most intuitive and user-friendly manner. This talk will address several of the challenges posed by the enormous need for scientific data integration in biology, with specific exemplars and strategies. The issues addressed will include:
– Architecture of Data and Knowledge Repositories
– Databases: Flat, Relational and Object-Oriented; which is most appropriate?
– The imminent need for Ontologies in biology
– The Middle Layer: How to design it?
– Applications and integration of applications into the middle layer
– Reduction and Analysis of Data: the largest challenge!
– How to integrate legacy knowledge with data?
– User Interfaces: web browser and beyond
The complex and diverse nature of biology means that there is no “one solution fits all” model for the above issues. While there is a need for similar solutions across multiple disciplines within biology, the tension between this and the need to deal with context, which in some cases is everything, poses severe design challenges. For example, can a system that describes cellular signaling also describe developmental genetics? Can ontologies that span different areas (e.g. anatomy, gene and protein data, cellular biology) be compatible and connected? Can the detailed biological knowledge painstakingly accrued over decades be easily integrated with high throughput data? These are only a few of the questions that arise in designing and building modern data and knowledge systems in biology.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Subramaniam, S. (2005). Challenges in Biological Data Integration in the Post-genome Sequence Era. In: Ludäscher, B., Raschid, L. (eds) Data Integration in the Life Sciences. DILS 2005. Lecture Notes in Computer Science(), vol 3615. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11530084_1
DOI: https://doi.org/10.1007/11530084_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27967-9
Online ISBN: 978-3-540-31879-8
eBook Packages: Computer Science, Computer Science (R0)