Abstract
We present a new approach to integrating annotation data from public sources for the expression analysis of genes and proteins. Expression data is materialized in a data warehouse to support high performance for data-intensive analysis tasks. Annotation data, in contrast, is integrated virtually according to analysis needs. Our virtual integration uses the commercial product SRS (Sequence Retrieval System) of LION bioscience. To couple the data warehouse and SRS, we implemented a query mediator that exploits correspondences between molecular-biological objects explicitly captured from public data sources. This hybrid integration approach has been implemented for a large gene expression warehouse and supports functional analysis using annotation data from GeneOntology, LocusLink and Ensembl. The paper motivates the chosen approach, details the integration concept and implementation, and provides results of preliminary performance tests.
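The coupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory dictionaries stand in for the warehouse, the virtually integrated annotation source, and the captured correspondences, and every name and identifier here (`WAREHOUSE`, `ANNOTATION_SOURCE`, `CORRESPONDENCES`, `TERM:1`, `acc_1`, and so on) is a hypothetical placeholder. No SRS call is modeled.

```python
from typing import Dict, List, Set

# Materialized side: a toy in-memory stand-in for the expression warehouse
# (hypothetical gene identifiers and expression values).
WAREHOUSE: Dict[str, List[float]] = {
    "gene_a": [2.0, 4.0],
    "gene_b": [1.0, 1.0],
    "gene_c": [5.0, 7.0],
}

# Virtual side: a toy stand-in for an annotation source queried on demand
# (hypothetical accessions and annotation terms).
ANNOTATION_SOURCE: Dict[str, Set[str]] = {
    "acc_1": {"TERM:1"},
    "acc_2": {"TERM:2"},
    "acc_3": {"TERM:1", "TERM:2"},
}

# Correspondences between warehouse identifiers and source accessions
# (assumed here to be one-to-one for simplicity).
CORRESPONDENCES: Dict[str, str] = {
    "gene_a": "acc_1",
    "gene_b": "acc_2",
    "gene_c": "acc_3",
}


def query_annotation_source(term: str) -> Set[str]:
    """Return the accessions annotated with `term`, resolved at query time."""
    return {acc for acc, terms in ANNOTATION_SOURCE.items() if term in terms}


def mean_expression_for_term(term: str) -> Dict[str, float]:
    """Mediate between both sides: resolve the term in the annotation source,
    map the resulting accessions to warehouse genes, then read the warehouse."""
    accessions = query_annotation_source(term)
    genes = [g for g, acc in CORRESPONDENCES.items() if acc in accessions]
    return {g: sum(WAREHOUSE[g]) / len(WAREHOUSE[g]) for g in genes}


print(mean_expression_for_term("TERM:1"))
```

In this sketch only the expression values are stored locally; the annotation lookup happens per query, which mirrors the split between materialized and virtual integration at a very small scale.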
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kirsten, T., Do, HH., Körner, C., Rahm, E. (2005). Hybrid Integration of Molecular-Biological Annotation Data. In: Ludäscher, B., Raschid, L. (eds) Data Integration in the Life Sciences. DILS 2005. Lecture Notes in Computer Science, vol 3615. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11530084_17
DOI: https://doi.org/10.1007/11530084_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27967-9
Online ISBN: 978-3-540-31879-8
eBook Packages: Computer Science, Computer Science (R0)