Abstract
To execute complex biological queries, data integration systems often use several intermediate data sources, because the domain coverage of any individual source is limited. The quality of intermediate sources varies greatly with the curation method, the update frequency and the breadth of domain coverage, and this variation affects the quality of the results. Integration systems should therefore provide data provenance, i.e., information about the path used to obtain every record in the result. Furthermore, since the query capabilities of web-accessible sources are limited, integration systems need to support refinement queries of finer granularity issued over the integrated data. However, unlike the individual sources, integration systems have to handle the absence of data and conflicts in the integrated data caused by inconsistencies among the sources. This paper describes the solution proposed by BACIIS, the Biological and Chemical Information Integration System, for providing data provenance and for supporting refinement queries over integrated data. Semantic correspondence between records from different sources is defined based on the links connecting these data sources, including cross-references. Two characteristics of semantic correspondence are identified: degree, based on the closeness of the links that exist between data records, and cardinality, based on the mappings between the domains of data records. An algorithm based on semantic correspondence is presented to handle the absence of data and conflicts in the integrated data.
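The abstract does not give formal definitions, so the following is only an illustrative sketch of how provenance paths, degree and cardinality might be modelled. It assumes that records are nodes, that cross-references are directed links, that degree can be approximated by the number of links on the shortest path between two records, and that cardinality can be read off a set of record pairs. The source names (`A`, `B`, `C`), the record identifiers and the function names are hypothetical and are not taken from BACIIS.

```python
from collections import deque

# Hypothetical cross-reference links between records, keyed as "SOURCE:ID".
LINKS = {
    "A:g1": ["B:p1", "B:p2"],
    "B:p1": ["C:w1"],
    "B:p2": ["C:w1"],
    "C:w1": [],
}


def link_paths(start, goal, links):
    """Return every acyclic link path from start to goal.

    Each path is one candidate provenance trail for the goal record.
    """
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        for nxt in links.get(node, []):
            if nxt not in path:  # skip cycles
                queue.append(path + [nxt])
    return paths


def degree(start, goal, links):
    """Assumed measure: number of links on the shortest path, None if unlinked."""
    paths = link_paths(start, goal, links)
    return min(len(p) - 1 for p in paths) if paths else None


def cardinality(pairs):
    """Classify (left, right) record pairs as '1:1', '1:N', 'N:1' or 'N:M'."""
    left_to_right = {}
    right_to_left = {}
    for left, right in pairs:
        left_to_right.setdefault(left, set()).add(right)
        right_to_left.setdefault(right, set()).add(left)
    left_many = any(len(v) > 1 for v in left_to_right.values())
    right_many = any(len(v) > 1 for v in right_to_left.values())
    if left_many and right_many:
        return "N:M"
    if left_many:
        return "1:N"
    if right_many:
        return "N:1"
    return "1:1"
```

Under these assumptions, `link_paths("A:g1", "C:w1", LINKS)` yields two provenance trails, one through each record in source `B`, and `degree` reports the length of the shorter one. A record reached by a path with more links would be treated as a looser correspondence.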
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mahoui, M., Kulkarni, H., Li, N., Ben-Miled, Z., Börner, K. (2005). Semantic Correspondence in Federated Life Science Data Integration Systems. In: Ludäscher, B., Raschid, L. (eds) Data Integration in the Life Sciences. DILS 2005. Lecture Notes in Computer Science, vol 3615. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11530084_12
DOI: https://doi.org/10.1007/11530084_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27967-9
Online ISBN: 978-3-540-31879-8
eBook Packages: Computer Science, Computer Science (R0)