Abstract
The integration of life science web databases is an important research subject because it affects the rate at which new biological discoveries are made. However, making life science databases interoperable presents serious challenges, particularly when the databases are accessed through their web interfaces. These challenges include the large number of life science databases and the frequent changes to their access interfaces. This paper proposes techniques that address these challenges and shows how they were implemented in the context of BACIIS, a federation of life science web databases.
This work is supported in part by NSF CAREER grant DBI-0133946 and NSF grant DBI-0110854.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ben Miled, Z., Li, N., Liu, Y., He, Y., Lynch, E., Bukhres, O. (2004). On the Integration of a Large Number of Life Science Web Databases. In: Rahm, E. (eds) Data Integration in the Life Sciences. DILS 2004. Lecture Notes in Computer Science, vol 2994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24745-6_12
DOI: https://doi.org/10.1007/978-3-540-24745-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21300-0
Online ISBN: 978-3-540-24745-6
eBook Packages: Springer Book Archive