Abstract
Data-intensive applications in the Life Sciences make extensive use of the hidden web as a platform for information sharing. Access to these heterogeneous hidden web resources is limited to predefined web forms and interactive interfaces that users must navigate manually. Users must also take responsibility for reconciling schema heterogeneity, extracting information and piping it to other resources, transforming formats, and so on, in order to implement the query sequences or scientific workflows they need. In this paper, we present a new data management system, called LifeDB, which supports data currency without view materialization, together with autonomous reconciliation of schema heterogeneity, on a single platform through a declarative query language called BioFlow. In our approach, schema heterogeneity is resolved at run time by treating the hidden web resources as a virtual warehouse, and by supporting a set of primitives for on-the-fly data integration, for extracting information and piping it to other resources, and for manipulating data in a way similar to traditional database systems in response to application demands.
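The abstract describes schema heterogeneity being resolved at run time while results from one hidden-web resource are piped into another. The Python sketch below illustrates that idea under stated assumptions only: it is not BioFlow syntax and not LifeDB's implementation. The two "forms" (`gene_form`, `protein_form`), their field names, and the placeholder accession values are hypothetical, and a naive name-similarity matcher built on `difflib.SequenceMatcher` stands in for a real schema matcher.

```python
from difflib import SequenceMatcher


# Hypothetical stand-ins for two hidden-web form interfaces. A real system would
# submit an HTML form and wrap the result page; here each "form" takes a dict of
# form fields and returns a list of row dicts. Accession values are placeholders.
def gene_form(fields):
    rows = {"human": [
        {"GeneSymbol": "TP53", "Description": "tumor protein p53"},
        {"GeneSymbol": "TP63", "Description": "tumor protein p63"},
    ]}
    return rows.get(fields.get("organism"), [])


def protein_form(fields):
    rows = {
        "TP53": [{"gene_sym": "TP53", "accession": "ACC-0001"}],
        "TP63": [{"gene_sym": "TP63", "accession": "ACC-0002"}],
    }
    return rows.get(fields.get("gene_sym"), [])


# Assumed input fields of the hypothetical downstream form.
PROTEIN_FORM_FIELDS = ["gene_sym"]


def normalize(name):
    # Compare names case-insensitively and ignore separators such as "_" or " ".
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_columns(source_cols, target_cols, threshold=0.6):
    """Map each source column to its most similar target column by name.

    A deliberately naive matcher (string similarity only), standing in for
    the schema matching step; it is not LifeDB's matcher.
    """
    mapping = {}
    for s in source_cols:
        scored = [(SequenceMatcher(None, normalize(s), normalize(t)).ratio(), t)
                  for t in target_cols]
        score, best = max(scored, default=(0.0, None))
        if best is not None and score >= threshold:
            mapping[s] = best
    return mapping


def pipe_and_join(upstream_rows, downstream_form, downstream_fields):
    """Feed matched upstream values into the downstream form and join the results."""
    if not upstream_rows:
        return []
    # The mapping is computed at query time from the rows actually returned.
    mapping = match_columns(list(upstream_rows[0]), downstream_fields)
    joined = []
    for row in upstream_rows:
        # Fill the downstream form using the columns that matched its input fields.
        fields = {target: row[source] for source, target in mapping.items()}
        for hit in downstream_form(fields):
            joined.append({**row, **hit})
    return joined


genes = gene_form({"organism": "human"})
result = pipe_and_join(genes, protein_form, PROTEIN_FORM_FIELDS)
for r in result:
    print(r["GeneSymbol"], r["accession"])
```

Because the match is computed when the query runs, the upstream column `GeneSymbol` is paired with the downstream input field `gene_sym` without a predefined mapping, while `Description` falls below the threshold and is simply carried along in the joined rows. A practical matcher would likely need more than name similarity, for example synonym handling.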
Research supported in part by National Science Foundation grants CNS 0521454 and IIS 0612203, and National Institutes of Health NIDA grant 1R03DA026021-01.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bhattacharjee, A. et al. (2009). On-the-Fly Integration and Ad Hoc Querying of Life Sciences Databases Using LifeDB. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2009. Lecture Notes in Computer Science, vol 5690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03573-9_47
DOI: https://doi.org/10.1007/978-3-642-03573-9_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03572-2
Online ISBN: 978-3-642-03573-9
eBook Packages: Computer Science, Computer Science (R0)