Abstract
The problem of identifying the data that contributed to a query answer is referred to as lineage tracing. While this problem has been studied extensively in data warehouse systems, it has been identified as a research topic in the mediator-based approach to information integration [10]. A main problem in this context is that a mediator does not store data, and hence it has to communicate with the data sources for both query processing and tracing. While this communication can be expensive, the real issue is that in some situations, after a query has been processed, lineage tracing may become more difficult, e.g., when the schema of a source has changed, or even impossible, e.g., when a source becomes unavailable. In this paper, we study the lineage tracing problem in mediator-based systems and propose a solution that collects “enough” data and metadata during query processing so that tracing remains possible in such situations. We have developed a system prototype, called ELIT (for Exploration and LIneage Tracing). To allow more flexibility, ELIT supports lineage tracing in two modes: batch and interactive. Due to the distributed nature of the context, efficiency is of primary concern for practical reasons. We therefore investigate ways to reduce the overhead that lineage tracing adds to query processing in the proposed framework. Using some basic query optimization techniques in ELIT, our preliminary experimental results show a considerable increase in efficiency. This indicates that the ideas proposed in the framework of ELIT could lend themselves to powerful lineage tracing and data analysis tools by incorporating more sophisticated query optimization techniques.
This work was supported in part by grants from the Natural Sciences and Engineering Research Council (NSERC) of Canada and by Concordia University, ENCS.
References
Arens, Y., Knoblock, C.A., Shen, W.: Query reformulation for dynamic information integration. Journal of Intelligent Information Systems (1996)
Arora, T., Ramakrishnan, R., Roth, W.G., Seshadri, P., Srivastava, D.: The CORAL deductive database system. In: Proc. 3rd Int’l Conference on Deductive and Object-Oriented Databases (1993)
Buneman, P., Davidson, S., Hillebrand, G., Suciu, D.: A query language and optimization techniques for unstructured data. In: Proc. of ACM SIGMOD, Montreal, Canada, June 1996, pp. 505–516 (1996)
Cui, Y., Widom, J.: Lineage tracing for general data warehouse transformations. In: Proc. of VLDB (2001)
Duschka, O.M.: Query planning and optimization in information integration (December 1997)
Doan, P.D., Levy, A.Y.: Learning source descriptions for data integration. In: Proc. Intl. Workshop on The Web and Databases, WebDB (2000)
Draper, D., Halevy, A.Y., Weld, D.S.: The Nimble XML data integration system. In: Proc. Int’l Conference on Data Engineering (ICDE), pp. 155–160 (2001)
Fan, H., Poulovassilis, A.: Tracing data lineage using schema transformation pathways. In: Proc. of Workshop on Knowledge Transformation for the Semantic Web (2002)
Garcia-Molina, H., Ullman, J.D., Widom, J.: Database systems: The complete book. Prentice-Hall, Englewood Cliffs (2001)
Halevy, A., Li, C.: Information integration research: Summary of NSF IDM workshop breakout session (December 2003)
Haas, L.M., Miller, R.J., Niswonger, B., Roth, M.T., Schwarz, P.M., Wimmers, E.L.: Transforming heterogeneous data with database middleware: Beyond integration. IEEE Data Eng. Bulletin (1999)
Kementsietsidis, A., Arenas, M., Miller, R.J.: Mapping data in peer-to-peer systems: Semantics and algorithmic issues. In: SIGMOD Conference, pp. 325–336 (2003)
Katchaounov, T.R.: Interface capabilities for query processing in peer mediator system. Uppsala University (September 2003)
Widom, J., Quass, D.: On-line warehouse view maintenance. In: Proc. Int’l Conference on Management of Data, ACM SIGMOD (1997)
Popa, L., Hernandez, M.A., Velegrakis, Y., Miller, R.J., Naumann, F., Ho, H.: Mapping XML and relational schemas with CLIO, Demo. ICDE (2002)
Papakonstantinou, Y., Vassalos, V.: Query rewriting for semi-structured data. In: Proc. Int’l. Conf. on Management of Data, ACM SIGMOD, pp. 455–466 (1999)
Taghizadeh-Azari, A.: Supporting Lineage Tracing in Mediator-based Information Integration Systems. Master thesis, Computer Science Software Engineering, Concordia University, Montreal, Canada (February 2005)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shiri, N., Taghizadeh-Azari, A. (2005). Lineage Tracing in Mediator-Based Information Integration Systems. In: Ramos, F.F., Larios Rosillo, V., Unger, H. (eds) Advanced Distributed Systems. ISSADS 2005. Lecture Notes in Computer Science, vol 3563. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11533962_24
DOI: https://doi.org/10.1007/11533962_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28063-7
Online ISBN: 978-3-540-31674-9
eBook Packages: Computer Science (R0)