Lineage Tracing in Mediator-Based Information Integration Systems

  • Conference paper
Advanced Distributed Systems (ISSADS 2005)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 3563)

Abstract

The problem of identifying the data that contributed to a query answer is referred to as lineage tracing. While this problem has been studied extensively in data warehouse systems, it has so far only been identified as a research topic in the mediator-based approach to information integration [10]. A main problem in this context is that a mediator does not store data, and hence it has to communicate with the data sources for both query processing and tracing. While this communication can be expensive, the real issue is that in some situations, after a query has been processed, lineage tracing may become more difficult, e.g., when the schema of a source has changed, or even impossible, e.g., when a source becomes unavailable. In this paper, we study the lineage tracing problem in mediator-based systems and propose a solution that collects "enough" data and metadata during query processing so that tracing remains possible in such situations. We have developed a system prototype, called ELIT (for Exploration and LIneage Tracing). To allow more flexibility, ELIT supports lineage tracing in two modes: batch and interactive. Due to the distributed nature of the context, efficiency is of primary concern for practical reasons. We therefore investigate ways to reduce the overhead that lineage tracing adds to query processing in the proposed framework. Using some basic query optimization techniques in ELIT, our preliminary experimental results show a considerable increase in efficiency. This indicates that the ideas proposed in the framework of ELIT could lend themselves to powerful lineage tracing and data analysis tools once more sophisticated query optimization techniques are incorporated.
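The central idea of the abstract, recording enough data and metadata while the query is processed so that tracing no longer depends on the sources, can be illustrated with a small sketch. This is not ELIT's implementation: the names `LineageLog`, `answer_join_query`, and `trace_lineage`, the two toy sources, and the restriction to a single equi-join are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LineageLog:
    """Hypothetical store for what the mediator keeps after a query."""
    # Metadata: source name -> attribute names as seen at query time.
    schemas: dict = field(default_factory=dict)
    # Data: answer tuple -> list of (source name, source tuple) contributors.
    contributors: dict = field(default_factory=dict)


def answer_join_query(sources, schemas, left, right, key_left, key_right):
    """Join two sources on one attribute each, logging lineage as we go."""
    log = LineageLog(schemas={left: schemas[left], right: schemas[right]})
    li = schemas[left].index(key_left)
    ri = schemas[right].index(key_right)
    answers = []
    for lt in sources[left]:
        for rt in sources[right]:
            if lt[li] == rt[ri]:
                ans = lt + rt
                if ans not in log.contributors:
                    answers.append(ans)
                # Copy the contributing source tuples into the log.
                log.contributors.setdefault(ans, []).extend(
                    [(left, lt), (right, rt)]
                )
    return answers, log


def trace_lineage(log, answer):
    """Trace an answer using only the log, without contacting any source."""
    return log.contributors.get(answer, [])


# Toy sources (assumed data, not from the paper).
sources = {
    "emp": [("ann", 10), ("bob", 20)],
    "dept": [(10, "sales"), (30, "ops")],
}
schemas = {"emp": ("name", "dept_id"), "dept": ("dept_id", "dept_name")}

answers, log = answer_join_query(
    sources, schemas, "emp", "dept", "dept_id", "dept_id"
)

# Simulate every source becoming unavailable after the query.
sources.clear()

lineage = trace_lineage(log, answers[0])
```

In this sketch the log stores full copies of the contributing tuples, which makes tracing independent of the sources but adds overhead during query processing; that overhead is the cost the abstract says the authors try to reduce.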

This work was supported in part by grants from Natural Sciences and Engineering Research Council (NSERC) of Canada, and by Concordia University, ENCS.

References

  1. Arens, Y., Knoblock, C.A., Shen, W.: Query reformulation for dynamic information integration. Journal of Intelligent Information Systems (1996)

  2. Arora, T., Ramakrishnan, R., Roth, W.G., Seshadri, P., Srivastava, D.: The CORAL deductive database system. In: Proc. 3rd Int’l Conference on Deductive and Object-Oriented Databases (1993)

  3. Buneman, P., Davidson, S., Hillebrand, G., Suciu, D.: A query language and optimization techniques for unstructured data. In: Proc. of ACM SIGMOD, Montreal, Canada, June 1996, pp. 505–516 (1996)

  4. Cui, Y., Widom, J.: Lineage tracing for general data warehouse transformations. In: Proc. of VLDB (2001)

  5. Duschka, O.M.: Query planning and optimization in information integration (December 1997)

  6. Doan, A., Domingos, P., Levy, A.Y.: Learning source descriptions for data integration. In: Proc. Intl. Workshop on The Web and Databases, WebDB (2000)

  7. Draper, D., Halevy, A.Y., Weld, D.S.: The Nimble XML data integration system. In: Proc. Int’l Conference on Data Engineering (ICDE), pp. 155–160 (2001)

  8. Fan, H., Poulovassilis, A.: Tracing data lineage using schema transformation pathways. In: Proc. of Workshop on Knowledge Transformation for the Semantic Web (2002)

  9. Garcia-Molina, H., Ullman, J.D., Widom, J.: Database systems: The complete book. Prentice-Hall, Englewood Cliffs (2001)

  10. Halevy, A., Li, C.: Information integration research: Summary of NSF IDM workshop breakout session (December 2003)

  11. Haas, L.M., Miller, R.J., Niswonger, B., Roth, M.T., Schwarz, P.M., Wimmers, E.L.: Transforming heterogeneous data with database middleware: Beyond integration. IEEE Data Eng. Bulletin (1999)

  12. Kementsietsidis, A., Arenas, M., Miller, R.J.: Mapping data in peer-to-peer systems: Semantics and algorithmic issues. In: SIGMOD Conference, pp. 325–336 (2003)

  13. Katchaounov, T.R.: Interface capabilities for query processing in peer mediator system. Uppsala University (September 2003)

  14. Widom, J., Quass, D.: On-line warehouse view maintenance. In: Proc. Int’l Conference on Management of Data, ACM SIGMOD (1997)

  15. Popa, L., Hernandez, M.A., Velegrakis, Y., Miller, R.J., Naumann, F., Ho, H.: Mapping XML and relational schemas with CLIO, Demo. ICDE (2002)

  16. Papakonstantinou, Y., Vassalos, V.: Query rewriting for semi-structured data. In: Proc. Int’l. Conf. on Management of Data, ACM SIGMOD, pp. 455–466 (1999)

  17. Taghizadeh-Azari, A.: Supporting Lineage Tracing in Mediator-based Information Integration Systems. Master's thesis, Computer Science and Software Engineering, Concordia University, Montreal, Canada (February 2005)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shiri, N., Taghizadeh-Azari, A. (2005). Lineage Tracing in Mediator-Based Information Integration Systems. In: Ramos, F.F., Larios Rosillo, V., Unger, H. (eds) Advanced Distributed Systems. ISSADS 2005. Lecture Notes in Computer Science, vol 3563. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11533962_24

  • DOI: https://doi.org/10.1007/11533962_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28063-7

  • Online ISBN: 978-3-540-31674-9

  • eBook Packages: Computer Science, Computer Science (R0)
