DOI: 10.1145/1645953.1646040
research-article

Provenance query evaluation: what's so special about it?

Published: 02 November 2009

ABSTRACT

While provenance has been extensively studied in the literature, the efficient evaluation of provenance queries remains an open problem. Traditional query optimization techniques, such as general-purpose indexes or the materialization of provenance data, fail on different fronts to address the problem. The need for provenance-aware access methods therefore becomes apparent. This paper starts by identifying some key requirements that are to a large extent specific to provenance queries and are necessary for their efficient evaluation. The first such property, called duality, requires that a single access method be used to evaluate both backward provenance queries (which input items of some analysis generate an output item) and forward provenance queries (which outputs of some analysis does an input item generate). The second property, called locality, requires that provenance query evaluation times depend mainly on the size of the provenance query results and be largely independent of the total size of the provenance data. Motivated by these requirements, we identify data structures with both properties, implement them, and illustrate their effectiveness on the evaluation of provenance queries through a detailed set of experiments.
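
To make the duality and locality properties concrete, here is a minimal Python sketch. It is not the data structure proposed in the paper, whose details the abstract does not give. It is a toy in-memory index over (input item, output item) pairs, and the class name `DualProvenanceIndex`, its method names, and the sample item identifiers are all assumptions made for illustration.

```python
from collections import defaultdict


class DualProvenanceIndex:
    """Toy index over (input_item, output_item) provenance pairs.

    Hypothetical illustration only; not the access method from the paper.
    """

    def __init__(self):
        # input item -> set of outputs it contributed to
        self._outputs_of = defaultdict(set)
        # output item -> set of inputs that generated it
        self._inputs_of = defaultdict(set)

    def add(self, input_item, output_item):
        """Record that input_item contributed to output_item."""
        self._outputs_of[input_item].add(output_item)
        self._inputs_of[output_item].add(input_item)

    def backward(self, output_item):
        """Backward query: which inputs generated this output?"""
        return set(self._inputs_of.get(output_item, ()))

    def forward(self, input_item):
        """Forward query: which outputs did this input generate?"""
        return set(self._outputs_of.get(input_item, ()))


index = DualProvenanceIndex()
for src, dst in [("in1", "out1"), ("in2", "out1"), ("in2", "out2")]:
    index.add(src, dst)

backward_result = index.backward("out1")  # inputs behind out1
forward_result = index.forward("in2")     # outputs derived from in2
```

In this sketch one object answers both query directions, which mirrors duality at the interface level, and each query costs an expected constant-time hash lookup plus a copy proportional to the result size, which mirrors locality. The price is that every pair is stored twice; whether and how the paper avoids that overhead is not stated in the abstract.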

        Published in

        CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
        November 2009
        2162 pages
        ISBN: 9781605585123
        DOI: 10.1145/1645953

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
