research-article

Provenance query evaluation: what's so special about it?

Authors:
Anastasios Kementsietsidis, IBM T.J. Watson Research Center, Hawthorne, NY, USA
Min Wang, IBM T.J. Watson Research Center, Hawthorne, NY, USA

CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
November 2009
Pages 681–690
https://doi.org/10.1145/1645953.1646040

Published: 02 November 2009

ABSTRACT

While provenance has been extensively studied in the literature, the efficient evaluation of provenance queries remains an open problem. Traditional query optimization techniques, like the use of general-purpose indexes, or the materialization of provenance data, fail on different fronts to address the problem. Therefore, the need to develop provenance-aware access methods becomes apparent. This paper starts by identifying some key requirements that are to a large extent specific to provenance queries and are necessary for their efficient evaluation. The first such property, called duality, requires that a single access method is used to evaluate both backward provenance queries (which input items of some analysis generate an output item) and forward provenance queries (which outputs of some analysis does an input item generate). The second property, called locality, guarantees that provenance query evaluation times should depend mainly on the size of the provenance query results and should be largely independent of the total size of provenance data. Motivated by the above, we identify proper data structures with the aforementioned properties, we implement them, and through a detailed set of experiments, we illustrate their effectiveness on the evaluation of provenance queries.
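The two properties named in the abstract can be illustrated with a toy example. The sketch below is not the paper's data structure, which the abstract does not describe; it is a minimal hash-based index, with the hypothetical names `ProvenanceIndex`, `add`, `backward`, and `forward`, assuming provenance is recorded as (input item, output item) pairs. One structure answers both query directions (duality), and each lookup touches only the entries for the queried item, so its cost tracks the result size rather than the total number of stored pairs (locality).

```python
from collections import defaultdict


class ProvenanceIndex:
    """Toy bidirectional index over (input, output) provenance pairs.

    Hypothetical illustration only; not the access method from the paper.
    """

    def __init__(self):
        # output item -> set of input items that generated it
        self._inputs_of = defaultdict(set)
        # input item -> set of output items it contributed to
        self._outputs_of = defaultdict(set)

    def add(self, input_item, output_item):
        """Record that input_item contributed to output_item."""
        self._inputs_of[output_item].add(input_item)
        self._outputs_of[input_item].add(output_item)

    def backward(self, output_item):
        """Backward query: which inputs generated this output?"""
        # .get avoids creating an empty entry for unknown items
        return set(self._inputs_of.get(output_item, ()))

    def forward(self, input_item):
        """Forward query: which outputs did this input generate?"""
        return set(self._outputs_of.get(input_item, ()))


index = ProvenanceIndex()
for inp, out in [("i1", "o1"), ("i2", "o1"), ("i2", "o2")]:
    index.add(inp, out)

print(sorted(index.backward("o1")))  # ['i1', 'i2']
print(sorted(index.forward("i2")))   # ['o1', 'o2']
```

This sketch keeps two in-memory maps and so doubles the storage for each pair; it is meant only to make the query semantics concrete, not to suggest how a space-efficient or disk-resident structure would be built.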

Index Terms

Provenance query evaluation: what's so special about it?
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Published in
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
November 2009
2162 pages
ISBN: 9781605585123
DOI: 10.1145/1645953
General Chairs: David Cheung (University of Hong Kong, Hong Kong), Il-Yeol Song (Drexel University, USA)
Program Chairs: Wesley Chu (UCLA, USA), Xiaohua Hu (Drexel University, USA), Jimmy Lin (University of Maryland, USA)
Copyright © 2009 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
evaluation
provenance
query
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%