Efficient Computation of Provenance for Query Result Exploration

Mani, Murali; Singaraj, Naveenkumar; Liu, Zhenyan

doi:10.1007/978-3-030-80960-7_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12839)


Abstract

Users typically interact with a database by issuing queries and examining the results. We refer to the process of examining query results and asking follow-up questions as query result exploration. Our work builds on two decades of provenance research that is useful for query result exploration. Three approaches for computing provenance have been described in the literature: lazy, eager, and hybrid. We investigate lazy and eager approaches that utilize constraints we have identified in the context of query result exploration, as well as novel hybrid approaches. For the TPC-H benchmark, these constraints are applicable to 19 of the 22 queries and yield better performance for all queries that have a join. Furthermore, the performance benefits of our approaches are significant, sometimes several orders of magnitude.
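
To make the lazy/eager distinction in the abstract concrete, the following is a minimal sketch on a toy schema using Python's built-in sqlite3 module. It is not the authors' method and does not use the constraints the paper identifies. The tables `customer`, `orders`, and `prov`, and the helper functions `lazy_provenance` and `eager_provenance`, are hypothetical names introduced only for illustration. The sketch assumes the usual reading of the terms: a lazy approach computes the provenance of a result row on demand by querying the base tables, while an eager approach stores provenance information when the query is first run and looks it up later.

```python
import sqlite3

# Toy database (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (c_id INTEGER PRIMARY KEY, c_name TEXT);
    CREATE TABLE orders (o_id INTEGER PRIMARY KEY, c_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 70.0), (12, 2, 20.0);
""")

# The user's original query: total order value per customer.
result = conn.execute("""
    SELECT c.c_name, SUM(o.total)
    FROM customer c JOIN orders o ON c.c_id = o.c_id
    GROUP BY c.c_name
    ORDER BY c.c_name
""").fetchall()


def lazy_provenance(conn, c_name):
    """Lazy: nothing is stored up front; when the user asks about a
    result row, re-run the join over the base tables for that row."""
    return conn.execute("""
        SELECT o.o_id
        FROM customer c JOIN orders o ON c.c_id = o.c_id
        WHERE c.c_name = ?
        ORDER BY o.o_id
    """, (c_name,)).fetchall()


# Eager: materialize, once, which base rows feed each result row.
conn.execute("""
    CREATE TABLE prov AS
    SELECT c.c_name AS c_name, c.c_id AS c_id, o.o_id AS o_id
    FROM customer c JOIN orders o ON c.c_id = o.c_id
""")


def eager_provenance(conn, c_name):
    """Eager: answer the follow-up question from the stored table."""
    return conn.execute(
        "SELECT o_id FROM prov WHERE c_name = ? ORDER BY o_id",
        (c_name,),
    ).fetchall()


lazy_ann = lazy_provenance(conn, "Ann")
eager_ann = eager_provenance(conn, "Ann")
print(result)
print(lazy_ann, eager_ann)
```

In this toy setting both helpers return the same order identifiers for a given result row. The trade-off is where the cost is paid: the lazy version repeats the join for every follow-up question, while the eager version pays for extra storage and materialization when the original query is run. A hybrid approach, in general terms, would sit between the two by storing only part of this information.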

Partially supported by Office of Research, University of Michigan-Flint.


Notes

1. https://souffle-lang.github.io/


Author information

Authors and Affiliations

University of Michigan Flint, Flint, MI, 48502, USA
Murali Mani, Naveenkumar Singaraj & Zhenyan Liu

Corresponding author

Correspondence to Murali Mani.

Editor information

Editors and Affiliations

Illinois Institute of Technology, Chicago, IL, USA
Boris Glavic
Fluminense Federal University, Niterói, Brazil
Vanessa Braganholo
Northern Illinois University, DeKalb, IL, USA
David Koop

About this paper

Cite this paper

Mani, M., Singaraj, N., Liu, Z. (2021). Efficient Computation of Provenance for Query Result Exploration. In: Glavic, B., Braganholo, V., Koop, D. (eds) Provenance and Annotation of Data and Processes. IPAW 2020, IPAW 2021. Lecture Notes in Computer Science, vol 12839. Springer, Cham. https://doi.org/10.1007/978-3-030-80960-7_8

DOI: https://doi.org/10.1007/978-3-030-80960-7_8
Published: 09 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80959-1
Online ISBN: 978-3-030-80960-7
eBook Packages: Computer Science, Computer Science (R0)
