Abstract
One of the main problems in data minimization is determining the relevant data set. By combining the Chase, a universal tool for transforming databases, with data provenance, an (anonymized) minimal sub-database of an original data set can be calculated. To ensure reproducibility, the evaluations performed on the original data set must also be feasible on the sub-database. For this, we extend the Chase & Backchase with additional why-provenance to handle lost attribute values, null tuples, and duplicates that occur during query evaluation and its inversion. In this article, we present the ProSA pipeline, which describes the method of data minimization using the Chase & Backchase extended with additional provenance.
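The idea of deriving a minimal sub-database from why-provenance can be illustrated with a small sketch. It assumes a single relation, a selection-projection query evaluated in plain Python instead of the Chase, and hypothetical tuple ids as provenance annotations. It is not the ProSA or ChaTEAU implementation, and the helper names `select_project_with_why` and `minimal_subdatabase` as well as the example data are hypothetical.

```python
from typing import Callable

# A relation is modelled as a mapping from (hypothetical) tuple ids to rows.
Row = dict[str, object]
Relation = dict[int, Row]
# Why-provenance: for each result tuple, the set of witnesses, where a witness
# is a set of source tuple ids that together produce the result tuple.
WhyProvenance = dict[tuple, set[frozenset[int]]]


def select_project_with_why(relation: Relation,
                            predicate: Callable[[Row], bool],
                            attrs: tuple[str, ...]) -> WhyProvenance:
    """Evaluate pi_attrs(sigma_predicate(relation)) and record why-provenance.

    With a single relation every witness is a singleton. Source tuples that
    collapse into the same result tuple after projection (duplicates) each
    contribute their own witness, so none of them is lost.
    """
    why: WhyProvenance = {}
    for tid, row in relation.items():
        if predicate(row):
            result_tuple = tuple(row[a] for a in attrs)
            why.setdefault(result_tuple, set()).add(frozenset({tid}))
    return why


def minimal_subdatabase(relation: Relation, why: WhyProvenance) -> Relation:
    """Keep only the source tuples that occur in at least one witness."""
    ids: set[int] = set().union(*(w for ws in why.values() for w in ws))
    return {tid: relation[tid] for tid in sorted(ids)}


# Hypothetical example data: oxygen measurements per station and year.
measurements: Relation = {
    1: {"station": "A", "year": 2020, "oxygen": 5.1},
    2: {"station": "A", "year": 2021, "oxygen": 4.8},
    3: {"station": "B", "year": 2020, "oxygen": 6.0},
    4: {"station": "C", "year": 2021, "oxygen": 3.9},
}


def low_oxygen(row: Row) -> bool:
    return row["oxygen"] < 5.5


why = select_project_with_why(measurements, low_oxygen, ("station",))
sub = minimal_subdatabase(measurements, why)

# Reproducibility check: the query yields the same result (and the same
# witnesses) on the sub-database as on the original relation.
assert select_project_with_why(sub, low_oxygen, ("station",)) == why
print(sorted(sub))  # [1, 2, 4]
```

Keeping every witness rather than a single one per result tuple is a deliberate choice in this sketch: tuples 1 and 2 both project to station "A", and retaining both preserves the duplicate count, which matters if a later evaluation aggregates over the result (for instance, counting measurements per station).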
Notes
1. ChaTEAU repository: https://git.informatik.uni-rostock.de/ta093/chateau-demo.
2. ProSA repository: https://git.informatik.uni-rostock.de/ta093/prosa-demo.
Acknowledgments
We thank all students involved in implementing ProSA: Leonie Förster, Melinda Heuser, Ivo Kavisanczki, Judith-Henrike Overath, Tobias Rudolph, Nic Scharlau, Tom Siegl, Dennis Spolwind, Anne-Sophie Waterstradt, Anja Wolpers, and Marian Zuska. Also, thanks to Bertram Ludäscher for comments and suggestions.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Auge, T., Hanzig, M., Heuer, A. (2022). ProSA Pipeline: Provenance Conquers the Chase. In: Chiusano, S., et al. New Trends in Database and Information Systems. ADBIS 2022. Communications in Computer and Information Science, vol 1652. Springer, Cham. https://doi.org/10.1007/978-3-031-15743-1_9
DOI: https://doi.org/10.1007/978-3-031-15743-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15742-4
Online ISBN: 978-3-031-15743-1
eBook Packages: Computer Science, Computer Science (R0)