ProSA Pipeline: Provenance Conquers the Chase

  • Conference paper
New Trends in Database and Information Systems (ADBIS 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1652)

Abstract

One of the main problems in data minimization is determining the relevant data set. By combining the Chase, a universal tool for transforming databases, with data provenance, an (anonymized) minimal sub-database of an original data set can be calculated. To ensure reproducibility, the evaluations performed on the original data set must also be feasible on the sub-database. For this, we extend the Chase & Backchase with additional why-provenance to handle lost attribute values, null tuples, and duplicates that occur during query evaluation and its inversion. In this article, we present the ProSA pipeline, which describes the method of data minimization using the Chase & Backchase extended with additional provenance.
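
To make the idea of provenance-based minimization concrete, the following is a minimal Python sketch. It is not the ProSA implementation and not a Chase & Backchase engine. It assumes a made-up two-relation database and a hand-written join query, records why-provenance as sets of witness tuple IDs, and keeps one witness per result tuple as the sub-database. All relation names, tuple IDs, and function names are hypothetical, and anonymization as well as the handling of nulls and duplicates are out of scope.

```python
from itertools import product

# Toy source database (assumed data): relation name -> {tuple id: tuple}.
DB = {
    "student": {"s1": ("Ann", "CS"), "s2": ("Bob", "Math"), "s3": ("Cid", "CS")},
    "grade": {"g1": ("Ann", 1.0), "g2": ("Bob", 2.3), "g3": ("Ann", 1.3)},
}


def query(db):
    """Names of CS students with at least one grade, plus why-provenance.

    Returns {result tuple: set of witnesses}. Each witness is a frozenset of
    source tuple ids that together suffice to derive the result tuple.
    """
    result = {}
    for (sid, (name, major)), (gid, (gname, _)) in product(
        db["student"].items(), db["grade"].items()
    ):
        if major == "CS" and name == gname:
            result.setdefault((name,), set()).add(frozenset({sid, gid}))
    return result


def sub_database_from_witnesses(db, provenance):
    """Keep one witness per result tuple and drop all other source tuples."""
    keep = set()
    for witnesses in provenance.values():
        # Deterministic choice: the witness whose sorted ids come first.
        keep |= min(witnesses, key=sorted)
    return {
        rel: {tid: row for tid, row in rows.items() if tid in keep}
        for rel, rows in db.items()
    }


prov = query(DB)
sub_db = sub_database_from_witnesses(DB, prov)

# Reproducibility check: the query returns the same result tuples on sub_db.
same_result = set(query(sub_db)) == set(prov)
```

On this toy data, re-running the query on `sub_db` yields the same result tuples as on `DB`, which is the reproducibility property described above. Picking one witness per result tuple gives a small sub-database, though not necessarily a globally minimal one.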

Notes

  1. ChaTEAU repository: https://git.informatik.uni-rostock.de/ta093/chateau-demo.

  2. ProSA repository: https://git.informatik.uni-rostock.de/ta093/prosa-demo.

Acknowledgments

We thank all students involved in implementing ProSA: Leonie Förster, Melinda Heuser, Ivo Kavisanczki, Judith-Henrike Overath, Tobias Rudolph, Nic Scharlau, Tom Siegl, Dennis Spolwind, Anne-Sophie Waterstradt, Anja Wolpers, and Marian Zuska. Also, thanks to Bertram Ludäscher for comments and suggestions.

Author information

Corresponding author

Correspondence to Tanja Auge.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Auge, T., Hanzig, M., Heuer, A. (2022). ProSA Pipeline: Provenance Conquers the Chase. In: Chiusano, S., et al. New Trends in Database and Information Systems. ADBIS 2022. Communications in Computer and Information Science, vol 1652. Springer, Cham. https://doi.org/10.1007/978-3-031-15743-1_9

  • DOI: https://doi.org/10.1007/978-3-031-15743-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15742-4

  • Online ISBN: 978-3-031-15743-1

  • eBook Packages: Computer Science (R0)
