DOI: 10.1145/1938551.1938554
research-article

On provenance and privacy

Published: 21 March 2011

Abstract

Provenance in scientific workflows is a double-edged sword. On the one hand, recording information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions, enables transparency and reproducibility of results. On the other hand, a scientific workflow often contains private or confidential data and uses proprietary modules. Hence, providing exact answers to provenance queries over all executions of the workflow may reveal private information. In this paper we discuss privacy concerns in scientific workflows (data, module, and structural privacy) and frame several natural questions: (i) Can we formally analyze data, module, and structural privacy, giving provable privacy guarantees for an unlimited or bounded number of provenance queries? (ii) How can we answer search and structural queries over repositories of workflow specifications and their executions, providing as much information as possible to the user while still guaranteeing privacy? We then highlight some recent work in this area and point to several directions for future work.

Published In

ICDT '11: Proceedings of the 14th International Conference on Database Theory
March 2011
285 pages
ISBN: 9781450305297
DOI: 10.1145/1938551
Program Chair: Tova Milo
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. privacy
  2. provenance
  3. scientific workflows

Qualifiers

  • Research-article

Conference

EDBT/ICDT '11: EDBT/ICDT '11 joint conference
March 21 - 24, 2011
Uppsala, Sweden

Article Metrics

  • Downloads (last 12 months): 18
  • Downloads (last 6 weeks): 4
Reflects downloads up to 05 Mar 2025

Cited By

  • (2025) Differentially private explanations for aggregate query answers. The VLDB Journal (The International Journal on Very Large Data Bases), 34(2). DOI: 10.1007/s00778-024-00895-4. Online publication date: 1-Mar-2025
  • (2024) A Variation-Based Genetic Algorithm for Privacy-Preserving Data Publishing. 2024 11th International Conference on Machine Intelligence Theory and Applications (MiTA), pages 1-8. DOI: 10.1109/MiTA60795.2024.10751708. Online publication date: 14-Jul-2024
  • (2023) Data Is the New Oil–Sort of: A View on Why This Comparison Is Misleading and Its Implications for Modern Data Administration. Future Internet, 15(2):71. DOI: 10.3390/fi15020071. Online publication date: 12-Feb-2023
  • (2023) Data Provenance in Security and Privacy. ACM Computing Surveys, 55(14s):1-35. DOI: 10.1145/3593294. Online publication date: 22-Apr-2023
  • (2022) DPXPlain. Proceedings of the VLDB Endowment, 16(1):113-126. DOI: 10.14778/3561261.3561271. Online publication date: 1-Sep-2022
  • (2022) A Security Framework for Scientific Workflow Provenance Access Control Policies. IEEE Transactions on Services Computing, 15(1):97-109. DOI: 10.1109/TSC.2019.2921586. Online publication date: 1-Jan-2022
  • (2021) On Optimizing the Trade-off between Privacy and Utility in Data Provenance. Proceedings of the 2021 International Conference on Management of Data, pages 379-391. DOI: 10.1145/3448016.3452835. Online publication date: 9-Jun-2021
  • (2021) Fifty Shades of Grey. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 64-76. DOI: 10.1145/3442188.3445871. Online publication date: 3-Mar-2021
  • (2021) Verifiable Badging System for Scientific Data Reproducibility. Blockchain: Research and Applications, article 100015. DOI: 10.1016/j.bcra.2021.100015. Online publication date: Jun-2021
  • (2021) Privacy Aspects of Provenance Queries. Provenance and Annotation of Data and Processes, pages 218-221. DOI: 10.1007/978-3-030-80960-7_15. Online publication date: 9-Jul-2021
