ABSTRACT
Provenance in scientific workflows is a double-edged sword. On the one hand, recording information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions, enables transparency and reproducibility of results. On the other hand, a scientific workflow often contains private or confidential data and uses proprietary modules. Hence, providing exact answers to provenance queries over all executions of the workflow may reveal private information. In this paper we discuss three privacy concerns in scientific workflows (data privacy, module privacy, and structural privacy) and frame several natural questions: (i) Can we formally analyze data, module, and structural privacy, giving provable privacy guarantees for an unlimited or bounded number of provenance queries? (ii) How can we answer search and structural queries over repositories of workflow specifications and their executions, providing as much information as possible to the user while still guaranteeing privacy? We then highlight some recent work in this area and point to several directions for future work.