DOI: 10.1145/1938551.1938554
research-article

On provenance and privacy

Published: 21 March 2011

ABSTRACT

Provenance in scientific workflows is a double-edged sword. On the one hand, recording information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions, enables transparency and reproducibility of results. On the other hand, a scientific workflow often contains private or confidential data and uses proprietary modules. Hence, providing exact answers to provenance queries over all executions of the workflow may reveal private information. In this paper we discuss privacy concerns in scientific workflows -- data, module, and structural privacy -- and frame several natural questions: (i) Can we formally analyze data, module, and structural privacy, giving provable privacy guarantees for an unlimited or bounded number of provenance queries? (ii) How can we answer search and structural queries over repositories of workflow specifications and their executions, providing as much information as possible to the user while still guaranteeing privacy? We then highlight some recent work in this area and point to several directions for future work.


Published in

ICDT '11: Proceedings of the 14th International Conference on Database Theory
March 2011
285 pages
ISBN: 9781450305297
DOI: 10.1145/1938551
Program Chair: Tova Milo

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
