Towards a Scalable Semantic Provenance Management System

  • Chapter

Part of the book series: Lecture Notes in Computer Science ((TLDKS,volume 7720))

Abstract

Provenance is key metadata for assessing the trustworthiness of electronic documents. It indicates the reliability and quality of a document's content. Most applications that exchange and process documents on the web or in the cloud are becoming provenance-aware, yet the provenance data they provide is heterogeneous, decentralized, and not interoperable. Most provenance management systems are dedicated either to a specific application (workflow, database) or to a specific data type. These systems were not designed to support provenance across distributed and heterogeneous sources, so end users face different provenance models and different query languages. For these reasons, modeling, collecting, and querying provenance across heterogeneous distributed sources remains a challenging task.

This work presents a new provenance management system (PMS) based on semantic web technologies. It imports provenance sources and enriches them semantically to obtain a high-level representation of provenance. It supports semantic correlation between different provenance sources and allows the use of a high-level semantic query language. In the context of cloud infrastructures, where most applications will be deployed in the near future, scalability is a major issue for provenance management systems. We describe an implementation of our PMS based on a NoSQL database management system coupled with the map-reduce parallel model, and show that it scales linearly with the size of the processed logs.
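
To make the map-reduce idea concrete, the following is a minimal sketch of how provenance log records might be grouped into per-document histories. It is not the authors' implementation: the record fields (`doc`, `actor`, `action`, `ts`), the sample data, and the function names are all assumptions made for illustration, and the driver runs sequentially in plain Python rather than on a NoSQL store.

```python
from collections import defaultdict

# Hypothetical log records; the field names are illustrative assumptions,
# not the schema used in the chapter.
LOGS = [
    {"doc": "invoice-17", "actor": "alice", "action": "create", "ts": 1},
    {"doc": "invoice-17", "actor": "bob", "action": "sign", "ts": 3},
    {"doc": "report-02", "actor": "alice", "action": "create", "ts": 2},
    {"doc": "invoice-17", "actor": "carol", "action": "archive", "ts": 5},
]


def map_record(record):
    """Map step: emit one (key, value) pair per log record, keyed by document."""
    yield record["doc"], (record["ts"], record["actor"], record["action"])


def reduce_history(key, values):
    """Reduce step: order the events of one document by timestamp."""
    return sorted(values)


def map_reduce(records, mapper, reducer):
    """Sequential stand-in for a parallel map-reduce run."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect values per key
    return {key: reducer(key, values) for key, values in groups.items()}


histories = map_reduce(LOGS, map_record, reduce_history)
```

Because each record is mapped independently and each key is reduced independently, the work can be partitioned across nodes, which is the property that makes this model a plausible fit for processing large logs.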



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sakka, M.A., Defude, B. (2012). Towards a Scalable Semantic Provenance Management System. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems VII. Lecture Notes in Computer Science, vol 7720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35332-1_4

  • DOI: https://doi.org/10.1007/978-3-642-35332-1_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35331-4

  • Online ISBN: 978-3-642-35332-1

  • eBook Packages: Computer Science, Computer Science (R0)
