Abstract
Provenance is key metadata for assessing the trustworthiness of electronic documents: it indicates the reliability and quality of a document's content. Most applications that exchange and process documents on the web or in the cloud are becoming provenance-aware, and they produce heterogeneous, decentralized, non-interoperable provenance data. Most provenance management systems, however, are dedicated either to a specific kind of application (workflows, databases) or to a specific data type, and they were not designed to support provenance over distributed, heterogeneous sources. As a result, end users face different provenance models and different query languages. For these reasons, modeling, collecting and querying provenance across heterogeneous distributed sources remains a challenging task.
This work presents a new provenance management system (PMS) based on semantic web technologies. It imports provenance sources and enriches them semantically to obtain a high-level representation of provenance. It supports semantic correlation between different provenance sources and offers a high-level semantic query language. In cloud infrastructures, where most applications are expected to be deployed in the near future, scalability is a major issue for provenance management systems. We describe an implementation of our PMS based on a NoSQL database management system coupled with the MapReduce parallel model, and show that it scales linearly with the size of the processed logs.
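The combination of heterogeneous log import, mapping to a common provenance model, and MapReduce-style processing can be illustrated with a minimal sketch. This is not the authors' implementation (the abstract only states that it couples a NoSQL DBMS with the MapReduce model). Everything here is an assumption made for illustration. The two log formats, the functions `map_source_a`, `map_source_b`, `shuffle`, `reduce_document` and `run` are all hypothetical. The predicate names are borrowed from the Open Provenance Model edge types (`used`, `wasGeneratedBy`, `wasControlledBy`). "Correlation" is simplified to grouping by a shared document identifier, whereas the paper describes semantic correlation.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# An OPM-style statement: (subject, predicate, object).
Triple = Tuple[str, str, str]

# Hypothetical source A (e.g. a document editor), one line per event:
#   "user;action;document"
def map_source_a(line: str) -> Iterator[Tuple[str, Triple]]:
    """Map an editor log line to (document key, triple) pairs."""
    user, action, doc = line.strip().split(";")
    process = f"{action}:{doc}"  # hypothetical process identifier
    yield doc, (doc, "wasGeneratedBy", process)
    yield doc, (process, "wasControlledBy", user)

# Hypothetical source B (e.g. a workflow engine), one line per step:
#   "doc=<id> step=<process> out=<yes|no>"
def map_source_b(line: str) -> Iterator[Tuple[str, Triple]]:
    """Map a workflow log line to (document key, triple) pairs."""
    fields = dict(part.split("=", 1) for part in line.split())
    doc, step = fields["doc"], fields["step"]
    if fields["out"] == "yes":
        # The step produced the document.
        yield doc, (doc, "wasGeneratedBy", step)
    else:
        # The step consumed the document.
        yield doc, (step, "used", doc)

def shuffle(pairs: Iterable[Tuple[str, Triple]]) -> Dict[str, List[Triple]]:
    """Group mapped pairs by key, as the MapReduce shuffle phase does."""
    groups: Dict[str, List[Triple]] = defaultdict(list)
    for key, triple in pairs:
        groups[key].append(triple)
    return groups

def reduce_document(doc: str, triples: List[Triple]) -> Tuple[str, List[Triple]]:
    """Merge provenance from all sources for one document, dropping duplicates."""
    return doc, sorted(set(triples))

def run(source_a: List[str], source_b: List[str]) -> Dict[str, List[Triple]]:
    """Map both heterogeneous sources, shuffle by document, then reduce."""
    mapped = chain(
        chain.from_iterable(map_source_a(line) for line in source_a),
        chain.from_iterable(map_source_b(line) for line in source_b),
    )
    return dict(reduce_document(k, v) for k, v in shuffle(mapped).items())

if __name__ == "__main__":
    logs_a = ["alice;edit;report.pdf"]
    logs_b = [
        "doc=report.pdf step=compress out=yes",
        "doc=report.pdf step=archive out=no",
    ]
    for doc, triples in run(logs_a, logs_b).items():
        for triple in triples:
            print(doc, triple)
```

Because each map call handles one log line independently and each reduce call handles one document independently, both phases can be spread across workers. This independence is what lets a MapReduce pipeline scale with log size. The sketch runs them sequentially in a single process for clarity.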
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Sakka, M.A., Defude, B. (2012). Towards a Scalable Semantic Provenance Management System. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems VII. Lecture Notes in Computer Science, vol 7720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35332-1_4
DOI: https://doi.org/10.1007/978-3-642-35332-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35331-4
Online ISBN: 978-3-642-35332-1
eBook Packages: Computer Science; Computer Science (R0)