Abstract
Provenance, from the French word “provenir”, describes the lineage or history of a data entity. In scientific applications, provenance is critical for verifying experimental processes, validating data quality, and associating trust values with scientific results. Current industrial-scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be underpinned by formal semantics so that software applications can analyze large-scale provenance information. Further, effective analysis of provenance information requires well-defined query mechanisms that support complex queries over large datasets. This paper introduces an ontology-driven provenance management infrastructure for biology experiment data, as part of the Semantic Problem Solving Environment (SPSE) for Trypanosoma cruzi (T. cruzi). This provenance infrastructure, called the T. cruzi Provenance Management System (PMS), is underpinned by (a) a domain-specific provenance ontology called the Parasite Experiment ontology, (b) specialized query operators for provenance analysis, and (c) a provenance query engine. The query engine uses a novel optimization technique, materialized provenance views (MPV), which builds on materialized views to scale with increasing data size and query complexity. This comprehensive ontology-driven provenance infrastructure not only allows effective tracking and management of ongoing experiments in the Tarleton Research Group at the Center for Tropical and Emerging Global Diseases (CTEGD), but also enables researchers to retrieve the complete provenance information of scientific results for publication in the literature.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sahoo, S.S., Weatherly, D.B., Mutharaju, R., Anantharam, P., Sheth, A., Tarleton, R.L. (2009). Ontology-Driven Provenance Management in eScience: An Application in Parasite Research. In: Meersman, R., Dillon, T., Herrero, P. (eds) On the Move to Meaningful Internet Systems: OTM 2009. OTM 2009. Lecture Notes in Computer Science, vol 5871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05151-7_18
DOI: https://doi.org/10.1007/978-3-642-05151-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05150-0
Online ISBN: 978-3-642-05151-7
eBook Packages: Computer Science (R0)