ABSTRACT
Information extraction (IE) programs for the web consume and produce large volumes of data. To understand a program's output, developers and users often want to know in detail how that output was created, and provenance can supply this information. We collect fine-grained provenance by building on ongoing work in the IE community on writing IE programs in a logic programming language. Because the logic programming language exposes the semantics of the program, we can gather fine-grained provenance during program execution. We discuss a case study using a web-based community information management system, then present results on the performance of queries over the provenance data gathered by our logic program interpreter. Our findings show that it is possible to gather useful fine-grained provenance during the execution of a logic-based web information extraction program, and that queries over this provenance can be performed in a reasonable amount of time.
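To make the idea of gathering provenance during rule evaluation concrete, the following is a minimal sketch and not the interpreter described in the paper. It assumes a toy Datalog-style evaluator in which every derived fact records the rule and the input facts that produced it. The predicate names (`homepage`, `mentions`, `related`), the rule name `r1`, and the helper functions are all hypothetical and chosen only for illustration.

```python
from itertools import product


def is_var(term):
    # Assumption for this sketch: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals `fact`; return None if they cannot match."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    out = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if out.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return out


def evaluate(base_facts, rules):
    """Naive bottom-up evaluation that records, for each derived fact,
    the rule that fired and the body facts it consumed."""
    facts = set(base_facts)
    provenance = {}  # derived fact -> (rule name, tuple of input facts)
    changed = True
    while changed:
        changed = False
        for name, head, body in rules:
            # sorted() copies the fact set, so adding facts below is safe.
            for combo in product(sorted(facts), repeat=len(body)):
                binding = {}
                for atom, fact in zip(body, combo):
                    binding = match(atom, fact, binding)
                    if binding is None:
                        break
                if binding is None:
                    continue
                derived = (head[0],) + tuple(binding.get(t, t) for t in head[1:])
                if derived not in facts:
                    facts.add(derived)
                    provenance[derived] = (name, combo)
                    changed = True
    return facts, provenance


def lineage(fact, provenance):
    """Return the base facts that `fact` ultimately depends on."""
    if fact not in provenance:
        return {fact}
    _, inputs = provenance[fact]
    result = set()
    for source in inputs:
        result |= lineage(source, provenance)
    return result


# Hypothetical extraction output: page p1 is bob's homepage and mentions alice.
base = {("homepage", "p1", "bob"), ("mentions", "p1", "alice")}
rules = [
    # related(X, Y) :- homepage(P, X), mentions(P, Y).
    ("r1", ("related", "X", "Y"), [("homepage", "P", "X"), ("mentions", "P", "Y")]),
]
facts, provenance = evaluate(base, rules)
```

In this sketch, a provenance query such as `lineage(("related", "bob", "alice"), provenance)` walks the recorded derivations back to the base facts. The naive evaluation strategy is chosen for brevity; a practical interpreter would use a more efficient join strategy.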
Index Terms
- Instrumenting a logic programming language to gather provenance from an information extraction application