Abstract
This paper introduces PIRE, a probabilistic IR engine. For both document indexing and retrieval, PIRE makes heavy use of probabilistic Datalog, a probabilistic extension of predicate Horn logic. Combining this logical framework with probability theory makes it possible to define and use data types (e.g. text, names, numbers), different weighting schemes (e.g. normalised tf, tf.idf or BM25) and retrieval functions (e.g. uncertain inference, language models). Extending the system is thus reduced to adding new rules. Furthermore, this logical framework provides a powerful tool for including additional background knowledge in the retrieval process.
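To make the idea of "weighting schemes and retrieval functions as rules" concrete, the following is a minimal Python sketch, not PIRE's actual rule set or API. It assumes a hypothetical rule of the form `retrieve(D) :- qterm(T) & term(T, D)`, where `term(T, D)` facts carry normalised tf weights and the query-term events are treated as disjoint, so the probability of `retrieve(D)` is the sum of the products of the matching fact weights. The names `normalised_tf`, `retrieve`, `qterm` and `term` are illustrative assumptions.

```python
from collections import Counter, defaultdict


def normalised_tf(docs):
    """Build weighted facts term(T, D): tf divided by the maximum tf in D.

    Assumption: this stands in for one possible indexing rule; a different
    weighting scheme would simply be a different function (or rule).
    """
    facts = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        if not counts:
            continue  # empty document: no term facts
        max_tf = max(counts.values())
        for term, tf in counts.items():
            facts[(term, doc_id)] = tf / max_tf
    return facts


def retrieve(query_weights, term_facts):
    """Evaluate the hypothetical rule retrieve(D) :- qterm(T) & term(T, D).

    Conjunction multiplies the two fact weights; because the query-term
    events are assumed disjoint, alternative derivations are summed.
    """
    scores = defaultdict(float)
    for (term, doc_id), p_term in term_facts.items():
        p_query = query_weights.get(term)
        if p_query is not None:
            scores[doc_id] += p_query * p_term
    return dict(scores)


# Toy collection and a query whose term weights sum to 1.
docs = {"d1": "logic retrieval logic", "d2": "retrieval engine"}
term_facts = normalised_tf(docs)
query = {"logic": 0.5, "retrieval": 0.5}
scores = retrieve(query, term_facts)
```

In this sketch, swapping in another weighting scheme only changes how the `term` facts are produced, while the retrieval rule stays the same, which mirrors the extensibility argument made above.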
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nottelmann, H. (2005). PIRE: An Extensible IR Engine Based on Probabilistic Datalog. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_19
DOI: https://doi.org/10.1007/978-3-540-31865-1_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25295-5
Online ISBN: 978-3-540-31865-1
eBook Packages: Computer Science (R0)