Abstract
This paper introduces an expressive formal Information Retrieval (IR) model developed for the Web. It is based on the Bayesian inference network model and views IR as an evidential reasoning process. It supports the explicit combination of multiple Web document representations under a single framework. Information extracted from the content of Web documents and information derived from analysis of the Web link structure serve as sources of evidence for the ranking algorithm. This content-based and link-based evidence is used to generate the multiple Web document representations that are combined.
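As an illustration of what combining several document representations can look like in an inference network, the sketch below uses the weighted-sum operator known from inference network retrieval systems such as INQUERY. It is a minimal sketch under stated assumptions, not the model defined in the paper: the representation names (`content`, `anchor_text`, `link_authority`), the belief values, and the weights are all hypothetical.

```python
from typing import Dict


def combine_beliefs(beliefs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-representation beliefs with a normalised weighted sum.

    beliefs: belief that a document satisfies the query under each
             representation, each value in [0, 1].
    weights: non-negative importance assigned to each representation.
    """
    total_weight = sum(weights[name] for name in beliefs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[name] * beliefs[name] for name in beliefs)
    return weighted / total_weight


# Hypothetical beliefs for one document under three representations:
# one content-based and two derived from link analysis.
doc_beliefs = {"content": 0.6, "anchor_text": 0.8, "link_authority": 0.4}

# Hypothetical weights; here content evidence counts twice as much.
rep_weights = {"content": 2.0, "anchor_text": 1.0, "link_authority": 1.0}

score = combine_beliefs(doc_beliefs, rep_weights)
print(f"combined belief: {score:.2f}")
```

Documents would then be ranked by the combined belief. A weighted sum is only one possible combination operator; other link matrices (for example, probabilistic AND or OR) could be substituted without changing the overall structure.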
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tsikrika, T., Lalmas, M. (2002). Combining Web Document Representations in a Bayesian Inference Network Model Using Link and Content-Based Evidence. In: Crestani, F., Girolami, M., van Rijsbergen, C.J. (eds) Advances in Information Retrieval. ECIR 2002. Lecture Notes in Computer Science, vol 2291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45886-7_4
DOI: https://doi.org/10.1007/3-540-45886-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43343-9
Online ISBN: 978-3-540-45886-9
eBook Packages: Springer Book Archive