The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking

Theobald, Anja; Weikum, Gerhard

doi:10.1007/3-540-45876-X_31

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2287)

Included in the following conference series:

International Conference on Extending Database Technology

Abstract

Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, however, do not consider the additional information and rich annotations provided by the structure of XML documents and their element names. This paper presents the XXL search engine, which supports relevance ranking on XML data. XXL is particularly geared toward path queries with wildcards that can span multiple XML collections and contain both exact-match and semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve search efficiency and effectiveness. XXL is fully implemented as a suite of Java servlets. Experiments with a variety of structurally diverse XML data demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.
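
The difference between Boolean and ranked retrieval described above can be illustrated with a minimal Python sketch. It is not XXL's query language, scoring model, or index structure: the sample document, the `score` function (a toy term-frequency ratio), and the function names are all assumptions made for illustration. The `*` step in the ElementTree path merely stands in for the idea of a path query with a wildcard.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this sketch.
DOC = """
<library>
  <book><title>XML query languages</title><abstract>XPath and XQuery for XML data</abstract></book>
  <book><title>Cooking basics</title><abstract>Recipes with a short note on XML menus</abstract></book>
  <article><title>Ranked retrieval</title><abstract>Relevance ranking for XML search engines and XML indexes</abstract></article>
</library>
""".strip()

# Wildcard path: an <abstract> under any child element of the root.
PATH = "./*/abstract"


def score(text, term):
    """Toy relevance: share of words in `text` equal to `term` (case-insensitive)."""
    words = text.lower().split()
    return words.count(term.lower()) / len(words) if words else 0.0


def boolean_retrieval(root, term):
    """Boolean semantics: every matching element qualifies equally, in document order."""
    return [e for e in root.findall(PATH) if score(e.text or "", term) > 0]


def ranked_retrieval(root, term):
    """Ranked semantics: (score, element) pairs in descending order of estimated relevance."""
    hits = [(score(e.text or "", term), e) for e in root.findall(PATH)]
    hits = [(s, e) for s, e in hits if s > 0]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print([e.text for e in boolean_retrieval(root, "XML")])
    for s, e in ranked_retrieval(root, "XML"):
        print(f"{s:.3f}  {e.text}")
```

In this sketch both functions match the same three elements. The Boolean variant returns them as an unordered result in document order, while the ranked variant places the abstract with the highest share of matching terms first.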

Author information

Authors and Affiliations

University of the Saarland, Germany
Anja Theobald & Gerhard Weikum

Editor information

Editors and Affiliations

Department of Computer Science, Aalborg University, Aalborg
Christian S. Jensen & Simonas Šaltenis
Business and Information Technology Dept., CLRC Rutherford Appleton Laboratory, UK
Keith G. Jeffery
Faculty of Mathematics and Physics, Charles University, Czech Republic
Jaroslav Pokorny
Department of Information Science, University of Milan, Milan
Elisa Bertino
Institute of Information Systems, ETH Zurich, Zurich
Klemens Böhm
Informatik V, RWTH Aachen, Aachen
Matthias Jarke

About this paper

Cite this paper

Theobald, A., Weikum, G. (2002). The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking. In: Jensen, C.S., et al. Advances in Database Technology — EDBT 2002. EDBT 2002. Lecture Notes in Computer Science, vol 2287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45876-X_31

DOI: https://doi.org/10.1007/3-540-45876-X_31
Published: 14 March 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43324-8
Online ISBN: 978-3-540-45876-0
eBook Packages: Springer Book Archive
