The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking

  • Conference paper
Advances in Database Technology — EDBT 2002 (EDBT 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2287))

Abstract

Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do not, however, consider the additional information and rich annotations provided by the structure of XML documents and their element names. This paper presents the XXL search engine, which supports relevance ranking on XML data. XXL is particularly geared toward path queries with wildcards that can span multiple XML collections and contain both exact-match and semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve search efficiency and effectiveness. XXL is fully implemented as a suite of Java servlets. Experiments with a variety of structurally diverse XML data demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.
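
The contrast between Boolean and ranked retrieval drawn in the abstract can be illustrated with a small sketch. It is not the XXL engine, its query language, or its scoring model, none of which are specified on this page. The toy document, the `NAME_SIMILARITY` table (a stand-in for ontological information), and the functions `content_score`, `boolean_search`, and `ranked_search` are all hypothetical names and values introduced here for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "ontology": similarity between the queried element name
# and element names found in the data. Values are invented for illustration.
NAME_SIMILARITY = {
    ("book", "book"): 1.0,
    ("book", "monograph"): 0.8,
    ("book", "article"): 0.4,
}

# Invented sample data, not taken from the paper.
DOC = """
<library>
  <book><title>XML query languages</title></book>
  <monograph><title>Ranked retrieval for XML data</title></monograph>
  <article><title>Relational query optimization</title></article>
  <report><title>XML ranked retrieval</title></report>
</library>
"""

def content_score(text, terms):
    """Fraction of query terms occurring in the text (a toy relevance measure)."""
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)

def boolean_search(root, name, terms):
    """Boolean retrieval: exact element name, and every term must occur."""
    return [e.tag for e in root.iter(name)
            if content_score("".join(e.itertext()), terms) == 1.0]

def ranked_search(root, name, terms):
    """Ranked retrieval: name similarity times content score, best first."""
    results = []
    for elem in root.iter():
        name_sim = NAME_SIMILARITY.get((name, elem.tag), 0.0)
        score = name_sim * content_score("".join(elem.itertext()), terms)
        if score > 0:
            results.append((round(score, 2), elem.tag))
    return sorted(results, reverse=True)

root = ET.fromstring(DOC)
print(boolean_search(root, "book", ["xml", "ranked"]))
print(ranked_search(root, "book", ["xml", "ranked"]))
```

Under these assumptions the Boolean query returns nothing, because no `book` element contains both terms. The ranked query still surfaces the `monograph` element ahead of the partially matching `book` element, which is the behaviour the abstract argues for in open, structurally diverse collections.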

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Theobald, A., Weikum, G. (2002). The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking. In: Jensen, C.S., et al. Advances in Database Technology — EDBT 2002. EDBT 2002. Lecture Notes in Computer Science, vol 2287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45876-X_31

  • DOI: https://doi.org/10.1007/3-540-45876-X_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43324-8

  • Online ISBN: 978-3-540-45876-0

  • eBook Packages: Springer Book Archive
