Efficient, Effective and Flexible XML Retrieval Using Summaries

  • Conference paper
Comparative Evaluation of XML Information Retrieval Systems (INEX 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4518)

Abstract

Retrieval queries that combine structural constraints with keyword search pose new challenges for retrieval systems. This paper presents TReX, a new retrieval system for XML. TReX can efficiently return either all the answers to a given query or only the top-k answers. We discuss our participation in the ad-hoc track of the annual Initiative for the Evaluation of XML Retrieval (INEX) workshop. Our main contribution is to investigate the use of summaries and the flexibility they provide when dealing with structural constraints. We describe algorithms for retrieval using summaries and present experimental results showing that TReX answers queries efficiently and effectively.

Editor information

Norbert Fuhr, Mounia Lalmas, Andrew Trotman

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ali, M.S., Consens, M., Gu, X., Kanza, Y., Rizzolo, F., Stasiu, R. (2007). Efficient, Effective and Flexible XML Retrieval Using Summaries. In: Fuhr, N., Lalmas, M., Trotman, A. (eds) Comparative Evaluation of XML Information Retrieval Systems. INEX 2006. Lecture Notes in Computer Science, vol 4518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73888-6_10

  • DOI: https://doi.org/10.1007/978-3-540-73888-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73887-9

  • Online ISBN: 978-3-540-73888-6

  • eBook Packages: Computer Science, Computer Science (R0)
