Skip to main content

TopX 2.0 at the INEX 2008 Efficiency Track

A (Very) Fast Object-Store for Top-k-Style XML Full-Text Search

  • Conference paper
Advances in Focused Retrieval (INEX 2008)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5631))

Abstract

For the INEX Efficiency Track 2008, we were just on time to finish and evaluate our brand-new TopX 2.0 prototype. Complementing our long-running effort on efficient top-k query processing on top of a relational back-end, we now switched to a compressed object-oriented storage for text-centric XML data with direct access to customized inverted files, along with a complete reimplementation of the engine in C++. Our INEX 2008 experiments demonstrate efficiency gains of up to a factor of 30 compared to the previous Java/JDBC-based TopX 1.0 implementation over a relational back-end. TopX 2.0 achieves overall runtimes of less than 51 seconds for the entire batch of 568 Efficiency Track topics in their content-and-structure (CAS) version and less than 29 seconds for the content-only (CO) version, respectively, using a top-15, focused (i.e., non-overlapping) retrieval mode—an average of merely 89 ms per CAS query and 49 ms per CO query.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bast, H., Majumdar, D., Theobald, M., Schenkel, R., Weikum, G.: IO-Top-k: Index-optimized top-k query processing. In: VLDB, pp. 475–486 (2006)

    Google Scholar 

  2. Broschart, A., Schenkel, R., Theobald, M., Weikum, G.: TopX @ INEX 2007. In: Fuhr, N., Kamps, J., Lalmas, M., Trotman, A. (eds.) INEX 2007. LNCS, vol. 4862, pp. 49–56. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  3. Clarke, C.L.A.: Controlling overlap in content-oriented XML retrieval. In: Baeza-Yates, R.A., Ziviani, N., Marchionini, G., Moffat, A., Tait, J. (eds.) SIGIR, pp. 314–321. ACM Press, New York (2005)

    Google Scholar 

  4. Denoyer, L., Gallinari, P.: The Wikipedia XML Corpus. In: SIGIR Forum (2006)

    Google Scholar 

  5. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: PODS. ACM Press, New York (2001)

    Google Scholar 

  6. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  7. Grust, T.: Accelerating XPath location steps. In: Franklin, M.J., Moon, B., Ailamaki, A. (eds.) SIGMOD Conference, pp. 109–120. ACM Press, New York (2002)

    Google Scholar 

  8. Helmer, S., Neumann, T., Moerkotte, G.: A robust scheme for multilevel extendible hashing. In: Computer and Information Sciences - 18th International Symposium (ISCIS), pp. 220–227 (2003)

    Google Scholar 

  9. Theobald, M., Bast, H., Majumdar, D., Schenkel, R., Weikum, G.: TopX: efficient and versatile top-k query processing for semistructured data. VLDB J. 17(1), 81–115 (2008)

    Article  Google Scholar 

  10. Theobald, M., Schenkel, R., Weikum, G.: An efficient and versatile query engine for TopX search. In: Böhm, K., Jensen, C.S., Haas, L.M., Kersten, M.L., Larson, P.-Å., Ooi, B.C. (eds.) VLDB, pp. 625–636. ACM Press, New York (2005)

    Google Scholar 

  11. Zhang, J., Long, X., Suel, T.: Performance of compressed inverted list caching in search engines. In: WWW ’08: Proceeding of the 17th international conference on World Wide Web, pp. 387–396. ACM Press, New York (2008)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Theobald, M., AbuJarour, M., Schenkel, R. (2009). TopX 2.0 at the INEX 2008 Efficiency Track. In: Geva, S., Kamps, J., Trotman, A. (eds) Advances in Focused Retrieval. INEX 2008. Lecture Notes in Computer Science, vol 5631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03761-0_23

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-03761-0_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03760-3

  • Online ISBN: 978-3-642-03761-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics