Exhaustive Hybrid Posting Lists Traversing Technique

  • Conference paper
Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques (IScIDE 2015)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9243)

Abstract

Many optimization techniques have been studied to address the performance challenges of web search engines, but they still leave much room for improvement. In this paper, we focus on inverted index traversal techniques, which scan the posting lists directly using different loop schemes and provide preliminary results for a more complicated ranking procedure. We propose a novel exhaustive index traversal technique called hybrid-scoring-at-a-time (HAAT) on document-ordered indexes, which can reduce the memory consumption and candidate selection cost of the existing document-at-a-time (DAAT) and term-at-a-time (TAAT) techniques at the expense of revisiting the posting lists of the remaining query terms. Preliminary analysis shows that the computational complexity of HAAT is comparable to that of existing methods. Experimental results on the TREC GOV2 collection show that our approach performs comparably to the existing DAAT baseline and achieves considerable performance gains over the TAAT baseline.
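For readers unfamiliar with the two baseline loop schemes named in the abstract, the following is a minimal textbook-style sketch of exhaustive TAAT and DAAT traversal over a document-ordered index. It is not the authors' HAAT method, whose details are not given here. The toy index, the per-posting scores, and the function names `taat` and `daat` are all assumptions made for illustration. The sketch shows the trade-off the abstract alludes to: TAAT keeps an accumulator for every matching document and selects candidates at the end, while DAAT finishes one document at a time and keeps only a heap of size k.

```python
import heapq
from collections import defaultdict

# Hypothetical toy index (an assumption for illustration): each term maps to
# a posting list sorted by document ID; each posting is (doc_id, partial score).
INDEX = {
    "web": [(1, 0.5), (3, 1.0), (7, 0.25)],
    "search": [(3, 0.5), (5, 0.75), (7, 0.125)],
}


def taat(index, terms, k):
    """Term-at-a-time: scan one posting list fully before moving to the next."""
    accumulators = defaultdict(float)  # one entry per matching document
    for term in terms:
        for doc_id, score in index.get(term, []):
            accumulators[doc_id] += score
    # Candidate selection happens only after all lists have been scanned.
    return heapq.nlargest(k, accumulators.items(),
                          key=lambda item: (item[1], -item[0]))


def daat(index, terms, k):
    """Document-at-a-time: advance all posting lists in parallel by doc ID."""
    if k <= 0:
        return []
    lists = [index[t] for t in terms if t in index]
    positions = [0] * len(lists)
    heap = []  # min-heap of (score, -doc_id), never larger than k
    while True:
        # The next document to score is the smallest unread doc ID.
        current = min(
            (lst[p][0] for lst, p in zip(lists, positions) if p < len(lst)),
            default=None,
        )
        if current is None:
            break  # every posting list is exhausted
        score = 0.0
        for i, lst in enumerate(lists):
            if positions[i] < len(lst) and lst[positions[i]][0] == current:
                score += lst[positions[i]][1]
                positions[i] += 1
        entry = (score, -current)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)
    return [(-neg_doc, s) for s, neg_doc in sorted(heap, reverse=True)]


if __name__ == "__main__":
    print(taat(INDEX, ["web", "search"], 2))
    print(daat(INDEX, ["web", "search"], 2))
```

Both functions return the same ranked list of `(doc_id, score)` pairs for the same query. They differ only in loop order and in how much intermediate state they hold.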

Author information

Corresponding author

Correspondence to Kun Jiang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Jiang, K., Yang, Y. (2015). Exhaustive Hybrid Posting Lists Traversing Technique. In: He, X., et al. Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques. IScIDE 2015. Lecture Notes in Computer Science, vol 9243. Springer, Cham. https://doi.org/10.1007/978-3-319-23862-3_1

  • DOI: https://doi.org/10.1007/978-3-319-23862-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23861-6

  • Online ISBN: 978-3-319-23862-3

  • eBook Packages: Computer Science, Computer Science (R0)
