Abstract
A large number of optimization techniques have been studied to address the performance challenges of web search engines, but much room for improvement remains. In this paper, we focus on inverted index traversal techniques, which scan the posting lists directly using different loop schemes and provide preliminary results for a more complex ranking procedure. We propose a novel exhaustive index traversal technique for document-ordered indexes called hybrid-scoring at a time (HAAT). It reduces the memory consumption and candidate selection cost of the existing document-at-a-time (DAAT) and term-at-a-time (TAAT) techniques, at the expense of revisiting the posting lists of the remaining query terms. A preliminary analysis shows that the computational complexity of HAAT is comparable to that of existing methods. Experimental results on the TREC GOV2 collection show that our approach performs comparably to the existing DAAT baseline and achieves considerable performance gains over the TAAT baseline.
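For readers unfamiliar with the two baseline loop schemes named in the abstract, the following is a minimal sketch of exhaustive TAAT and DAAT traversal over a document-ordered index. It is not the authors' implementation and does not attempt to reproduce HAAT, whose details the abstract does not give. The toy index, the integer partial scores, and the names INDEX, top_k, taat, and daat are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict

# Hypothetical toy index: term -> postings sorted by doc id.
# Each posting is (doc_id, partial_score); the values are made up.
INDEX = {
    "web":    [(1, 5), (3, 10), (7, 2)],
    "search": [(3, 4), (4, 9), (7, 6)],
    "engine": [(1, 3), (7, 7)],
}

def top_k(scored, k):
    """Return the k best (doc_id, score) pairs, ties broken by smaller doc id."""
    return sorted(scored, key=lambda p: (-p[1], p[0]))[:k]

def taat(index, query, k):
    """Term at a time: scan one posting list after another.

    Needs one accumulator per matching document, and selects the
    candidates only after every list has been read.
    """
    acc = defaultdict(int)
    for term in query:
        for doc_id, partial in index.get(term, []):
            acc[doc_id] += partial
    return top_k(acc.items(), k)

def daat(index, query, k):
    """Document at a time: advance all posting lists in parallel.

    Each document is fully scored when it is reached, so only a
    heap of at most k candidates has to be kept.
    """
    lists = [index[t] for t in query if t in index]
    pos = [0] * len(lists)
    heap = []  # min-heap of (score, -doc_id), at most k entries
    while True:
        heads = [lst[p][0] for lst, p in zip(lists, pos) if p < len(lst)]
        if not heads:
            break
        doc = min(heads)  # smallest unprocessed doc id across all lists
        score = 0
        for i, lst in enumerate(lists):
            if pos[i] < len(lst) and lst[pos[i]][0] == doc:
                score += lst[pos[i]][1]
                pos[i] += 1
        entry = (score, -doc)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)
    return top_k([(-neg_doc, s) for s, neg_doc in heap], k)

QUERY = ["web", "search", "engine"]
RESULT_TAAT = taat(INDEX, QUERY, 2)
RESULT_DAAT = daat(INDEX, QUERY, 2)
```

Both functions visit every posting and return the same ranking. In this sketch they differ in the order of the loops and in the intermediate state they hold: an accumulator table for TAAT, a bounded candidate heap for DAAT.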
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Jiang, K., Yang, Y. (2015). Exhaustive Hybrid Posting Lists Traversing Technique. In: He, X., et al. Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques. IScIDE 2015. Lecture Notes in Computer Science, vol 9243. Springer, Cham. https://doi.org/10.1007/978-3-319-23862-3_1
DOI: https://doi.org/10.1007/978-3-319-23862-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23861-6
Online ISBN: 978-3-319-23862-3
eBook Packages: Computer Science (R0)