On Structures of Inverted Index for Query Processing Efficiency

  • Conference paper
Information Retrieval Technology (AIRS 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9460)

Abstract

Inverted indexes have been widely adopted by modern search engines to manage billions of documents effectively and respond to users’ queries. Recently, many auxiliary index variants have been proposed to improve the engine’s compression ratio or query processing efficiency. The most successful auxiliary index structures are the Block-Max Index and the Dual-Sorted Index, both designed to speed up query processing. More precisely, the Block-Max Index is designed for efficient top-k query processing, while the Dual-Sorted Index introduces pattern matching to handle complex queries. Little work has thoroughly analysed and compared the performance of these two auxiliary structures. This paper presents an in-depth study of the Block-Max Index and the Dual-Sorted Index, together with a survey of related top-k query processing strategies. Experimental results on the TREC GOV2 dataset, with detailed analysis, show that the Dual-Sorted Index achieves the best query processing performance at the price of very high space occupation. The study also sheds light on the prospect of combining compact data structures with inverted indexes.
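
To make the Block-Max idea concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes non-negative per-term scores and fixed-size blocks, and it uses a simplified filtering loop rather than the pivot-based Block-Max WAND strategy from the literature. The names `BlockMaxList`, `upper_bound` and `top_k` are hypothetical and chosen only for illustration. Each posting list stores the maximum score of every block, so a document can be discarded from block metadata alone when the sum of block maxima cannot exceed the current top-k threshold.

```python
import bisect
import heapq


class BlockMaxList:
    """Posting list split into fixed-size blocks, each tagged with its max score."""

    def __init__(self, postings, block_size=2):
        # postings: list of (doc_id, score) pairs sorted by doc_id,
        # with non-negative scores (an assumption of this sketch).
        self.doc_ids = [d for d, _ in postings]
        self.scores = [s for _, s in postings]
        self.block_last = []  # last doc_id stored in each block
        self.block_max = []   # maximum score stored in each block
        for start in range(0, len(postings), block_size):
            chunk = postings[start:start + block_size]
            self.block_last.append(chunk[-1][0])
            self.block_max.append(max(s for _, s in chunk))

    def upper_bound(self, doc_id):
        """Upper bound on this term's score for doc_id, using block metadata only."""
        b = bisect.bisect_left(self.block_last, doc_id)
        return self.block_max[b] if b < len(self.block_max) else 0.0

    def score(self, doc_id):
        """Exact score of doc_id, or 0.0 if it does not occur in the list."""
        i = bisect.bisect_left(self.doc_ids, doc_id)
        if i < len(self.doc_ids) and self.doc_ids[i] == doc_id:
            return self.scores[i]
        return 0.0


def top_k(lists, k):
    """Disjunctive top-k; returns (results, number of fully scored documents)."""
    heap = []       # min-heap of (score, doc_id) holding the current top k
    evaluated = 0
    candidates = sorted(set().union(*(lst.doc_ids for lst in lists)))
    for doc in candidates:
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        # Skip the document if even its block-level bound cannot beat the threshold.
        if sum(lst.upper_bound(doc) for lst in lists) <= threshold:
            continue
        evaluated += 1
        s = sum(lst.score(doc) for lst in lists)
        if len(heap) < k:
            heapq.heappush(heap, (s, doc))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, doc))
    return sorted(heap, key=lambda x: (-x[0], x[1])), evaluated


# Toy posting lists for two query terms (made-up scores).
a = BlockMaxList([(1, 1.0), (3, 0.5), (5, 4.0), (8, 0.5), (9, 1.0), (12, 0.5)])
b = BlockMaxList([(2, 0.5), (3, 3.0), (5, 2.0), (10, 0.5), (12, 1.0), (13, 0.5)])
results, evaluated = top_k([a, b], k=2)
print(results, evaluated)
```

In this toy run only part of the candidate set is fully scored; once the threshold rises, later documents are rejected from the block maxima alone. Smaller blocks give tighter bounds at the cost of more metadata, which is the basic space/time trade-off behind block-level indexes.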

Author information

Corresponding author

Correspondence to Xingshen Song.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Song, X., Zhang, X., Yang, Y., Quan, J., Jiang, K. (2015). On Structures of Inverted Index for Query Processing Efficiency. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science, vol 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_1

  • DOI: https://doi.org/10.1007/978-3-319-28940-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28939-7

  • Online ISBN: 978-3-319-28940-3

  • eBook Packages: Computer Science (R0)