ABSTRACT
Three generations of postings list compression strategies (Variable Byte Encoding, Word-Aligned Codes, and SIMD Codecs) are examined to test whether each truly represented a generational change; each did. Some weaknesses of the current SIMD-based schemes are identified, and a new scheme, QMX, is introduced to address both their space and their decoding inefficiencies. The improvements are examined on multiple architectures, and it is shown that different SSE implementations (Intel and AMD) perform differently.
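For readers unfamiliar with the first of these generations, the following is a minimal sketch of variable byte encoding applied to a postings list. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names (`to_gaps`, `vbyte_encode`, `vbyte_decode`, `from_gaps`) are hypothetical, and the byte layout (seven payload bits per byte, low-order group first, high bit set on the final byte of each integer) is only one of several conventions in use.

```python
from typing import Iterable, List


def to_gaps(doc_ids: Iterable[int]) -> List[int]:
    """Turn strictly increasing document ids into differences (d-gaps)."""
    gaps, previous = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_gaps(gaps: Iterable[int]) -> List[int]:
    """Invert to_gaps with a running sum."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def vbyte_encode(values: Iterable[int]) -> bytes:
    """Encode non-negative integers, 7 payload bits per byte.

    Assumed convention: low-order 7-bit groups come first, and the
    high bit is set only on the last byte of each integer.
    """
    out = bytearray()
    for value in values:
        if value < 0:
            raise ValueError("variable byte encoding needs non-negative integers")
        while value >= 0x80:
            out.append(value & 0x7F)  # continuation byte, high bit clear
            value >>= 7
        out.append(value | 0x80)      # terminating byte, high bit set
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Decode a byte string produced by vbyte_encode."""
    values, current, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            # Terminating byte: add the final group and emit the integer.
            values.append(current | ((byte & 0x7F) << shift))
            current, shift = 0, 0
        else:
            current |= byte << shift
            shift += 7
    return values


postings = [3, 7, 130, 131, 20000]
encoded = vbyte_encode(to_gaps(postings))
decoded = from_gaps(vbyte_decode(encoded))
```

In this sketch the decoder tests the high bit of every byte, so each byte costs a data-dependent branch. Word-aligned and SIMD codecs are generally designed to avoid that kind of per-byte decision.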
Index Terms
- Compression, SIMD, and Postings Lists