DOI: 10.1145/2682862.2682870
research-article

Compression, SIMD, and Postings Lists

Published: 26 November 2014

ABSTRACT

The three generations of postings list compression strategies (Variable Byte Encoding, Word Aligned Codes, and SIMD Codecs) are examined in order to test whether or not each truly represented a generational change -- they do. Some weaknesses of the current SIMD-based schemes are identified and a new scheme, QMX, is introduced to address both space and decoding inefficiencies. Improvements are examined on multiple architectures and it is shown that different SSE implementations (Intel and AMD) perform differently.

References

  1. Anh, V.N., A. Moffat, Inverted Index Compression using Word-Aligned Binary Codes. Inf. Ret., 2005. 8(1):151--166.
  2. Anh, V.N., A. Moffat, Index compression using 64-bit words. Softw. Pract. Exper., 2010. 40(2):131--147.
  3. Catena, M., C. Macdonald, I. Ounis, On Inverted Index Compression for Search Engine Efficiency, in ECIR 2014, pp. 359--371.
  4. Dean, J., Challenges in Building Large-scale Information Retrieval Systems: Invited Talk, in WSDM 2009.
  5. Elias, P., Universal Codeword Sets and the Representation of the Integers. IEEE Trans. Inf. Theory, 1975. 21(2):194--203.
  6. Golomb, S.W., Run-length Encodings. IEEE Trans. Inf. Theory, 1966. 12(3):399--401.
  7. Lemire, D., L. Boytsov, Decoding Billions of Integers per Second through Vectorization. Software: Prac. Exper.
  8. Moffat, A., L. Stuiver, Binary Interpolative Coding for Effective Index Compression. Inf. Ret., 2000. 3(1):25--47.
  9. Scholer, F., H.E. Williams, J. Yiannis, J. Zobel, Compression of Inverted Indexes for Fast Query Evaluation, in SIGIR 2002, pp. 222--229.
  10. Silvestri, F., R. Venturini, VSEncoding: Efficient Coding and Fast Decoding of Integer Lists via Dynamic Programming, in CIKM 2010, pp. 1219--1228.
  11. Stepanov, A.A., A.R. Gangolli, D.E. Rose, R.J. Ernst, P.S. Oberoi, SIMD-based Decoding of Posting Lists, in CIKM 2011, pp. 317--326.
  12. Trotman, A., Compressing Inverted Files. Inf. Ret., 2003. 6(1):5--19.
  13. Trotman, A., X.-F. Jia, M. Crane, Towards an Efficient and Effective Search Engine, in SIGIR 2012 Workshop on Open Source Information Retrieval. 2012. pp. 40--47.
  14. Williams, H.E., J. Zobel, Compressing Integers for Fast File Access. Computer Journal, 1999. 42(3):193--201.
  15. Zhang, J., X. Long, T. Suel, Performance of Compressed Inverted List Caching in Search Engines, in WWW 2008, pp. 387--396.
  16. Zukowski, M., S. Heman, N. Nes, P. Boncz, Super-Scalar RAM-CPU Cache Compression, in ICDE 2006.

      Published in

        ADCS '14: Proceedings of the 19th Australasian Document Computing Symposium
        November 2014
        132 pages
        ISBN: 9781450330008
        DOI: 10.1145/2682862

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 30 of 57 submissions, 53%
