Enhanced FastPFOR for Inverted Index Compression

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8870)

Abstract

Information retrieval systems (IRS) face enormous performance challenges due to the rapid growth of the data handled by information retrieval applications and the increasing number of their users. The performance of IRS has been improved by compressing the inverted index, the data structure most commonly used for indexing in IRS. Inverted index compression aims to reduce the index size to support fast interactive searching. Among many recent compression techniques, FastPFOR performs notably well in inverted index compression; however, its compression performance can still be improved. In this paper, we propose a new compression technique, called Enhanced FastPFOR, to enhance the performance of FastPFOR. In the proposed method, predictive coding, Elias-Fano coding, hybrid coding (predictive + Elias-Fano), Golomb coding and gamma coding are used to compress the positional values of the exceptions, improving the compression performance of FastPFOR. For performance evaluation, we used TREC data collections in our experiments, and the results show that the proposed method could significantly improve compression and decompression performance.
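In PFOR-style schemes such as FastPFOR, each block is packed with a bit width b, and values that do not fit in b bits are treated as exceptions whose positions in the block must also be stored. The abstract names the codes applied to those positions but not the paper's exact layout or parameters, so the following Python sketch is only an illustration under stated assumptions: 128-integer blocks, a given b, bit strings in place of packed machine words, and a hypothetical Golomb parameter m. It shows three of the named codes (Elias gamma on position gaps, Golomb on gaps, Elias-Fano on the sorted positions); the predictive and hybrid (predictive + Elias-Fano) schemes are not reproduced because their details are not given here, and this is not the authors' implementation.

```python
# Sketch: coding FastPFOR-style exception positions with three of the codes
# named in the abstract. Assumptions (not from the paper): 128-integer blocks,
# a given bit width b, bit strings instead of packed words, hypothetical Golomb m.
from typing import List, Tuple


def exception_positions(block: List[int], b: int) -> List[int]:
    """Indices of the values that do not fit in b bits (the block's exceptions)."""
    return [i for i, v in enumerate(block) if v >= (1 << b)]


def to_gaps(positions: List[int]) -> List[int]:
    """Delta-encode sorted positions; starting from -1 makes every gap >= 1."""
    return [p - q for p, q in zip(positions, [-1] + positions[:-1])]


def gamma_encode(values: List[int]) -> str:
    """Elias gamma: (bit length - 1) zeros, then n in binary; needs n >= 1."""
    return "".join("0" * (n.bit_length() - 1) + format(n, "b") for n in values)


def gamma_decode(bits: str) -> List[int]:
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i + zeros] == "0":
            zeros += 1
        out.append(int(bits[i + zeros:i + 2 * zeros + 1], 2))
        i += 2 * zeros + 1
    return out


def golomb_encode(values: List[int], m: int) -> str:
    """Golomb code of each n >= 0: unary quotient, then truncated-binary remainder."""
    k = (m - 1).bit_length()  # ceil(log2 m)
    cutoff = (1 << k) - m     # remainders below cutoff take k - 1 bits, others k
    out = []
    for n in values:
        q, r = divmod(n, m)
        out.append("1" * q + "0")
        if m > 1:
            out.append(format(r, f"0{k - 1}b") if r < cutoff else format(r + cutoff, f"0{k}b"))
    return "".join(out)


def golomb_decode(bits: str, m: int) -> List[int]:
    k = (m - 1).bit_length()
    cutoff = (1 << k) - m
    out, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":
            q, i = q + 1, i + 1
        i += 1  # skip the 0 that ends the unary quotient
        r = 0
        if m > 1:
            r = int(bits[i:i + k - 1] or "0", 2)
            i += k - 1
            if r >= cutoff:  # long codeword: one more bit follows
                r = 2 * r + int(bits[i]) - cutoff
                i += 1
        out.append(q * m + r)
    return out


def elias_fano_encode(values: List[int], u: int) -> Tuple[int, str, str]:
    """Elias-Fano for sorted values in [0, u): low bits verbatim, high parts in unary."""
    n = len(values)
    width = (u // n).bit_length() - 1 if n else 0  # floor(log2(u / n))
    low = "".join(format(v & ((1 << width) - 1), f"0{width}b") for v in values) if width else ""
    high = ["0"] * (n + (u >> width) + 1)
    for i, v in enumerate(values):
        high[(v >> width) + i] = "1"  # the i-th one sits at (high part of v) + i
    return width, low, "".join(high)


def elias_fano_decode(width: int, low: str, high: str) -> List[int]:
    out = []
    for pos, bit in enumerate(high):
        if bit == "1":
            i = len(out)
            lo = int(low[i * width:(i + 1) * width], 2) if width else 0
            out.append(((pos - i) << width) | lo)
    return out


if __name__ == "__main__":
    block = [i % 8 for i in range(128)]  # every value fits in b = 3 bits ...
    for p in (5, 17, 18, 40, 77, 120):
        block[p] = 100 + p               # ... except these made-up exceptions
    pos = exception_positions(block, b=3)
    gaps = to_gaps(pos)
    width, low, high = elias_fano_encode(pos, u=128)
    print("exception positions:", pos)
    print("one byte per position:", 8 * len(pos), "bits")
    print("gamma on gaps:", len(gamma_encode(gaps)), "bits")
    print("Golomb (m=16) on gaps-1:", len(golomb_encode([g - 1 for g in gaps], 16)), "bits")
    print("Elias-Fano on positions:", len(low) + len(high), "bits")
```

On this toy block with six exceptions, the sketch reports 48 bits for a fixed byte per position, 44 bits for gamma, 35 bits for Golomb with m = 16, and 39 bits for Elias-Fano. These numbers only illustrate the mechanics on made-up data; they are not results from the paper, and the best code depends on how many exceptions a block has and how they are spread.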

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Domnic, S., Glory, V. (2014). Enhanced FastPFOR for Inverted Index Compression. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_19

  • DOI: https://doi.org/10.1007/978-3-319-12844-3_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12843-6

  • Online ISBN: 978-3-319-12844-3

  • eBook Packages: Computer Science, Computer Science (R0)