Enhanced FastPFOR for Inverted Index Compression

Domnic, S.; Glory, V.

doi:10.1007/978-3-319-12844-3_19

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8870)

Abstract

Information retrieval systems (IRS) face enormous performance challenges due to the rapid growth of data in information retrieval applications and the increasing number of users of these applications. The performance of an IRS can be improved by compressing the inverted index, the data structure commonly used for indexing in IRS. Inverted index compression focuses on reducing the index size to support fast interactive searching. Among recent compression techniques, FastPFOR performs particularly well on inverted indexes. However, its compression performance can still be improved. In this paper, we propose a new compression technique, called Enhanced FastPFOR, to enhance the performance of FastPFOR. In the proposed method, Predictive coding, Elias-Fano coding, Hybrid coding (Predictive + Elias-Fano), Golomb coding and Gamma coding are used to compress the positional values of the exceptions in order to improve the compression performance of FastPFOR. For performance evaluation, we used TREC data collections in our experiments, and the results show that the proposed method could improve compression and decompression significantly.
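To make the idea of compressing exception positions concrete, the following is a minimal Python sketch of one of the codes named in the abstract, Elias gamma coding, applied to the positions of exceptions within a block. It is an illustration only, not the authors' implementation: the function names, the choice to code the gaps between consecutive positions, and the use of bit strings instead of packed words are all assumptions made for readability.

```python
def gamma_encode(n: int) -> str:
    """Elias gamma code of a positive integer, returned as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined for integers >= 1")
    binary = bin(n)[2:]                      # binary digits, leading bit is 1
    return "0" * (len(binary) - 1) + binary  # unary length prefix, then value


def gamma_decode_all(bits: str) -> list[int]:
    """Decode a concatenation of gamma codes (assumes well-formed input)."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                # count the unary length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def encode_positions(positions: list[int]) -> str:
    """Gamma-code the gaps between strictly increasing 0-based positions.

    Assumption for this sketch: positions are stored as gaps so that
    every coded value is at least 1.
    """
    out = []
    prev = -1
    for p in positions:
        out.append(gamma_encode(p - prev))   # gap >= 1 by construction
        prev = p
    return "".join(out)


def decode_positions(bits: str) -> list[int]:
    """Invert encode_positions by accumulating the decoded gaps."""
    positions = []
    prev = -1
    for gap in gamma_decode_all(bits):
        prev += gap
        positions.append(prev)
    return positions


if __name__ == "__main__":
    exception_positions = [3, 4, 17, 100]    # hypothetical positions in a block
    coded = encode_positions(exception_positions)
    print(coded, len(coded))
    print(decode_positions(coded))
```

For the hypothetical positions above, the gaps are 4, 1, 13 and 83, and the concatenated gamma codes occupy 26 bits, compared with 32 bits if each position were stored in a fixed 8-bit field. Whether such a saving materialises in practice depends on how the exceptions are distributed within each block.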

Author information

Authors and Affiliations

Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamilnadu, India
S. Domnic & V. Glory

Editor information

Editors and Affiliations

Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Azizah Jaafar
Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Nazlena Mohamad Ali
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Shahrul Azman Mohd Noah
Insight Centre for Data Analytics, Dublin City University, Glasnevin, 9, Dublin, Ireland
Alan F. Smeaton
Information Systems, Queensland University of Technology, 4001, Brisbane, QLD, Australia
Peter Bruza
Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450, Shah Alam, Selangor, Malaysia
Zainab Abu Bakar & Nursuriati Jamil
Cyber Security Center, Universiti Pertahanan Nasional Malaysia, Kem Sungai Besi, 57000, Kuala Lumpur, Malaysia
Tengku Mohd Tengku Sembok

About this paper

Cite this paper

Domnic, S., Glory, V. (2014). Enhanced FastPFOR for Inverted Index Compression. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_19

DOI: https://doi.org/10.1007/978-3-319-12844-3_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12843-6
Online ISBN: 978-3-319-12844-3
eBook Packages: Computer Science, Computer Science (R0)
