Abstract
Information retrieval systems (IRS) face enormous performance challenges due to the rapid growth of data in information retrieval applications and the increasing number of users of these applications. IRS performance has been improved by compressing the inverted index, the data structure commonly used for indexing in IRS. Inverted index compression focuses on reducing the index size to support fast interactive searching. Among recent compression techniques, FastPFOR performs particularly well for inverted index compression. However, its compression performance can still be improved. In this paper, we propose a new compression technique, called Enhanced FastPFOR, to enhance the performance of FastPFOR. In the proposed method, predictive coding, Elias-Fano coding, hybrid coding (predictive + Elias-Fano), Golomb coding and gamma coding are used to compress the positional values of the exceptions, improving the compression performance of FastPFOR. For performance evaluation, we used TREC data collections in our experiments, and the results show that the proposed method significantly improves compression and decompression.
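To make the idea of compressing the positional values of exceptions concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a PFOR-style block in which any value that does not fit in `b` bits is an exception, and it assumes the exception positions are stored as gaps coded with Elias gamma, one of the codes named in the abstract. The function names and the sample block are hypothetical.

```python
def exception_positions(block, b):
    """Indices of values that do not fit in b bits (the block's exceptions)."""
    limit = 1 << b
    return [i for i, v in enumerate(block) if v >= limit]


def gamma_encode(n):
    """Elias gamma code of a positive integer, returned as a bit string."""
    if n < 1:
        raise ValueError("gamma codes are defined for n >= 1")
    binary = bin(n)[2:]
    # (len - 1) zeros announce the length, then the binary digits follow.
    return "0" * (len(binary) - 1) + binary


def encode_positions(positions):
    """Gamma-code the gaps between successive positions.

    Assumption: the first gap is measured from -1 so that every gap is >= 1.
    """
    bits = []
    prev = -1
    for p in positions:
        bits.append(gamma_encode(p - prev))
        prev = p
    return "".join(bits)


def decode_positions(bits, count):
    """Inverse of encode_positions for a known number of exceptions."""
    positions = []
    i = 0
    prev = -1
    for _ in range(count):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        gap = int(bits[i:i + zeros + 1], 2)
        i += zeros + 1
        prev += gap
        positions.append(prev)
    return positions


# Hypothetical block: with b = 3, the values 200 and 90 are exceptions.
block = [3, 1, 200, 2, 7, 5, 90, 4]
positions = exception_positions(block, 3)
encoded = encode_positions(positions)
decoded = decode_positions(encoded, len(positions))
```

Bit strings are used here only for readability; a practical encoder would pack the bits into machine words.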
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Domnic, S., Glory, V. (2014). Enhanced FastPFOR for Inverted Index Compression. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_19
DOI: https://doi.org/10.1007/978-3-319-12844-3_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12843-6
Online ISBN: 978-3-319-12844-3
eBook Packages: Computer Science, Computer Science (R0)