Optimizing partitioning strategies for faster inverted index compression

Song, Xingshen; Yang, Yuexiang; Jiang, Yu; Jiang, Kun

doi:10.1007/s11704-016-6252-5

Research Article
Published: 11 April 2019

Volume 13, pages 343–356 (2019)

Frontiers of Computer Science

Xingshen Song¹, Yuexiang Yang¹, Yu Jiang¹ & Kun Jiang²

Abstract

The inverted index is a key component that lets search engines manage billions of documents and respond quickly to users’ queries. Whereas substantial effort has been devoted to reducing space occupancy and improving decoding speed, the encoding speed when constructing the index has been overlooked. Partitioning the index in alignment with its clustered distribution can effectively minimize the compressed size while accelerating its construction. In this study, we introduce compression speed as a criterion for evaluating compression techniques and thoroughly analyze the performance of different partitioning strategies. We also propose optimizations that give state-of-the-art methods faster compression and more flexibility in partitioning an index. Experiments show that our methods offer much better compression speed while retaining excellent space occupancy and decompression speed.
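
To make the idea of partitioning a posting list along its clustered distribution concrete, the following is a minimal sketch and not the method of this article. It assumes a toy cost model in which each block of d-gaps is stored with one fixed bit width plus a per-block header; the names HEADER_BITS, to_gaps, fixed_cost, and optimal_cost are hypothetical. It compares fixed-size blocks against a dynamic-programming partition that places block boundaries where the gap distribution changes.

```python
from typing import List

HEADER_BITS = 16  # assumed per-block overhead in bits (hypothetical value)


def to_gaps(docids: List[int]) -> List[int]:
    """Turn a strictly increasing docid list into d-gaps (first value kept as is)."""
    return [docids[0]] + [b - a for a, b in zip(docids, docids[1:])]


def block_bits(gaps: List[int], start: int, end: int) -> int:
    """Bits for gaps[start:end] when every gap uses the width of the largest one."""
    width = max(1, max(gaps[start:end]).bit_length())
    return HEADER_BITS + (end - start) * width


def fixed_cost(gaps: List[int], size: int) -> int:
    """Total bits when the list is cut into blocks of a fixed size."""
    n = len(gaps)
    return sum(block_bits(gaps, i, min(i + size, n)) for i in range(0, n, size))


def optimal_cost(gaps: List[int], max_len: int) -> int:
    """Minimum total bits over all partitions with blocks of at most max_len gaps."""
    n = len(gaps)
    best = [0] + [float("inf")] * n  # best[j] = cheapest encoding of gaps[:j]
    for j in range(1, n + 1):
        widest = 0
        # Grow the candidate last block gaps[i:j] leftwards, tracking its largest gap.
        for i in range(j - 1, max(j - max_len, 0) - 1, -1):
            widest = max(widest, gaps[i])
            bits = HEADER_BITS + (j - i) * max(1, widest.bit_length())
            best[j] = min(best[j], best[i] + bits)
    return best[n]


# A clustered list: a dense run of consecutive docids followed by a sparse tail.
docids = list(range(100, 164)) + [1000 * k for k in range(1, 65)]
gaps = to_gaps(docids)
print("fixed blocks of 32:", fixed_cost(gaps, 32), "bits")
print("optimal partition: ", optimal_cost(gaps, 128), "bits")
```

The dynamic program inspects up to max_len candidate blocks per position, which is the kind of encoding-time cost that the abstract argues should be measured alongside space and decoding speed; fixed-size blocks need only a single pass but cannot follow the clusters.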

Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, Changsha, 410000, China
Xingshen Song, Yuexiang Yang & Yu Jiang
School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710000, China
Kun Jiang

Corresponding author

Correspondence to Xingshen Song.

Additional information

Xingshen Song, a doctoral candidate, received his MS degree in remote sensing from the Aviation University of the Air Force, China in 2013. His research interests include data structures for search engines, inverted index compression, and query processing optimization.

Yuexiang Yang received his PhD degree in computer science from the National University of Defense Technology (NUDT), China in 2008. Currently, he is a professor in the College of Computer, NUDT. His main research fields include information retrieval, information security, and cloud computing.

Yu Jiang, a master’s candidate, received her BE degree in computer science and technology from Xi’an Jiaotong University, China in 2010. Her research interests include query processing optimization and data structures for search engines.

Kun Jiang received his PhD degree in computer science from the National University of Defense Technology, China in 2015. He is now a postdoctoral fellow in the School of Electronic and Information Engineering, Xi’an Jiaotong University, China. His current research interests include information retrieval and machine learning.

Electronic supplementary material

Supplementary material, approximately 320 KB.

About this article

Cite this article

Song, X., Yang, Y., Jiang, Y. et al. Optimizing partitioning strategies for faster inverted index compression. Front. Comput. Sci. 13, 343–356 (2019). https://doi.org/10.1007/s11704-016-6252-5

Received: 02 May 2016
Accepted: 21 September 2016
Published: 11 April 2019
Issue Date: April 2019
DOI: https://doi.org/10.1007/s11704-016-6252-5
