DOI: 10.1145/3121050.3121092
short-paper
Quantization in Append-Only Collections

Published: 01 October 2017

Abstract

Quantization, the pre-calculation and conversion to integers of term/document weights in an inverted index, is a well-studied aspect of search engines that substantially improves retrieval efficiency. Previous work has considered the impact of quantization on effectiveness-efficiency tradeoffs in retrieval, for example by exploring the relationship between collection size and quantization range in static web collections. We extend previous work to append-only collections and examine whether quantization settings derived from prior time periods can be applied to future time periods. Experiments confirm that previous results generalize to a collection with different characteristics and with a different ranking function, and that in an append-only collection, we can use previous quantization settings in future time periods without substantial losses in either effectiveness or efficiency.
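
As an illustration of the idea only, the following minimal sketch assumes uniform (linear) quantization, which the abstract does not specify. The function names `fit_range` and `quantize`, the 8-bit default, and the clamping of out-of-range weights are all hypothetical choices made for this example and are not taken from the paper. The sketch derives a weight range from an earlier time period and reuses it to quantize weights from a later one.

```python
def fit_range(weights):
    """Derive quantization settings (min and max weight) from one time period."""
    return min(weights), max(weights)


def quantize(weights, lo, hi, bits=8):
    """Map real-valued weights to integers in [1, 2**bits - 1].

    Assumption: uniform quantization over [lo, hi]. Weights outside the
    range (possible when lo/hi come from an earlier period) are clamped.
    """
    levels = (1 << bits) - 1
    if hi <= lo:
        # Degenerate range: every weight maps to the lowest impact score.
        return [1 for _ in weights]
    scale = (levels - 1) / (hi - lo)
    quantized = []
    for w in weights:
        clamped = min(max(w, lo), hi)
        quantized.append(int((clamped - lo) * scale) + 1)
    return quantized


# Settings fitted on an earlier period, applied to a later one.
past_weights = [0.0, 0.25, 1.0]
future_weights = [0.5, 1.5, -0.2]
lo, hi = fit_range(past_weights)
future_impacts = quantize(future_weights, lo, hi, bits=8)
```

In this sketch, a future weight that exceeds the earlier range collapses onto the top impact score, which is one way that stale settings could cost discriminatory power.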


Cited By

  • (2021) Cost-Effective Updating of Distributed Reordered Indexes. Proceedings of the 25th Australasian Document Computing Symposium, pp. 1-8. DOI: 10.1145/3503516.3503528. Online publication date: 9-Dec-2021
  • (2020) Examining the Additivity of Top-k Query Processing Innovations. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 1085-1094. DOI: 10.1145/3340531.3412000. Online publication date: 19-Oct-2020

Published In

ICTIR '17: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval
October 2017
348 pages
ISBN: 9781450344906
DOI: 10.1145/3121050

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. impact-ordered indexes
  2. temporal partitioning
  3. tweets
  4. web crawls
