DOI:10.1145/3340531.3412743
research-article

Relevance Ranking for Real-Time Tweet Search

Published: 19 October 2020

Abstract

Relevance ranking is a key component of many search engines, including the Tweet search engine at Twitter. Users often use Tweet search to discover live discussions and different voices on trending topics or recent events. Tweet search is thus unique in its focus on real-time content, where both the retrieved content and the queries change drastically on an hourly basis. Another important property of Tweet search is that its relevance ranking takes social endorsements from other users, e.g., "likes" and "retweets", into account, rather than relying mainly on clicks as implicit feedback. The relevance ranking of Tweet search is also subject to strict latency constraints because, every second, a large number of Tweets are posted and indexed while tens of thousands of queries are issued to search posted Tweets. Considering these properties and constraints, we present a relevance ranking system for Tweet search at Twitter that addresses all of these challenges. We first discuss the formation of the relevance ranking pipeline, which consists of a series of ranking models. We then present the methodology for training the models and the various groups of features we use, including real-time and personalized features. We also investigate approaches to achieving unbiased model training and to building automatic online tuning of system parameters. Experiments using online A/B testing demonstrate the effectiveness of the proposed approaches, and we have deployed the proposed relevance ranking system in production for more than three years.
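
A pipeline made of "a series of ranking models" under a latency budget can be pictured as a cascade: a cheap scorer prunes the full candidate set, and a costlier scorer reorders only the survivors. The Python sketch below illustrates that general idea only. It is not the system described in the paper, and every name in it (`Tweet`, `term_overlap`, `engagement_and_recency`, `rank`), every feature, and every weight is a hypothetical choice made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Tweet:
    # Hypothetical candidate record; the real feature set is not given in the abstract.
    tweet_id: int
    text: str
    likes: int
    retweets: int
    age_seconds: float


# A stage is a (scoring function, number of candidates to keep) pair.
Scorer = Callable[[str, Tweet], float]
Stage = Tuple[Scorer, int]


def term_overlap(query: str, tweet: Tweet) -> float:
    """Cheap first-stage score: fraction of query terms found in the Tweet text."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    words = set(tweet.text.lower().split())
    return len(terms & words) / len(terms)


def engagement_and_recency(query: str, tweet: Tweet) -> float:
    """Costlier second-stage score mixing text match, endorsements, and freshness.

    The weights (2, 0.1, 0.5) are assumptions, not values from the paper.
    """
    endorsements = math.log1p(tweet.likes + 2 * tweet.retweets)
    freshness = 1.0 / (1.0 + tweet.age_seconds / 3600.0)
    return term_overlap(query, tweet) + 0.1 * endorsements + 0.5 * freshness


def rank(query: str, candidates: Sequence[Tweet], stages: Sequence[Stage]) -> List[Tweet]:
    """Run each stage in order, keeping only the top `keep` candidates after each one."""
    survivors = list(candidates)
    for score, keep in stages:
        survivors.sort(key=lambda t: score(query, t), reverse=True)
        survivors = survivors[:keep]
    return survivors
```

Calling `rank(query, tweets, [(term_overlap, 1000), (engagement_and_recency, 20)])` would apply the expensive scorer to at most 1,000 Tweets, which is how a cascade keeps per-query cost bounded. A heavily endorsed Tweet that fails the first stage never reaches the second, so the first-stage cutoff trades recall for latency.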

Supplementary Material

MP4 File (3340531.3412743.mp4)
Relevance ranking is a key component of the Tweet search engine at Twitter. Tweet search is unique in that it focuses on real-time content, where both the retrieved content and the queries change drastically on an hourly basis. Another property of Tweet search is that it takes social endorsements from users into account, rather than relying on clicks as implicit feedback. The relevance ranking of Tweet search is also subject to strict latency constraints because of the high volume of Tweets and queries it receives. Considering these properties and constraints, we present a relevance ranking system for Tweet search at Twitter. We first discuss the formation of the relevance ranking pipeline, and then the methodology for training models and the features we use. We also investigate approaches to achieving unbiased model training and to building automatic online tuning of system parameters. We have deployed this relevance ranking system in production for more than three years.


Published In

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN:9781450368599
DOI:10.1145/3340531

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. large-scale ml system
  2. social network
  3. tweet search

Qualifiers

  • Research-article

Conference

CIKM '20

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
