DOI: 10.1145/1498759.1498761
invited-talk

Challenges in building large-scale information retrieval systems: invited talk

Published: 09 February 2009

Abstract

Building and operating large-scale information retrieval systems used by hundreds of millions of people around the world provides a number of interesting challenges. Designing such systems requires making complex design tradeoffs in a number of dimensions, including (a) the number of user queries that must be handled per second and the response latency to these requests, (b) the number and size of various corpora that are searched, (c) the latency and frequency with which documents are updated or added to the corpora, and (d) the quality and cost of the ranking algorithms that are used for retrieval. In this talk I will discuss the evolution of Google's hardware infrastructure and information retrieval systems and some of the design challenges that arise from ever-increasing demands in all of these dimensions. I will also describe how we use various pieces of distributed systems infrastructure when building these retrieval systems. Finally, I will describe some future challenges and open research problems in this area.

Published In

WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining
February 2009
314 pages
ISBN: 9781605583907
DOI: 10.1145/1498759
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. scalability
  2. search engines

Qualifiers

  • Invited-talk

Conference

WSDM'09
Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Cited By

  • (2024) SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation Instructions. Proceedings of the 20th International Workshop on Data Management on New Hardware, pp. 1-9. DOI: 10.1145/3662010.3663439. Online publication date: 10-Jun-2024
  • (2024) Scalable Distributed Inverted List Indexes in Disaggregated Memory. Proceedings of the ACM on Management of Data 2:3, pp. 1-27. DOI: 10.1145/3654974. Online publication date: 30-May-2024
  • (2024) Swap-Robust and Almost Supermagic Complete Graphs for Dynamical Distributed Storage. IEEE Transactions on Information Theory 70:8, pp. 5606-5623. DOI: 10.1109/TIT.2024.3404431. Online publication date: 1-Aug-2024
  • (2024) TMan: A High-Performance Trajectory Data Management System Based on Key-Value Stores. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 4951-4964. DOI: 10.1109/ICDE60146.2024.00376. Online publication date: 13-May-2024
  • (2023) AdANNS. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 76311-76335. DOI: 10.5555/3666122.3669458. Online publication date: 10-Dec-2023
  • (2023) Learning Query-aware Embedding Index for Improving E-commerce Dense Retrieval. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3265-3269. DOI: 10.1145/3539618.3591834. Online publication date: 19-Jul-2023
  • (2023) Searching images in a web archive. 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA), pp. 1-10. DOI: 10.1109/DSAA60987.2023.10302607. Online publication date: 9-Oct-2023
  • (2023) Efficient immediate-access dynamic indexing. Information Processing & Management 60:3, article 103248. DOI: 10.1016/j.ipm.2022.103248. Online publication date: May-2023
  • (2023) Time series compression based on reinforcement learning. Information Sciences 648, article 119490. DOI: 10.1016/j.ins.2023.119490. Online publication date: Nov-2023
  • (2023) FF-IR. Environmental Modelling & Software 167:C. DOI: 10.1016/j.envsoft.2023.105734. Online publication date: 1-Sep-2023
