
Large-Scale Data Processing for Information Retrieval Applications

Author:

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Page 3489

https://doi.org/10.1145/3539618.3591797

Published: 18 July 2023


Abstract

Developing Information Retrieval (IR) applications such as search engines and recommendation systems requires training models that keep growing in size and complexity. These models are trained on immense, multi-dimensional data collections that include document and item text, user profiles, and user interactions. Much of the research in IR concentrates on improving the ranking performance of models. However, designing new models to obtain such gains demands long training times and substantial computational resources, so it is crucial to also address efficiency in the design and deployment of IR applications at large scale. In my thesis, I aim to improve the training efficiency of IR applications and speed up the development of new models in two ways. First, I apply dataset distillation approaches to reduce the dataset size while preserving ranking quality. Second, I employ efficient High-Performance Computing (HPC) solutions to increase processing speed.
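The goal of shrinking the training data without hurting ranking quality can be checked with a simple protocol. Train a ranker on the full data and on a reduced subset, then compare a ranking metric on the same held-out interactions. The Python sketch below runs this protocol on synthetic data. It is a hypothetical illustration, not the thesis's method. The data generator, the popularity ranker, and all helper names (`make_toy_interactions`, `leave_one_out`, `sample_users`, `hit_rate_at_k`) are assumptions. Random user sampling stands in as the simplest reduction baseline. It is not dataset distillation, which instead synthesizes or optimizes a small training set.

```python
import random
from collections import Counter

# All helpers below are hypothetical, for illustration only.

def make_toy_interactions(n_users=200, n_items=50, per_user=8, seed=0):
    """Synthetic (user, item) pairs with skewed (Zipf-like) item popularity."""
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) for rank in range(n_items)]
    data = []
    for user in range(n_users):
        items = []
        while len(items) < per_user:  # draw distinct items for each user
            item = rng.choices(range(n_items), weights=weights)[0]
            if item not in items:
                items.append(item)
        data.extend((user, item) for item in items)
    return data


def leave_one_out(data):
    """Hold out each user's last interaction (in list order) for testing."""
    by_user = {}
    for user, item in data:
        by_user.setdefault(user, []).append(item)
    train, test = [], {}
    for user, items in by_user.items():
        test[user] = items[-1]
        train.extend((user, item) for item in items[:-1])
    return train, test


def popularity_ranking(train):
    """Global item ranking by interaction count (ties: first seen first)."""
    counts = Counter(item for _, item in train)
    return [item for item, _ in counts.most_common()]


def sample_users(train, fraction, seed=0):
    """Reduction baseline: keep all interactions of a random subset of users."""
    rng = random.Random(seed)
    users = sorted({user for user, _ in train})
    kept = set(rng.sample(users, int(len(users) * fraction)))
    return [(user, item) for user, item in train if user in kept]


def hit_rate_at_k(ranking, train, test, k=10):
    """Fraction of users whose held-out item is in their top-k unseen items."""
    seen = {}
    for user, item in train:
        seen.setdefault(user, set()).add(item)
    hits = 0
    for user, target in test.items():
        user_seen = seen.get(user, set())
        top_k = [item for item in ranking if item not in user_seen][:k]
        hits += target in top_k
    return hits / len(test)


data = make_toy_interactions()
train, test = leave_one_out(data)
reduced = sample_users(train, fraction=0.2)

# Fit on full vs. reduced data, but always evaluate on the same test set,
# filtering each user's items seen in the full training history.
hr_full = hit_rate_at_k(popularity_ranking(train), train, test)
hr_reduced = hit_rate_at_k(popularity_ranking(reduced), train, test)
print(f"full:    {len(train)} interactions, HR@10 = {hr_full:.3f}")
print(f"reduced: {len(reduced)} interactions, HR@10 = {hr_reduced:.3f}")
```

In a real study, `reduced` would come from the distillation or sampling method under evaluation. The popularity ranker would be replaced by the model of interest. The gap between the two metric values is one way to quantify how well ranking quality is preserved.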


Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms
2. Information systems
  1. Information retrieval


Published In

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2023

3567 pages

ISBN: 9781450394086

DOI: 10.1145/3539618

General Chairs:
Hsin-Hsi Chen, National Taiwan University
Wei-Jou (Edward) Duh, National Taiwan University
Hen-Hsen Huang, Academia Sinica

Program Chairs:
Makoto P. Kato, Spotify
Josiane Mothe, Université de Toulouse
Barbara Poblete, University of Chile and Amazon Visiting Academic

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers: Abstract

Conference

SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 23-27, 2023
Taipei, Taiwan

Sponsor: SIGIR
