DOI: 10.1145/3539618.3591797

Large-Scale Data Processing for Information Retrieval Applications

Published: 18 July 2023

ABSTRACT

Developing Information Retrieval (IR) applications such as search engines and recommendation systems requires training models of ever-growing complexity and size on immense data collections that span multiple dimensions (document/item text, user profiles, and interactions). Much of the research in IR concentrates on improving the performance of ranking models. However, because designing new, better-performing models demands long training times and substantial computational resources, it is crucial to address the efficiency of designing and deploying IR applications at large scale. In my thesis, I aim to improve the training efficiency of IR applications and speed up the development of new models in two ways: by applying dataset distillation approaches that reduce the dataset size while preserving ranking quality, and by employing efficient High-Performance Computing (HPC) solutions that increase processing speed.
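The abstract does not say which distillation method the thesis uses. As a point of reference only, the sketch below shows the simplest data-reduction baseline that such methods are usually compared against: randomly keeping a fraction of users, then checking how much of a top-k ranking survives on the smaller dataset. Random user sampling is a swapped-in baseline, not dataset distillation. The helper names (`sample_users`, `popularity_ranking`, `top_k_overlap`), the popularity ranker, the overlap metric, and the synthetic Zipf-like data are all hypothetical, chosen for illustration.

```python
import random
from collections import Counter


def sample_users(interactions, fraction, seed=0):
    """Hypothetical helper: keep all interactions of a random subset of users.

    This is plain random sampling, a baseline, NOT dataset distillation.
    """
    users = sorted({u for u, _ in interactions})
    rng = random.Random(seed)
    k = max(1, int(len(users) * fraction))
    kept = set(rng.sample(users, k))
    # Filter the original list so the interaction order is preserved.
    return [(u, i) for u, i in interactions if u in kept]


def popularity_ranking(interactions, k):
    """Hypothetical ranker: top-k items by interaction count, ties broken by item id."""
    counts = Counter(i for _, i in interactions)
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]


def top_k_overlap(reference, candidate):
    """Fraction of the reference top-k items that also appear in the candidate top-k."""
    return len(set(reference) & set(candidate)) / max(len(reference), 1)


if __name__ == "__main__":
    # Synthetic data (assumption): 1000 users, 10 interactions each,
    # drawn from 200 items with a Zipf-like popularity skew.
    rng = random.Random(42)
    items = list(range(200))
    weights = [1.0 / (rank + 1) for rank in items]
    full = [(u, i) for u in range(1000) for i in rng.choices(items, weights=weights, k=10)]

    reduced = sample_users(full, fraction=0.2)
    overlap = top_k_overlap(popularity_ranking(full, 10), popularity_ranking(reduced, 10))
    print(f"{len(reduced)}/{len(full)} interactions kept, top-10 overlap = {overlap:.2f}")
```

A distillation approach would replace `sample_users` with a learned or optimized reduction of the data; the evaluation side (train or rank on the reduced data, then compare against the full-data result) stays the same.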

      Published in

        ACM Conferences
        SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
        July 2023
        3567 pages
        ISBN:9781450394086
        DOI:10.1145/3539618

        Copyright © 2023 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • abstract

        Acceptance Rates

        Overall Acceptance Rate: 792 of 3,983 submissions, 20%