ABSTRACT
Developing Information Retrieval (IR) applications such as search engines and recommendation systems requires training models of growing complexity and size on immense collections of multi-dimensional data (document or item text, user profiles, and interactions). Much IR research concentrates on improving the performance of ranking models. However, because designing new models to improve performance demands long training times and substantial computational resources, it is crucial to address the efficiency of designing and deploying IR applications at large scale. In my thesis, I aim to improve the training efficiency of IR applications and speed up the development of new models in two ways: by applying dataset distillation approaches that reduce dataset size while preserving ranking quality, and by employing efficient High-Performance Computing (HPC) solutions that increase processing speed.
Index Terms
- Large-Scale Data Processing for Information Retrieval Applications