
Recipe-MPR: A Test Collection for Evaluating Multi-aspect Preference-based Natural Language Retrieval

Published: 18 July 2023

ABSTRACT

The rise of interactive recommendation assistants has led to a novel domain of natural language (NL) recommendation that would benefit from improved multi-aspect reasoning to retrieve relevant items based on NL statements of preference. Such preference statements often involve multiple aspects, e.g., "I would like meat lasagna but I'm watching my weight". Unfortunately, progress in this domain is slowed by the lack of annotated data. To address this gap, we curate a novel dataset that captures logical reasoning over multi-aspect, NL preference-based queries paired with sets of multiple-choice, multi-aspect item descriptions. We focus on the recipe domain, in which multi-aspect preferences are often encountered due to the complexity of the human diet. The goal of publishing our dataset is to provide a benchmark for joint progress in three key areas: 1) structured, multi-aspect NL reasoning with a variety of properties (e.g., level of specificity, presence of negation, and the need for commonsense, analogical, and/or temporal inference); 2) the ability of recommender systems to respond to NL preference utterances; and 3) explainable NL recommendation facilitated by aspect extraction and reasoning. We perform experiments using a variety of methods (sparse and dense retrieval, and zero- and few-shot reasoning with large language models) in two settings: a monolithic setting that uses the full query, and an aspect-based setting that isolates individual query aspects and aggregates the results. GPT-3 achieves much stronger performance than the other methods, with 73% zero-shot and 83% few-shot accuracy in the monolithic setting. Aspect-based GPT-3, which facilitates structured explanations, also shows promise, with 68% zero-shot accuracy. These results establish baselines for future research into explainable recommendation via multi-aspect preference-based NL reasoning.
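
The abstract's two evaluation settings can be made concrete with a short sketch. The following minimal Python example is illustrative only, not the authors' released code: it contrasts a monolithic scorer, which ranks the multiple-choice options against the full preference query, with an aspect-based scorer, which scores each query aspect separately and aggregates before choosing. The token-overlap scorer and all example data are hypothetical stand-ins for the paper's actual scorers (BM25, dense retrieval, or GPT-3 prompting).

```python
# Minimal sketch of the two evaluation settings described in the abstract.
# NOT the authors' code: the token-overlap scorer below is a trivial
# stand-in for the paper's real scorers (BM25, dense retrieval, GPT-3).
from typing import Callable, List

def overlap_score(query: str, option: str) -> float:
    """Toy relevance score: fraction of query tokens appearing in the option."""
    q = set(query.lower().split())
    o = set(option.lower().split())
    return len(q & o) / max(len(q), 1)

def monolithic_pick(query: str, options: List[str],
                    score: Callable[[str, str], float] = overlap_score) -> int:
    """Monolithic setting: score each option against the full query, take argmax."""
    return max(range(len(options)), key=lambda i: score(query, options[i]))

def aspect_based_pick(aspects: List[str], options: List[str],
                      score: Callable[[str, str], float] = overlap_score) -> int:
    """Aspect-based setting: score each option per extracted aspect,
    sum the per-aspect scores, and take the argmax."""
    totals = [sum(score(a, opt) for a in aspects) for opt in options]
    return max(range(len(options)), key=lambda i: totals[i])

if __name__ == "__main__":
    # Hypothetical query, extracted aspects, and multiple-choice options.
    query = "I would like meat lasagna but I'm watching my weight"
    aspects = ["meat lasagna", "low calorie"]
    options = [
        "Classic beef lasagna with three cheeses",
        "Low calorie turkey lasagna with lean meat",
        "Vegetarian spinach lasagna",
        "Chocolate lava cake",
    ]
    print("monolithic:", options[monolithic_pick(query, options)])
    print("aspect-based:", options[aspect_based_pick(aspects, options)])
```

Because the aspect-based setting produces one score (or judgment) per aspect, the intermediate per-aspect results can serve as the structured explanations the abstract refers to.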


Published in

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023, 3567 pages
ISBN: 9781450394086
Proceedings DOI: 10.1145/3539618
Article DOI: 10.1145/3539618.3591880
Copyright © 2023 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
