research-article

Open access

Vexation-Aware Active Learning for On-Menu Restaurant Dish Availability

Authors:

Jean-François Kagy,

Afshin Rostamizadeh,

Chris Welty

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Pages 3116 - 3126

https://doi.org/10.1145/3534678.3539152

Published: 14 August 2022

Abstract

Here we leverage the power of the crowd: online users who are willing to answer questions about dish availability at restaurants they have visited. While motivated users are happy to contribute knowledge, they are much less likely to respond to "silly" or embarrassing questions (e.g., "Does Pizza Hut serve pizza?" or "Does Mike's Vegan Restaurant serve steak?").

In this paper, we study the problem of Vexation-Aware Active Learning (VAAL), where judiciously selected questions are targeted towards improving restaurant-dish model prediction, subject to a limit on the percentage of "unsure" answers or "dismissals" (e.g., swiping the app closed) measuring user vexation. We formalize the selection problem as an integer program and solve it efficiently using a distributed solution that scales linearly with the number of candidate questions. Since our algorithm relies on an accurate estimation of the unsure-dismiss rate (UDR), we present a regression model that provides high-quality results compared to baselines including collaborative filtering. Finally, we demonstrate in a live system that our proposed VAAL strategy performs competitively against classical (margin-based) active learning approaches while reducing the UDR for the questions being asked.
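
As a rough illustration of the selection problem described above, the following Python sketch picks a fixed number of questions that maximize total informativeness while keeping the mean predicted UDR under a cap. It is not the authors' distributed solver: the function name `select_questions`, the per-question `utility` scores (for example, a margin-based uncertainty), and the Lagrangian bisection used here are all assumptions made for illustration.

```python
from typing import List, Sequence


def select_questions(
    utility: Sequence[float],   # hypothetical informativeness score per candidate
    udr: Sequence[float],       # predicted unsure-dismiss rate per candidate
    budget: int,                # number of questions to ask
    max_udr: float,             # cap on the mean predicted UDR of the selection
    iters: int = 50,
) -> List[int]:
    """Return indices of min(budget, n) candidates with high total utility
    whose mean predicted UDR is at most max_udr, or [] if none is found.

    Heuristic: rank by utility - lam * udr and bisect on the multiplier lam.
    """
    n = len(utility)

    def pick(lam: float) -> List[int]:
        # Penalize each candidate by its predicted UDR, then take the top ones.
        order = sorted(range(n), key=lambda i: utility[i] - lam * udr[i], reverse=True)
        return order[:budget]

    def mean_udr(idx: List[int]) -> float:
        return sum(udr[i] for i in idx) / len(idx) if idx else 0.0

    # lam = 0 is plain utility ranking; keep it if it already meets the cap.
    chosen = pick(0.0)
    if mean_udr(chosen) <= max_udr:
        return sorted(chosen)

    # Grow the penalty until the selection becomes feasible (or give up).
    lo, hi = 0.0, 1.0
    while mean_udr(pick(hi)) > max_udr and hi < 1e9:
        hi *= 2.0
    if mean_udr(pick(hi)) > max_udr:
        return []

    # Bisect for the smallest penalty that still satisfies the cap.
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if mean_udr(pick(mid)) <= max_udr:
            hi = mid
        else:
            lo = mid
    return sorted(pick(hi))


if __name__ == "__main__":
    utility = [0.9, 0.8, 0.7, 0.2, 0.1]
    udr = [0.9, 0.1, 0.8, 0.05, 0.1]
    print(select_questions(utility, udr, budget=2, max_udr=0.3))
```

With a loose cap the sketch reduces to ordinary utility ranking; as the cap tightens, the penalty term trades informative but vexing questions for less vexing ones. An exact integer-programming solver could return a better selection than this relaxation.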

Index Terms

Vexation-Aware Active Learning for On-Menu Restaurant Dish Availability
1. Computing methodologies
  1. Machine learning
    1. Learning settings
      1. Active learning settings
2. Information systems
  1. World Wide Web
    1. Web applications
      1. Crowdsourcing


Published In

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 2022

5033 pages

ISBN:9781450393850

DOI:10.1145/3534678

General Chairs: Aidong Zhang (University of Virginia), Huzefa Rangwala (Amazon/George Mason University)

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

KDD '22

KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 14 - 18, 2022

Washington DC, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
