Learning algorithms for online principal-agent problems (and selling goods online)

Authors:

Vincent Conitzer,

Nikesh Garera

ICML '06: Proceedings of the 23rd international conference on Machine learning

Pages 209 - 216

https://doi.org/10.1145/1143844.1143871

Published: 25 June 2006

Abstract

In a principal-agent problem, a principal seeks to motivate an agent to take a certain action beneficial to the principal, while spending as little as possible on the reward. This is complicated by the fact that the principal does not know the agent's utility function (or type). We study the online setting where at each round, the principal encounters a new agent, and the principal sets the rewards anew. At the end of each round, the principal only finds out the action that the agent took, but not his type. The principal must learn how to set the rewards optimally. We show that this setting generalizes the setting of selling a digital good online.

We study and experimentally compare three main approaches to this problem. First, we show how to apply a standard bandit algorithm to this setting. Second, for the case where the distribution of agent types is fixed (but unknown to the principal), we introduce a new gradient ascent algorithm. Third, for the case where the distribution of agents' types is fixed, and the principal has a prior belief (distribution) over a limited class of type distributions, we study a Bayesian approach.
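The digital-good special case mentioned in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: it assumes the "standard bandit algorithm" is an Exp3-style adversarial bandit (the abstract does not name one), that the seller chooses from a hypothetical fixed grid of prices in [0, 1], and that each buyer purchases exactly when the hidden valuation is at least the posted price. The function name `exp3_posted_price` and the parameter `gamma` are made up for this example.

```python
import math
import random


def exp3_posted_price(prices, valuations, gamma=0.1, seed=0):
    """Exp3-style learning over a fixed grid of posted prices.

    prices:     candidate prices in [0, 1] (hypothetical discretization).
    valuations: one hidden buyer valuation per round; the seller never
                sees these, only whether the buyer bought.
    Returns (total_revenue, final_weights).
    """
    rng = random.Random(seed)
    k = len(prices)
    weights = [1.0] * k
    total_revenue = 0.0

    for valuation in valuations:
        # Mix the weight-proportional distribution with uniform exploration.
        weight_sum = sum(weights)
        probs = [(1 - gamma) * w / weight_sum + gamma / k for w in weights]

        # Post one price, sampled according to the current distribution.
        arm = rng.choices(range(k), weights=probs)[0]

        # Feedback is only the agent's action (buy or not), not the type.
        bought = valuation >= prices[arm]
        reward = prices[arm] if bought else 0.0
        total_revenue += reward

        # Importance-weighted reward estimate for the chosen price only.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / k)

        # Rescale so the weights stay bounded over long runs.
        largest = max(weights)
        weights = [w / largest for w in weights]

    return total_revenue, weights
```

Calling `exp3_posted_price([0.25, 0.5, 0.75], [0.5] * 2000)` simulates 2000 buyers who all value the good at 0.5; the returned weights indicate which price the learner has come to favor. A finer price grid reduces discretization loss but gives the bandit more arms to explore.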

Index Terms

Learning algorithms for online principal-agent problems (and selling goods online)
1. Mathematics of computing
  1. Mathematical analysis
    1. Mathematical optimization
      1. Continuous optimization
        Convex optimization
  2. Probability and statistics
    1. Statistical paradigms
      1. Statistical graphics
2. Theory of computation
  1. Design and analysis of algorithms
    1. Mathematical optimization
      1. Continuous optimization
        Convex optimization

Published In

ICML '06: Proceedings of the 23rd international conference on Machine learning

June 2006

1154 pages

ISBN:1595933832

DOI:10.1145/1143844

Program Chairs:
William Cohen,
Andrew Moore

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Acceptance Rates

ICML '06 Paper Acceptance Rate 140 of 548 submissions, 26%;

Overall Acceptance Rate 140 of 548 submissions, 26%
