Learning algorithms for online principal-agent problems (and selling goods online)

Article, published 25 June 2006
DOI: 10.1145/1143844.1143871

Abstract

In a principal-agent problem, a principal seeks to motivate an agent to take a certain action beneficial to the principal, while spending as little as possible on the reward. This is complicated by the fact that the principal does not know the agent's utility function (or type). We study the online setting where, at each round, the principal encounters a new agent and sets the rewards anew. At the end of each round, the principal finds out only the action that the agent took, but not his type. The principal must learn how to set the rewards optimally. We show that this setting generalizes the setting of selling a digital good online.

We study and experimentally compare three main approaches to this problem. First, we show how to apply a standard bandit algorithm to this setting. Second, for the case where the distribution of agent types is fixed (but unknown to the principal), we introduce a new gradient ascent algorithm. Third, for the case where the distribution of agent types is fixed, and the principal has a prior belief (distribution) over a limited class of type distributions, we study a Bayesian approach.
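To make the bandit approach concrete, the sketch below works through the special case of selling a digital good. It assumes one natural reading of that case, which may differ from the paper's own formalization: the agent's type is a private valuation, the actions are "buy" and "not buy", and the principal's reward for buying is a payment (a posted price). Each round the principal posts a price and observes only whether the agent bought. The price grid, the exploration rate `gamma`, the tie-breaking rule, and the names `agent_buys`, `Exp3`, and `run_posted_price` are all hypothetical choices made for illustration. EXP3, the adversarial bandit algorithm of Auer, Cesa-Bianchi, Freund and Schapire, stands in here as "a standard bandit algorithm"; this is not a reproduction of the paper's method, and the gradient ascent and Bayesian approaches are not shown.

```python
import math
import random


def agent_buys(valuation: float, price: float) -> bool:
    """Agent's best response in the assumed selling model.

    Buying gives utility valuation - price, not buying gives 0.
    Ties are broken in favor of buying (an assumption).
    """
    return valuation >= price


class Exp3:
    """EXP3 over a finite set of arms; rewards must lie in [0, 1]."""

    def __init__(self, n_arms: int, gamma: float, rng: random.Random):
        self.n_arms = n_arms
        self.gamma = gamma                  # exploration rate, 0 < gamma <= 1
        self.log_weights = [0.0] * n_arms   # log-space weights avoid overflow
        self.rng = rng

    def probabilities(self) -> list[float]:
        # Mix normalized exponential weights with uniform exploration.
        top = max(self.log_weights)
        w = [math.exp(lw - top) for lw in self.log_weights]
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.n_arms for wi in w]

    def choose(self) -> tuple[int, float]:
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm: int, prob: float, reward: float) -> None:
        # Importance-weighted estimate, applied to the pulled arm only.
        estimate = reward / prob
        self.log_weights[arm] += self.gamma * estimate / self.n_arms


def run_posted_price(valuations, prices, gamma=0.1, seed=0) -> float:
    """Post one price per arriving agent and return total revenue."""
    rng = random.Random(seed)
    learner = Exp3(len(prices), gamma, rng)
    revenue = 0.0
    for v in valuations:
        arm, prob = learner.choose()
        price = prices[arm]
        # The seller sees only the action (bought or not), never v itself.
        sold = agent_buys(v, price)
        reward = price if sold else 0.0  # prices in (0, 1] keep reward in [0, 1]
        revenue += reward
        learner.update(arm, prob, reward)
    return revenue


# Hypothetical experiment: valuations drawn i.i.d. uniform on [0, 1].
vals_rng = random.Random(1)
valuations = [vals_rng.random() for _ in range(20000)]
price_grid = [i / 10 for i in range(1, 11)]
avg = run_posted_price(valuations, price_grid) / len(valuations)
print(f"average revenue per round: {avg:.3f}")
```

With uniform valuations, a fixed price p sells with probability 1 - p, so the best price on this grid (0.5) earns 0.25 per round in expectation; the learner should approach that figure, minus the cost of exploration. Discretizing prices is what turns the continuous pricing problem into a finite-armed bandit; a finer grid can get closer to the best price but needs more rounds to learn.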


Published In

ICML '06: Proceedings of the 23rd international conference on Machine learning
June 2006
1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844
Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

ICML '06 paper acceptance rate: 140 of 548 submissions (26%)
Overall acceptance rate: 140 of 548 submissions (26%)

Cited By

  • (2024) Incentivized learning in principal-agent bandit games. Proceedings of the 41st International Conference on Machine Learning, pp. 43608-43631. DOI: 10.5555/3692070.3693846. Online publication date: 21-Jul-2024
  • (2023) Strategic apple tasting. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 79918-79945. DOI: 10.5555/3666122.3669622. Online publication date: 10-Dec-2023
  • (2023) Learning approximately optimal contracts. Theoretical Computer Science, 114219. DOI: 10.1016/j.tcs.2023.114219. Online publication date: Sep-2023
  • (2022) Learning Approximately Optimal Contracts. Algorithmic Game Theory, pp. 331-346. DOI: 10.1007/978-3-031-15714-1_19. Online publication date: 14-Sep-2022
  • (2019) GPU-accelerated principal-agent game for scalable citizen science. Proceedings of the 2nd ACM SIGCAS Conference on Computing and Sustainable Societies, pp. 165-173. DOI: 10.1145/3314344.3332495. Online publication date: 3-Jul-2019
  • (2018) The promise and perils of myopia in dynamic pricing with censored information. Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 4994-5001. DOI: 10.5555/3304652.3304704. Online publication date: 13-Jul-2018
  • (2016) Avicaching. Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, pp. 776-785. DOI: 10.5555/2936924.2937038. Online publication date: 9-May-2016
  • (2016) Behavior Identification in Two-Stage Games for Incentivizing Citizen Science Exploration. Principles and Practice of Constraint Programming, pp. 701-717. DOI: 10.1007/978-3-319-44953-1_44. Online publication date: 23-Aug-2016
  • (2014) Online decision making in crowdsourcing markets. ACM SIGecom Exchanges 12(2), pp. 4-23. DOI: 10.1145/2692359.2692364. Online publication date: 25-Nov-2014
  • (2011) Learning the demand curve in posted-price digital goods auctions. The 10th International Conference on Autonomous Agents and Multiagent Systems, Volume 1, pp. 63-70. DOI: 10.5555/2030470.2030480. Online publication date: 2-May-2011
