
Dominant strategy truthful, deterministic multi-armed bandit mechanisms with logarithmic regret for sponsored search auctions

Published in: Applied Intelligence

Abstract

Stochastic multi-armed bandit (MAB) mechanisms are widely used in sponsored search auctions, crowdsourcing, online procurement, etc. Existing stochastic MAB mechanisms with a deterministic payment rule, proposed in the literature, necessarily suffer a regret of \(\Omega(T^{2/3})\), where T is the number of time steps. This happens because the existing mechanisms consider the worst-case scenario, in which the means of the agents' stochastic rewards are separated by a very small amount that depends on T. We make, and exploit, the crucial observation that in most scenarios the separation between the agents' rewards is rarely a function of T. Moreover, when the rewards of the arms are arbitrarily close, the regret contributed by such sub-optimal arms is minimal. Our idea is to allow the center to indicate the resolution, Δ, with which the agents must be distinguished. This immediately leads us to introduce the notion of Δ-regret. Using sponsored search auctions as a concrete example (the same idea applies to other applications as well), we propose a dominant strategy incentive compatible (DSIC) and individually rational (IR), deterministic MAB mechanism based on ideas from the Upper Confidence Bound (UCB) family of MAB algorithms. Remarkably, the proposed mechanism, Δ-UCB, achieves a Δ-regret of \(O(\log T)\) for the case of sponsored search auctions. We first establish the results for single-slot sponsored search auctions and then non-trivially extend the results to the case where multiple slots are to be allocated.
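The Δ-regret notion described above can be illustrated with a minimal sketch. The code below is NOT the paper's Δ-UCB mechanism (which additionally computes DSIC payments); it is a plain UCB1 run on Bernoulli arms in which only pulls of arms whose mean is more than Δ below the best arm's mean count toward regret. The function name `ucb_delta_regret` and all parameters are illustrative assumptions.

```python
import math
import random

def ucb_delta_regret(mu, delta, T, seed=0):
    """Run UCB1 on Bernoulli arms with means `mu` for T rounds.

    Returns the Delta-regret: the regret accumulated only by pulls of
    arms whose mean lies more than `delta` below the best arm's mean.
    Illustrative sketch only -- the paper's Delta-UCB mechanism also
    charges payments to make truthful bidding a dominant strategy.
    """
    rng = random.Random(seed)
    k = len(mu)
    counts = [0] * k        # pulls per arm
    sums = [0.0] * k        # cumulative reward per arm
    best = max(mu)
    regret = 0.0
    for t in range(1, T + 1):
        if t <= k:
            arm = t - 1     # initialization: play each arm once
        else:
            # UCB1 index: empirical mean + exploration bonus
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < mu[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        sums[arm] += reward
        if best - mu[arm] > delta:   # only Delta-suboptimal pulls count
            regret += best - mu[arm]
    return regret
```

With a fixed, T-independent gap, standard UCB1 analysis gives each Δ-suboptimal arm only O(log T) pulls, which is the intuition behind the O(log T) Δ-regret bound; arms within Δ of the best contribute nothing by definition.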



Corresponding author

Correspondence to Divya Padmanabhan.



Cite this article

Padmanabhan, D., Bhat, S., Prabuchandran, K.J. et al. Dominant strategy truthful, deterministic multi-armed bandit mechanisms with logarithmic regret for sponsored search auctions. Appl Intell 52, 3209–3226 (2022). https://doi.org/10.1007/s10489-021-02387-2
