Budgeted Recommendation with Delayed Feedback

  • Conference paper
Good Practices and New Perspectives in Information Systems and Technologies (WorldCIST 2024)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 987)

Abstract

In a conventional contextual multi-armed bandit problem, the feedback (or reward) is observable immediately after an action. In many real-life situations, however, feedback is delayed, and the delay matters most in time-sensitive applications. Delays make the exploration-exploitation dilemma harder, and a limited budget aggravates the problem further by restricting how much exploration is possible, so the effects of delay and of resource constraints are coupled. A motivating example is the distribution of medical supplies in the early stage of COVID-19: testing results arrived with a delay, leaving insufficient information for learning and degrading the efficiency of resource allocation. Motivated by such applications, we study the effect of delayed feedback on constrained contextual bandits. We develop a decision-making policy, delay-oriented resource allocation with learning (DORAL), to optimize resource expenditure in a contextual multi-armed bandit problem with arm-dependent delayed feedback.
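
To make the setting concrete, the following is a minimal simulation sketch of a budgeted bandit with arm-dependent delays. It is not the DORAL policy, whose details are not given in the abstract. It swaps in a generic UCB-per-cost heuristic, drops the contexts, and assumes Bernoulli rewards, fixed per-arm delays, and positive per-arm costs. The function name `run_budgeted_delayed_ucb` and all of its parameters are hypothetical.

```python
import heapq
import math
import random


def run_budgeted_delayed_ucb(means, costs, delays, budget, seed=0):
    """Simulate a UCB-per-cost heuristic under a budget with delayed rewards.

    Hypothetical illustration only; this is NOT the DORAL policy.
    means:  assumed Bernoulli reward probability of each arm
    costs:  positive cost paid each time an arm is pulled
    delays: number of rounds before an arm's reward becomes visible
    """
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [0] * n_arms         # pulls issued (and paid for) per arm
    observed = [0] * n_arms      # rewards that have actually arrived per arm
    reward_sum = [0.0] * n_arms  # sum of arrived rewards per arm
    pending = []                 # min-heap of (arrival_round, arm, reward)
    t = 0

    def index(a):
        # Optimistic reward-per-cost estimate built from arrived feedback only.
        if observed[a] == 0:
            return float("inf")
        mean = reward_sum[a] / observed[a]
        bonus = math.sqrt(2.0 * math.log(t + 1) / observed[a])
        return (mean + bonus) / costs[a]

    while budget >= min(costs):
        t += 1
        # Deliver every reward whose delay has elapsed by round t.
        while pending and pending[0][0] <= t:
            _, arm, r = heapq.heappop(pending)
            observed[arm] += 1
            reward_sum[arm] += r
        affordable = [a for a in range(n_arms) if costs[a] <= budget]
        # Ties (e.g. several arms with no feedback yet) go to the least-pulled arm.
        arm = max(affordable, key=lambda a: (index(a), -pulls[a]))
        budget -= costs[arm]
        pulls[arm] += 1
        reward = 1.0 if rng.random() < means[arm] else 0.0
        heapq.heappush(pending, (t + delays[arm], arm, reward))
    return pulls, observed, budget


if __name__ == "__main__":
    pulls, observed, remaining = run_budgeted_delayed_ucb(
        means=[0.2, 0.8], costs=[1, 2], delays=[1, 5], budget=51
    )
    print("pulls:", pulls, "observed:", observed, "remaining budget:", remaining)
```

In this sketch the policy pays for a pull immediately but can only learn from rewards that have already arrived, so an arm with a long delay stays poorly estimated while budget is still being spent. This is the coupling between delay and limited budget that the abstract describes.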

Acknowledgement

The work of S.M. was supported by Grant 01IS20051 and Grant 16KISK035 from the German Federal Ministry of Education and Research.

Author information

Corresponding author

Correspondence to Kweiguu Liu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, K., Maghsudi, S., Yokoo, M. (2024). Budgeted Recommendation with Delayed Feedback. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Poniszewska-Marańda, A. (eds) Good Practices and New Perspectives in Information Systems and Technologies. WorldCIST 2024. Lecture Notes in Networks and Systems, vol 987. Springer, Cham. https://doi.org/10.1007/978-3-031-60221-4_20
