Budgeted Recommendation with Delayed Feedback

Liu, Kweiguu; Maghsudi, Setareh; Yokoo, Makoto

doi:10.1007/978-3-031-60221-4_20


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 987)

Included in the following conference series:

World Conference on Information Systems and Technologies


Abstract

In a conventional contextual multi-armed bandit problem, the feedback (or reward) is observable immediately after an action. In many real-life situations, however, feedback arrives with a delay, which matters most in time-sensitive applications. Delays make the exploration-exploitation dilemma harder because they interact with limited resources, and a limited budget often aggravates the problem by restricting the exploration potential. A motivating example is the distribution of medical supplies at the early stage of COVID-19, where delayed test results left too little information for learning and degraded the efficiency of resource allocation. Motivated by such applications, we study the effect of delayed feedback on constrained contextual bandits. We develop a decision-making policy, delay-oriented resource allocation with learning (DORAL), to optimize resource expenditure in a contextual multi-armed bandit problem with arm-dependent delayed feedback.
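To make the setting concrete, the following is a minimal sketch of a bandit loop with a pull budget and arm-dependent feedback delays. It is not DORAL, whose details are not given in this abstract. It uses a standard UCB1-style index, omits contexts for brevity, and assumes fixed integer delays and Bernoulli rewards. The function name `run_delayed_budgeted_ucb` and all parameters are hypothetical and chosen only for illustration.

```python
import heapq
import math
import random


def run_delayed_budgeted_ucb(means, delays, budget, horizon, seed=0):
    """Toy UCB loop (illustrative only, not DORAL).

    Each pull costs one budget unit, and its reward is observed only
    `delays[arm]` rounds after the pull.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # rewards observed so far, per arm
    sums = [0.0] * n_arms   # sum of observed rewards, per arm
    pulls = [0] * n_arms    # pulls made so far, per arm
    pending = []            # min-heap of (arrival_round, arm, reward)
    spent = 0

    def index(a, t):
        # Arms with no observed feedback yet get an infinite index.
        if counts[a] == 0:
            return float("inf")
        return sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])

    for t in range(1, horizon + 1):
        # Deliver every reward whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, arm, reward = heapq.heappop(pending)
            counts[arm] += 1
            sums[arm] += reward

        if spent >= budget:
            continue  # budget exhausted: only collect outstanding feedback

        arm = max(range(n_arms), key=lambda a: index(a, t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        heapq.heappush(pending, (t + delays[arm], arm, reward))
        pulls[arm] += 1
        spent += 1

    return pulls, counts, spent


pulls, counts, spent = run_delayed_budgeted_ucb(
    means=[0.2, 0.8], delays=[1, 5], budget=50, horizon=200
)
```

In this sketch the index depends only on observed rewards, so an arm with a long delay keeps an uninformed estimate for several rounds while the budget continues to be spent. This is the coupling between delay and limited budget that the abstract describes.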




Acknowledgement

The work of S.M. was supported by Grant 01IS20051 and Grant 16KISK035 from the German Federal Ministry of Education and Research.

Author information

Authors and Affiliations

Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, 819-0395, Japan
Kweiguu Liu & Makoto Yokoo
Faculty of Electrical Engineering and Information Technology, Ruhr-University Bochum, 44801, Bochum, Germany
Setareh Maghsudi


Corresponding author

Correspondence to Kweiguu Liu.

Editor information

Editors and Affiliations

ISEG, Universidade de Lisboa, Lisbon, Portugal
Álvaro Rocha
College of Engineering, The Ohio State University, Columbus, OH, USA
Hojjat Adeli
Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania
Gintautas Dzemyda
DCT, Universidade Portucalense, Porto, Portugal
Fernando Moreira
Institute of Information Technology, Lodz University of Technology, Łódz, Poland
Aneta Poniszewska-Marańda


About this paper

Cite this paper

Liu, K., Maghsudi, S., Yokoo, M. (2024). Budgeted Recommendation with Delayed Feedback. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Poniszewska-Marańda, A. (eds) Good Practices and New Perspectives in Information Systems and Technologies. WorldCIST 2024. Lecture Notes in Networks and Systems, vol 987. Springer, Cham. https://doi.org/10.1007/978-3-031-60221-4_20


DOI: https://doi.org/10.1007/978-3-031-60221-4_20
Published: 13 May 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-60220-7
Online ISBN: 978-3-031-60221-4
eBook Packages: Intelligent Technologies and Robotics (R0)
