Abstract
This paper develops a strategic model for optimizing click-through rates (CTR) in profitable recommendation systems. Approximating a function from samples is a vital step in data prediction when the ground truth is not directly accessible. Although interpolation algorithms such as regression and non-kernel SVMs are prevalent in modern machine learning, in many cases they are not suitable for fitting arbitrary functions that lack a closed-form expression. The main contribution of this paper is a semi-parametric graphical model, satisfying the properties of the Gaussian Markov random field (GMRF), that approximates general, possibly multivariate, functions. Building on inference in this model, the paper further investigates several policies commonly used in Bayesian optimization to solve the multi-armed bandit (MAB) problem. The primary objective is to locate the global optimum of an unknown function. In the recommendation setting, the proposed algorithm maximizes user clicks under a rescheduled recommendation policy while keeping cost as low as possible. Comparative experiments are conducted across a set of policies, and the empirical evaluation suggests that Thompson sampling is the most suitable policy for the proposed algorithm.
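The abstract names Thompson sampling as the best-performing policy for the MAB problem. As a minimal illustrative sketch (not the paper's GMRF-based algorithm), the snippet below shows Thompson sampling on a Bernoulli bandit where each pull simulates a user click; the `true_ctrs` values, round count, and seed are assumptions made for this example only.

```python
import random

def thompson_sampling(true_ctrs, n_rounds=5000, seed=0):
    """Bernoulli Thompson sampling: each arm keeps a Beta(a, b) posterior
    over its click-through rate; each round we draw one sample from every
    posterior and pull the arm with the highest draw."""
    rng = random.Random(seed)
    k = len(true_ctrs)
    a = [1.0] * k  # posterior alpha (prior successes + 1)
    b = [1.0] * k  # posterior beta (prior failures + 1)
    pulls = [0] * k
    for _ in range(n_rounds):
        # Sample a plausible CTR for each arm from its posterior.
        draws = [rng.betavariate(a[i], b[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        click = rng.random() < true_ctrs[arm]  # simulated user feedback
        a[arm] += 1 if click else 0
        b[arm] += 0 if click else 1
        pulls[arm] += 1
    return pulls

# The arm with the highest true CTR should attract the bulk of the pulls.
pulls = thompson_sampling([0.02, 0.10, 0.04])
```

The posterior sampling step is what balances exploration and exploitation: an under-explored arm has a wide Beta posterior, so it occasionally produces a high draw and gets pulled again.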
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Zhao, C., Watanabe, K., Yang, B., Hirate, Y. (2018). Fast Converging Multi-armed Bandit Optimization Using Probabilistic Graphical Model. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science(), vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_10
Print ISBN: 978-3-319-93036-7
Online ISBN: 978-3-319-93037-4