Abstract
In this paper, we propose a novel method for discrete optimization problems based on the UCB algorithm. The definition of the neighborhood in a problem's search space strongly affects the performance of existing algorithms, because they do not adequately account for the dilemma of exploitation versus exploration. To optimize this balance, we divide the search space into several grids and thereby recast the discrete optimization problem as a Multi-Armed Bandit Problem, so that the UCB algorithm can be applied directly for the balancing. We propose a UCB grid area search and conduct numerical experiments on the 0-1 Knapsack Problem. Our method shows stable results across different environments.
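The idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each grid region is treated as a bandit arm, the standard UCB1 index (Auer et al.) is used for arm selection, and candidates are drawn uniformly at random inside the chosen grid. The helper names `ucb1_select`, `grid_search`, and `sample_in_grid` are hypothetical.

```python
import math
import random

def ucb1_select(counts, means, c=math.sqrt(2)):
    """Pick the grid index maximizing the UCB1 score (assumed selection rule).

    counts[i]: number of times grid i was sampled
    means[i]:  mean objective value observed in grid i
    Grids not yet sampled are chosen first.
    """
    total = sum(counts)
    best, best_score = 0, float("-inf")
    for i, (n, m) in enumerate(zip(counts, means)):
        if n == 0:
            return i  # sample every grid at least once
        score = m + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best

def grid_search(objective, sample_in_grid, n_grids, budget):
    """UCB-style grid area search (sketch): repeatedly pick a grid via
    UCB1, draw a random candidate inside it, update that grid's running
    mean, and keep the best solution seen."""
    counts = [0] * n_grids
    means = [0.0] * n_grids
    best_x, best_f = None, float("-inf")
    for _ in range(budget):
        g = ucb1_select(counts, means)
        x = sample_in_grid(g)          # user-supplied sampler for grid g
        f = objective(x)               # maximize the objective
        counts[g] += 1
        means[g] += (f - means[g]) / counts[g]  # incremental mean update
        if f > best_f:
            best_x, best_f = x, f
    return best_x, best_f
```

For a 0-1 Knapsack instance, `sample_in_grid` might draw random feasible item subsets restricted to one region of the solution space, with `objective` returning total value (or a large penalty when the weight limit is violated); the grid partitioning scheme itself is a design choice not specified here.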
© 2015 Springer International Publishing Switzerland
Cite this paper
Notsu, A., Saito, K., Nohara, Y., Ubukata, S., Honda, K. (2015). Proposal of Grid Area Search with UCB for Discrete Optimization Problem. In: Huynh, V.N., Inuiguchi, M., Denoeux, T. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2015. Lecture Notes in Computer Science, vol 9376. Springer, Cham. https://doi.org/10.1007/978-3-319-25135-6_11
Print ISBN: 978-3-319-25134-9
Online ISBN: 978-3-319-25135-6