Abstract
The on-line shortest path problem is considered under partial monitoring scenarios. At each round, a decision maker has to choose a path between two distinguished vertices of a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, with the goal of keeping the loss of the chosen path (defined as the sum of the weights of its composing edges) small. In the multi-armed bandit setting, after choosing a path, the decision maker learns only the weights of the edges that belong to the chosen path. For this scenario, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, chosen off-line with knowledge of the entire sequence of edge weights, by a quantity that is proportional to \(1/\sqrt{n}\) and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with complexity linear in the number of rounds n and in the number of edges. This result improves on earlier bandit algorithms, whose performance bounds either depend exponentially on the number of edges or converge to zero at a slower rate than \(O(1/\sqrt{n})\). An extension to the so-called label efficient setting is also given, where the decision maker is informed about the weight of the chosen path only with probability \(\varepsilon < 1\). Applications to routing in packet-switched networks, along with simulation results, are also presented.
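To illustrate why per-round work can stay linear in the number of edges even though a graph may have exponentially many paths, the following is a minimal sketch of a simplified Exp3-style bandit path learner, not the authors' algorithm. It assumes vertices 0..n-1 are topologically ordered with source 0 and target n-1, and that every vertex lies on some source-to-target path. Paths are weighted by the product of per-edge exponential weights; a backward pass computes the total weight of all u-to-target paths, which lets a path be sampled edge by edge, and a forward pass gives each edge's probability of lying on the sampled path. That probability is used for an importance-weighted (unbiased) edge-loss estimate. Exploration here is, hypothetically, a γ-mixture with the uniform distribution over all source-to-target paths; the paper's exact estimator, exploration distribution, and parameter choices are not reproduced. All function names, the toy graph, and the parameter values are illustrative assumptions, and no underflow protection is included.

```python
import math
import random

def path_dp(n, edges, w):
    """Backward sums G[u] (total weight of all u->t paths) and forward sums
    H[v] (total weight of all s->v paths), where a path's weight is the
    product of its edge weights. Assumes s=0, t=n-1, topological order."""
    out = [[] for _ in range(n)]
    inc = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        out[u].append(i)
        inc[v].append(i)
    G = [0.0] * n
    G[n - 1] = 1.0
    for u in range(n - 2, -1, -1):
        G[u] = sum(w[i] * G[edges[i][1]] for i in out[u])
    H = [0.0] * n
    H[0] = 1.0
    for v in range(1, n):
        H[v] = sum(H[edges[i][0]] * w[i] for i in inc[v])
    return G, H, out

def edge_marginals(edges, w, G, H):
    """Probability that edge i lies on a path drawn proportionally to its weight."""
    Z = G[0]
    return [H[u] * w[i] * G[v] / Z for i, (u, v) in enumerate(edges)]

def sample_path(edges, w, G, out, rng):
    """Walk from s to t, leaving u via edge (u, v) with prob. w * G[v] / G[u]."""
    u, target, path = 0, len(G) - 1, []
    while u != target:
        r, acc = rng.random() * G[u], 0.0
        for i in out[u]:
            acc += w[i] * G[edges[i][1]]
            if r < acc:
                break
        path.append(i)  # falls back to the last out-edge on rounding
        u = edges[i][1]
    return path

def run_bandit_paths(n, edges, loss_fn, n_rounds, eta, gamma, seed=0):
    """Hypothetical simplified learner; returns (cumulative loss, estimates)."""
    rng = random.Random(seed)
    ones = [1.0] * len(edges)
    G_u, H_u, out = path_dp(n, edges, ones)      # uniform over all s-t paths
    q_unif = edge_marginals(edges, ones, G_u, H_u)
    L_hat = [0.0] * len(edges)                   # cumulative estimated edge losses
    total = 0.0
    for t in range(n_rounds):
        w = [math.exp(-eta * x) for x in L_hat]
        G, H, _ = path_dp(n, edges, w)           # O(|V| + |E|) per round
        q_exp = edge_marginals(edges, w, G, H)
        if rng.random() < gamma:                 # exploration round
            path = sample_path(edges, ones, G_u, out, rng)
        else:
            path = sample_path(edges, w, G, out, rng)
        losses = loss_fn(t)                      # only path entries are read (bandit feedback)
        for i in path:
            q = (1 - gamma) * q_exp[i] + gamma * q_unif[i]  # P(edge i is on the chosen path)
            L_hat[i] += losses[i] / q            # importance-weighted, unbiased estimate
            total += losses[i]
    return total, L_hat

# Toy DAG (hypothetical): paths 0->1->3, 0->2->3 and 0->1->2->3.
edges = [(0, 1), (1, 3), (0, 2), (2, 3), (1, 2)]
total, L_hat = run_bandit_paths(4, edges, lambda t: [0.1, 0.1, 0.5, 0.5, 0.2],
                                n_rounds=2000, eta=0.01, gamma=0.1)
print("average loss per round:", total / 2000)
```

In this toy example the cheapest path 0->1->3 has loss 0.2 per round, so the printed average should drift toward that value, plus the cost of exploration. The backward/forward passes are what keep each round linear in the number of edges, instead of enumerating paths explicitly.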
This research was supported in part by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences, the Mobile Innovation Center of Hungary, by the Natural Sciences and Engineering Research Council (NSERC) of Canada, and by the Hungarian Inter-University Center for Telecommunications and Informatics (ETIK).
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
György, A., Linder, T., Ottucsák, G. (2006). The Shortest Path Problem Under Partial Monitoring. In: Lugosi, G., Simon, H.U. (eds) Learning Theory. COLT 2006. Lecture Notes in Computer Science, vol. 4005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_35
DOI: https://doi.org/10.1007/11776420_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35294-5
Online ISBN: 978-3-540-35296-9
eBook Packages: Computer Science, Computer Science (R0)