
Follow the perturbed approximate leader for solving semi-bandit combinatorial optimization

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

Combinatorial optimization in the face of uncertainty is a challenge in both operational research and machine learning. In this paper, we consider a special and important class of such problems: adversarial online combinatorial optimization with semi-bandit feedback, in which a player repeatedly makes combinatorial decisions and receives the corresponding feedback. While existing algorithms focus on regret guarantees or assume access to an efficient offline oracle, solving this problem efficiently remains a challenge when the offline counterpart is NP-hard. In this paper, we propose a variant of the Follow-the-Perturbed-Leader (FPL) algorithm to solve this problem. Unlike the existing FPL approach, our method employs an approximation algorithm as the offline oracle and perturbs the collected data by adding nonnegative random variables. Our approach is simple and computationally efficient. Moreover, it guarantees a sublinear (1 + ε)-scaled regret of order \(O(T^{\frac{2}{3}})\) for any small ε > 0 for an important class of combinatorial optimization problems that admit an FPTAS (fully polynomial time approximation scheme), where T is the number of rounds of the learning process. In addition to the theoretical analysis, we also conduct a series of experiments to demonstrate the performance of our algorithm.
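The loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the plug-in estimator update, the choice of exponential noise, and the `topk_oracle` stand-in (an exact solver for a simple k-set problem) are our own simplifying assumptions — the paper targets problems whose offline version is NP-hard and uses an FPTAS-based approximation oracle with a more careful analysis.

```python
import random

def fpl_with_approx_oracle(n_items, oracle, losses, T, eta=1.0):
    """Illustrative FPL-style loop with an (approximate) offline oracle.

    n_items : number of base components.
    oracle  : offline solver; given a loss vector, returns a decision
              (a set of component indices) of near-minimal total loss.
    losses  : losses[t][i] is the adversarial loss of component i at
              round t (revealed only for chosen components).
    """
    cum_est = [0.0] * n_items  # cumulative loss estimates per component
    total_loss = 0.0
    for t in range(T):
        # Perturb the collected data by ADDING nonnegative random
        # variables (exponential here), as the abstract describes.
        perturbed = [cum_est[i] + random.expovariate(eta)
                     for i in range(n_items)]
        # Follow the (approximate) leader on the perturbed data.
        decision = oracle(perturbed)
        # Semi-bandit feedback: only chosen components' losses observed.
        for i in decision:
            total_loss += losses[t][i]
            cum_est[i] += losses[t][i]  # naive plug-in update (simplified)
    return total_loss

def topk_oracle(loss_vec, k=2):
    # Hypothetical stand-in oracle: pick the k components of smallest
    # perturbed loss (exact for the k-set problem).
    return sorted(range(len(loss_vec)), key=lambda i: loss_vec[i])[:k]
```

For instance, with `oracle = lambda v: topk_oracle(v, k=2)` the player picks two components each round; the perturbation keeps the choice randomized, which is what yields the sublinear scaled-regret guarantee in the paper's setting.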



Acknowledgements

We would like to thank Weidong Ma from MSRA for the helpful discussion on approximation algorithm analysis.

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61832003, 61761136014, 61872334), the 973 Program of China (2016YFB1000201), and the K. C. Wong Education Foundation.

Author information

Corresponding author

Correspondence to Xiaoming Sun.

Additional information

Feidiao Yang is a doctoral student at the Institute of Computing Technology, Chinese Academy of Sciences, China, supervised by Professor Xiaoming Sun. He mainly works on theoretical aspects of machine learning, particularly the topic of online learning.

Wei Chen is a Principal Researcher at Microsoft Research Asia. He is also an Adjunct Professor at the Institute of Interdisciplinary Information Sciences, Tsinghua University and an Adjunct Researcher at the Institute of Computing Technology, Chinese Academy of Sciences, China. He is an IEEE Fellow, and his main research interests include social and information networks, online learning, algorithmic game theory, Internet economics, distributed computing, and fault tolerance.

Jialin Zhang is an Associate Researcher and a doctoral supervisor at the Institute of Computing Technology, Chinese Academy of Sciences, China. Her research topics include submodular optimization, approximation algorithms, online algorithms, quantum computing, and algorithmic game theory.

Xiaoming Sun is a Researcher at the Institute of Computing Technology, Chinese Academy of Sciences, China, where he leads the Algorithms and Complexity group. His research interests include algorithms and complexity, quantum computing, social networks, and decision tree complexity.

Electronic supplementary material

About this article


Cite this article

Yang, F., Chen, W., Zhang, J. et al. Follow the perturbed approximate leader for solving semi-bandit combinatorial optimization. Front. Comput. Sci. 15, 155404 (2021). https://doi.org/10.1007/s11704-020-9519-9
