ABSTRACT
In a Stackelberg Security Game, a defender commits to a randomized deployment of security resources, and an attacker best-responds by attacking a target that maximizes his utility. While algorithms for computing an optimal strategy for the defender to commit to have had a striking real-world impact, deployed applications require significant information about potential attackers, leading to inefficiencies. We address this problem via an online learning approach. We are interested in algorithms that prescribe a randomized strategy for the defender at each step against an adversarially chosen sequence of attackers, and that receive feedback on their choices (observing either the current attacker type or merely which target was attacked). We design no-regret algorithms whose regret (when compared to the best fixed strategy in hindsight) is polynomial in the parameters of the game and sublinear in the number of time steps.
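To make the notion of regret against the best fixed strategy in hindsight concrete, the following is a minimal sketch and not the algorithm designed in the paper. It assumes a toy game with two targets and one defender resource, restricts the defender to a hypothetical grid of coverage vectors (`GRID`), uses made-up payoffs in [-1, 1] (`TYPES`), and runs the standard Hedge (exponential weights) update under full-information feedback, where the attacker type is observed after each step. All names and numbers are illustrative assumptions.

```python
import math
import random


def best_response(coverage, attacker):
    """Index of the target that maximizes the attacker's expected utility."""
    def value(i):
        c = coverage[i]
        return c * attacker["att_covered"][i] + (1 - c) * attacker["att_uncovered"][i]
    # Simplifying assumption: ties go to the lowest-indexed target.
    return max(range(len(coverage)), key=value)


def defender_utility(coverage, attacker):
    """Defender's expected utility when the attacker best-responds to `coverage`."""
    i = best_response(coverage, attacker)
    c = coverage[i]
    return c * attacker["def_covered"][i] + (1 - c) * attacker["def_uncovered"][i]


def hedge_regret(strategies, attackers, eta):
    """Run Hedge over a finite strategy set; return regret in expectation.

    Regret is measured against the best fixed strategy in `strategies` only,
    which is an assumption of this sketch (the grid is a discretization).
    """
    n = len(strategies)
    weights = [1.0 / n] * n
    cumulative = [0.0] * n      # total utility of each fixed strategy
    expected_total = 0.0        # expected utility collected by Hedge
    for attacker in attackers:
        # Full information: the observed type lets us score every strategy.
        utils = [defender_utility(s, attacker) for s in strategies]
        expected_total += sum(w * u for w, u in zip(weights, utils))
        cumulative = [a + u for a, u in zip(cumulative, utils)]
        weights = [w * math.exp(eta * u) for w, u in zip(weights, utils)]
        z = sum(weights)
        weights = [w / z for w in weights]  # renormalize to avoid overflow
    return max(cumulative) - expected_total


# Hypothetical attacker types; every payoff below is invented for illustration.
TYPES = [
    {"att_covered": [-1.0, -1.0], "att_uncovered": [1.0, 0.5],
     "def_covered": [0.5, 0.5], "def_uncovered": [-1.0, -0.5]},
    {"att_covered": [-0.5, -1.0], "att_uncovered": [0.2, 1.0],
     "def_covered": [0.5, 1.0], "def_uncovered": [-0.2, -1.0]},
]
# Coverage vectors (p, 1 - p) on a grid of step 0.1.
GRID = [(k / 10, 1 - k / 10) for k in range(11)]

if __name__ == "__main__":
    rng = random.Random(0)
    T = 500
    # A random sequence stands in for the adversarially chosen one.
    attackers = [rng.choice(TYPES) for _ in range(T)]
    eta = math.sqrt(2 * math.log(len(GRID)) / T)
    print("regret vs. best grid strategy:", hedge_regret(GRID, attackers, eta))
```

With utilities in [-1, 1], the usual Hedge analysis bounds this regret by ln(N)/eta + eta*T/2 for N grid strategies, which is sublinear in T for the step size chosen above. The sketch does not cover the harder feedback model in which only the attacked target is observed.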
Index Terms
- Commitment Without Regrets: Online Learning in Stackelberg Security Games