Abstract
We consider the problem of identifying the set of users in an organization’s network that are most susceptible to falling victim to social engineering attacks. To achieve this goal, we propose a testing strategy, based on the theory of multi-armed bandits, that involves a system administrator sending fake malicious messages to users in a sequence of unannounced tests and recording their responses. To accurately model the administrator’s testing problem, we propose a new bandit setting, termed the structured combinatorial multi-bandit model, that allows one to impose combinatorial constraints on the space of allowable queries. The model captures the diversity in attack types and user responses by considering multiple multi-armed bandits, where each bandit problem represents an attack (message) type and each arm represents a user. Users respond to test messages according to a response model with unknown statistics. The response model associates a Bernoulli distribution with an unknown mean with each message-user pair, dictating the likelihood that a user will respond to a given message. The administrator’s problem of identifying the most susceptible users can then be expressed as identifying the set of message-user pairs with means that exceed a given threshold. We adopt a Bayesian approach to solving the problem, associating a (beta) prior distribution with each unknown mean. In a given trial, the system administrator queries a selection of users with test messages, generating query responses which are then used to update posterior distributions on the means. By defining a state as the parameters of the posteriors, we show that the optimal testing strategy can be characterized as the solution of a Markov decision process (MDP). Unfortunately, solving the MDP is computationally intractable. 
As a result, we propose a heuristic testing strategy, based on Thompson sampling, that focuses queries on message-user pairs that are estimated to have means close to the threshold. The heuristic testing strategy is shown to yield accurate identifications.
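The sampling loop described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the function name `run_threshold_ts`, the uniform Beta(1, 1) priors, the simulated responses, and the simple "at most `batch` queries per trial" rule (a stand-in for the paper's combinatorial constraints) are all assumptions made for the example.

```python
import random


def run_threshold_ts(true_means, tau=0.5, rounds=200, batch=3, seed=0):
    """Thompson-sampling-style threshold identification (illustrative sketch).

    true_means maps each (message, user) pair to a response probability that
    is hidden from the tester and used here only to simulate responses.
    """
    rng = random.Random(seed)
    pairs = list(true_means)
    # Assumed prior: Beta(1, 1) on every message-user mean, stored as [alpha, beta].
    post = {p: [1, 1] for p in pairs}

    for _ in range(rounds):
        # Draw one sample per pair from its current posterior.
        draws = {p: rng.betavariate(post[p][0], post[p][1]) for p in pairs}
        # Query the pairs whose sampled means lie closest to the threshold.
        chosen = sorted(pairs, key=lambda p: abs(draws[p] - tau))[:batch]
        for p in chosen:
            responded = rng.random() < true_means[p]  # simulated Bernoulli response
            # Conjugate update: a response increments alpha, otherwise beta.
            post[p][0 if responded else 1] += 1

    # Identify the pairs whose posterior mean exceeds the threshold.
    identified = {p for p in pairs if post[p][0] / (post[p][0] + post[p][1]) > tau}
    return identified, post


if __name__ == "__main__":
    means = {("m1", "u1"): 0.9, ("m1", "u2"): 0.1,
             ("m2", "u1"): 0.55, ("m2", "u2"): 0.4}
    identified, post = run_threshold_ts(means)
    print(sorted(identified))
    print(post)
```

Ranking by distance to the threshold concentrates queries on ambiguous pairs, while pairs whose posteriors sit far from `tau` are rarely sampled near it and so receive few further tests.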
This research was supported by the U.S. Office of Naval Research (ONR) MURI grant N00014-16-1-2710.
Notes
1. Note that thresholds can depend on the specific message-user pair (m, k); however, for ease of presentation, we assume identical thresholds \(\tau \) across all (m, k).
A Proof of Lemma 1
Denoting \(\mathbb{E}_n^\pi[J(\varTheta,P;\tau)] := \mathbb{E}_{\varTheta \sim f_n(\theta_{mk})}[J(\varTheta,P;\tau) \mid P=P^\pi]\) as the expectation of the reward with respect to the posteriors \(f_n(\theta_{mk})\), the law of iterated expectations allows one to write the expected reward as \(\mathbb{E}_0^\pi[J(\varTheta,P;\tau)] = \mathbb{E}_0^\pi[\mathbb{E}_n^\pi[J(\varTheta,P;\tau)]]\), where the inner expectation is expressed in terms of \(I_{\tau}(\alpha,\beta)\), the normalized incomplete beta function (we have used the identity \(1-I_{\tau}(\alpha,\beta) \equiv I_{1-\tau}(\beta,\alpha)\)). The dependency of the identification set P on the testing strategy \(\pi\) is made explicit by writing \(P^\pi\).
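The identity used in the proof can be checked numerically. The sketch below is an illustration only: it assumes integer posterior parameters (which holds, for example, when a Beta(1, 1) prior is updated with Bernoulli responses), so that the normalized incomplete beta function reduces to a binomial tail sum. The helper names `reg_inc_beta_int` and `prob_exceeds` are hypothetical and do not appear in the paper.

```python
from math import comb


def reg_inc_beta_int(x, a, b):
    """Normalized incomplete beta function I_x(a, b) for positive integers a, b.

    Uses the binomial-tail identity
    I_x(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) x^j (1-x)^(a+b-1-j).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def prob_exceeds(tau, a, b):
    """Posterior probability that theta > tau under a Beta(a, b) posterior.

    Written via the identity 1 - I_tau(a, b) = I_{1-tau}(b, a).
    """
    return reg_inc_beta_int(1 - tau, b, a)


if __name__ == "__main__":
    # Both sides of the identity for one choice of parameters.
    lhs = 1 - reg_inc_beta_int(0.3, 4, 2)
    rhs = reg_inc_beta_int(0.7, 2, 4)
    print(abs(lhs - rhs) < 1e-12)
    # Beta(3, 1): two responses after a uniform prior.
    print(prob_exceeds(0.5, 3, 1))  # 0.875
```

For non-integer parameters the tail-sum shortcut no longer applies, and a library routine for the regularized incomplete beta function would be needed instead.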
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Miehling, E., Xiao, B., Poovendran, R., Başar, T. (2018). A Bayesian Multi-armed Bandit Approach for Identifying Human Vulnerabilities. In: Bushnell, L., Poovendran, R., Başar, T. (eds) Decision and Game Theory for Security. GameSec 2018. Lecture Notes in Computer Science, vol 11199. Springer, Cham. https://doi.org/10.1007/978-3-030-01554-1_30
DOI: https://doi.org/10.1007/978-3-030-01554-1_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01553-4
Online ISBN: 978-3-030-01554-1
eBook Packages: Computer Science (R0)