1 Introduction

The United States Department of Defense (DoD) defines cyberspace as a global domain consisting of many different and often overlapping networks (Joint Pub 3-12 2013). Though many people equate cyberspace with the Internet, the latter is simply a subset of the former. Cyberspace, after all, includes many networks (e.g., classified intelligence networks) and systems that are not directly reachable from the Internet. Though it is difficult to characterize the nature of these other networks and systems that comprise cyberspace, we know a fair amount about the Internet. We know, for instance, that it is the largest, most complex system ever built by humans. By some estimates, it consists of over 8 billion devices (Tung 2017) exchanging over 4 billion bytes of data every second (“Internet Live Stats” 2018). Cyberspace, by definition, is even bigger.

The size and speed of the Internet, coupled with its growth rate, prompted the development and application of artificial intelligence (AI) techniques for performing tasks that humans alone could no longer effectively do at this scope. It also spurred the creation of novel capabilities that take advantage of, and indeed require, the very large data sets that are available in cyberspace. The development of these techniques has been organic and, while enabling localized capabilities, has sometimes hindered other ones. In particular, we are concerned that some trends in both human and synthetic (i.e., AI-enabled) operator development are not supportive of effective human-machine teaming. In Sect. 2 of this paper, we provide a brief introduction to AI in general and to some of the specific concepts we’ll discuss later in the paper. On this foundation, we describe in Sect. 3 future threats that motivate the need for better human-machine teaming. We then describe advances in AI that allow the creation of synthetic cyberspace actors in Sect. 4. In Sect. 5, we address the human members of future cyberspace operations teams. Section 6 presents the need for AI that is explainable to humans as the foundation of trust in these teams. These human-machine teams of the future are described in Sect. 7. Finally, we offer our conclusions and recommended future work in Sect. 8.

2 A Brief Introduction to AI

AI is fundamentally concerned with machines that solve problems and make decisions, or that appear to think analogously to a human at some level of approximation. While there is no single definition for the term, there exist different classes of AI that allow us to formulate a tentative ontology, which we show in Fig. 1. A high-level bifurcation is possible by differentiating the approach used to represent information or knowledge. Symbolic approaches, as the name implies, use symbols (e.g., words) to represent the atomic components of thought and generally rely on some kind of semantic rules to process information. Alternatively, non-symbolic approaches use numerical and often distributed representations (reflected by patterns of activity across numerous processing units).

Fig. 1. A general ontology of AI techniques

In symbolic approaches to AI, system developers model real-world concepts, their relationships and how they interact to solve a set of problems. This effort requires considerable knowledge of both the problem and solution domains, which makes it fairly labor-intensive. However, it yields results that are inherently explainable to humans since they are derived from human knowledge models in the first place. Symbolic AI systems include the expert systems that became prolific in the 1970s and 80s. These relied on extensive interviewing of subject matter experts and time-consuming encoding of their expertise in a series of conditional structures. An example of this approach is MYCIN, one of the first practical rule-based systems that was developed to help physicians select antimicrobial therapies (Shortliffe 2012). These early expert systems suffered from a fundamental inability to adapt or learn absent human intervention in updating the knowledge base.
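
As a minimal illustration of the rule-based style described above, the following Python sketch encodes a pair of hypothetical MYCIN-like rules and fires whichever conditions match. The rules, facts, and rationales are invented for this example and are not drawn from MYCIN itself; the point is that every conclusion carries the rationale of the rule that produced it, so the output is explainable by construction.

```python
# Minimal sketch of a symbolic, rule-based classifier in the spirit of early
# expert systems such as MYCIN (hypothetical rules, not taken from MYCIN).

RULES = [
    # (condition over facts, conclusion, human-readable rationale)
    (lambda f: f.get("fever") and f.get("gram_negative_rods"),
     "consider gram-negative coverage",
     "fever combined with gram-negative rods on culture"),
    (lambda f: f.get("penicillin_allergy"),
     "avoid beta-lactam antibiotics",
     "documented penicillin allergy"),
]

def infer(facts):
    """Fire every rule whose condition matches and keep its rationale,
    which is what makes the result inherently explainable."""
    conclusions = []
    for condition, conclusion, rationale in RULES:
        if condition(facts):
            conclusions.append((conclusion, rationale))
    return conclusions

if __name__ == "__main__":
    patient = {"fever": True, "gram_negative_rods": True, "penicillin_allergy": True}
    for conclusion, rationale in infer(patient):
        print(f"{conclusion}  (because: {rationale})")
```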

Non-symbolic AI gained momentum after many in the AI community, disappointed with the limitations of symbolic approaches, looked to animal brains for inspiration. In an Artificial Neural Network (ANN), each node receives multiple inputs from other nodes, typically in the form of real numbers, and produces one or more outputs that are the result of applying some function to those inputs. By applying weights to each connection and allowing those weights to be modified through a feedback loop, the ANN can be trained. There are many other non-symbolic approaches, such as probabilistic ones, that have been successfully applied to problem sets in which the knowledge engineering required by symbolic AI is not feasible.
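
The following sketch illustrates the ANN idea in its simplest form: a single perceptron whose connection weights are adjusted through a feedback loop. The toy AND dataset, learning rate, and number of epochs are our own choices for illustration and are not drawn from the paper.

```python
import numpy as np

# A single perceptron: each "node" applies a function to weighted inputs, and a
# feedback loop (the error signal) adjusts the weights. Toy AND problem.

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # connection weights
b = 0.0                  # bias
lr = 0.1                 # learning rate

for epoch in range(50):
    for xi, target in zip(X, y):
        output = 1.0 if xi @ w + b > 0 else 0.0   # step activation
        error = target - output                    # feedback signal
        w += lr * error * xi                       # adjust the weights
        b += lr * error

print("learned weights:", w, "bias:", b)
print("predictions:", [1.0 if xi @ w + b > 0 else 0.0 for xi in X])
```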

Many modern AI systems are able to learn from experience. Machine learning (ML) refers to techniques that allow AI systems to adapt to changing inputs and, ideally, improve their performance as a result. Though ML is often equated with non-symbolic approaches, some symbolic AI systems can also learn. The Soar cognitive architecture, for instance, is a symbolic production system capable of episodic learning. A Soar agent might achieve a goal through a circuitous series of intermediate steps, only some of which prove useful. Over multiple experiences, or episodes, the agent condenses these steps into a shorter, more effective and efficient chain. This process, called “chunking,” is one of the main ways in which Soar agents learn.

ML can take place with or without human help. In supervised ML, the system is presented with inputs and must then produce an output, typically a classification (e.g., an email message is or is not spam). If the output is correctly classified, the system receives positive reinforcement; otherwise, it may receive negative reinforcement. The learning or, more accurately, training process can be automated by using labeled training data sets. Without a human in the loop or labeled data, a system can still learn through reinforcement learning. In this form of ML, the system interacts with its environment in a sequence of observation-action pairs, receiving a reward after each action. Using this approach, a system could learn how to efficiently route network packets using the inverse of the number of hops required as its reward. The key requirement for these ML approaches is a feedback process that allows the agent to determine when its decisions are correct. This process can be artificial (e.g., tagged data sets) or natural (e.g., observing the behavior of routed packets).
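
A sketch of the packet-routing example follows, using tabular Q-learning. The topology, reward values (a small per-hop penalty plus a terminal bonus standing in for the inverse hop count), and hyperparameters are all hypothetical choices made for illustration.

```python
import random

# Reinforcement learning for packet routing: the agent learns, hop by hop, a policy
# that reaches the destination in as few hops as possible.

LINKS = {                      # node -> directly connected neighbours (hypothetical)
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "DST"],
    "DST": [],
}
DEST = "DST"

Q = {(node, nbr): 0.0 for node, nbrs in LINKS.items() for nbr in nbrs}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

def choose_next_hop(node):
    """Epsilon-greedy action selection over the node's outgoing links."""
    if random.random() < epsilon:
        return random.choice(LINKS[node])
    return max(LINKS[node], key=lambda nbr: Q[(node, nbr)])

for episode in range(500):
    node = "A"
    while node != DEST:
        nxt = choose_next_hop(node)
        reward = 10.0 if nxt == DEST else -1.0           # every extra hop costs reward
        future = max((Q[(nxt, n)] for n in LINKS[nxt]), default=0.0)
        Q[(node, nxt)] += alpha * (reward + gamma * future - Q[(node, nxt)])
        node = nxt

node, route = "A", ["A"]                                  # follow the learned greedy policy
while node != DEST:
    node = max(LINKS[node], key=lambda nbr: Q[(node, nbr)])
    route.append(node)
print("learned route:", " -> ".join(route))
```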

3 Future Threats

The ability of AI to make increasingly complex decisions much faster than humans could, all the while learning from its experiences, has already delivered many benefits in the service of humanity. The same capabilities, however, can cause unexpected and undesirable effects, as Microsoft learned when it developed a chatbot that learned to compose racially and sexually offensive tweets (Metz 2016) from its interactions with thousands of people. Perhaps more concerning are scenarios in which AI systems are intentionally developed and deployed to cause harm. It is, after all, logical to assume that malicious cyberspace actors will leverage emerging technologies for their own purposes.

If an attacker is using AI to operate at machine speed, the defender must be able to work at least as quickly in order to be effective. This idea of synthetic agents attacking and defending information systems with no humans in the decision-making loop inspired the Defense Advanced Research Projects Agency (DARPA) Cyber Grand Challenge (CGC), which brought seven finalists together in Las Vegas, Nevada, in August 2016. The goal was for these cyber reasoning systems (CRS) to perform automated vulnerability detection, exploit generation, and software patching, and to determine when it would be most advantageous to patch a vulnerability or exploit it on a competing team’s CRS, all without human intervention (Brooks 2017). The message is clear: in the future of cyberspace, both attackers and defenders will, at least partially, be autonomous agents. In fact, the leader of the winning team, David Brumley, founded the company For All Secure to take autonomous vulnerability detection (and potentially patching) to market.

It is not only machines that will be threatened by autonomous agents. Many security experts anticipate a new breed of phishing emails generated by ML algorithms that will be much more targeted, compelling, and effective than human-generated ones (Emmanuel 2017). One of the reasons these messages will be more threatening is that they will leverage the ability of data analytics and ML to scour vast data sources for information with which to precisely target individuals at scale. The U.S. Army has already identified this micro-targeting trend as a feature of future wars for which our current countermeasures are ineffective (Kott et al. 2015).

The Social Network Automated Phishing with Reconnaissance (SNAP_R) system (Seymour and Tully 2016) demonstrated a recurrent neural network (RNN) that is able to tweet phishing messages that target specific users. During a limited experiment, SNAP_R was four times faster than humans at sending out targeted attacks while achieving an order of magnitude improvement in the target click rate. One year later, DARPA announced its Active Social Engineering Defense (ASED) program, aimed at autonomously identifying, disrupting, and investigating social engineering attacks. The very existence of ASED underscores the difficulty and long-term significance that DARPA attributes to this threat.

Finally, as AI in general and ML in particular become increasingly important in our lives, adversaries will develop attacks aimed directly at the ML mechanisms that are designed to improve and defend our lives. Adversarial ML (AML) is an emerging field of study concerned with attacks against online ML algorithms (Huang et al. 2011). Early research has shown that ML classifiers are susceptible to three types of attacks (Papernot et al. 2016). Confidentiality attacks entail gaining information on the data used to train the ML system (Shokri et al. 2017), its internal model parameters (i.e., weights), or its architectural parameters (e.g., learning rate). Integrity attacks attempt to modify the input to the ML classifier in order to induce a particular output or behavior, such as causing an image recognition system to misclassify a 2 as a 9 by modifying a few image pixels (Carlini and Wagner 2017). Availability attacks attempt to deny access to the ML classifier, for example by generating numerous false positives; an ML-based IDS/IPS is vulnerable to such attacks. As AML techniques mature, malicious actors will employ them to manipulate the outputs of intelligent systems.
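
As a toy illustration of an integrity attack, the following sketch applies a fast-gradient-sign-style perturbation to the input of a made-up logistic-regression "classifier." The model, its random weights, and the perturbation budget are invented for this example and are not tied to any real system; the point is only that a small, structured change to the input can move the classifier's score across its decision threshold.

```python
import numpy as np

# Toy integrity attack: nudge an input so a classifier's score crosses the
# decision threshold. The "classifier" is a made-up logistic-regression model.

rng = np.random.default_rng(1)
w = rng.normal(size=16)                       # hypothetical model weights
b = 0.0

def predict(x):
    """Score (probability) assigned to the attacker's desired class."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

x = rng.normal(size=16)                       # original input
p = predict(x)

# Fast-gradient-sign-style step: move every feature a small amount in the
# direction that most increases the score for the desired class.
epsilon = 0.5
gradient = p * (1.0 - p) * w                  # d(score)/dx for the sigmoid model
x_adv = x + epsilon * np.sign(gradient)

print(f"original score:  {p:.3f}")
print(f"perturbed score: {predict(x_adv):.3f}")
print(f"largest per-feature change: {np.max(np.abs(x_adv - x)):.3f}")
```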

4 Synthetic Actors

Against this backdrop of technological opportunities and threats, research and development of autonomous synthetic actors proceeds apace. Though much work to date has focused on applications of ML to the detection and mitigation of cybersecurity incidents, research is also taking place towards the development of more robust defensive agents that can hunt for and neutralize threats on their networks. As we mentioned in our threat discussion, we are seeing similar moves in the development of attack capabilities. In fact, one of the noteworthy aspects of DARPA’s CGC was that it demonstrated the feasibility (and, one might argue, inevitability) of autonomous offensive and defensive agents fighting against each other with humans out of the loop at least in some cases. While we have not yet seen documented cases of autonomous synthetic attackers conducting real operations, many think that these incidents are not too far in the future (Dvorsky 2017). Indeed, SoarTech has already demonstrated cognitive agents that can perform defensive and offensive (e.g., penetration testing) activities in virtualized environments.

SoarTech’s Simulated Cognitive Cyber Red-team Attacker Model (SC2RAM) is a synthetic, offensive, cognitive agent that emulates real attackers by modeling the complex thoughts, decision-making, and contextual understanding of an interactive human operator. Its goal-seeking behavior results in a virtually unlimited range of realistic attacks. The current attacker agent, built on the Soar Cognitive Architecture (a symbolic AI platform), can conduct multiple attacks including phishing with malicious documents, remote exploitation, and SQL injection. A custom remote access toolkit developed for this project provides additional persistent on-target capabilities such as lateral movement and file exfiltration, providing a realistic experience for training network defenders. The premise of red teaming and penetration testing, exemplified by SC2RAM, is that it is better to test one’s own defenses against realistic but benign attackers than to wait until real adversaries do so. Since human penetration testers are rare and expensive experts, it is logical to leverage synthetic agents in this manner.

It also makes sense to employ such agents when the scale of a problem requires a very large number of interactions. Much of the research at the intersection of cybersecurity and AI uses non-symbolic approaches. Some of the first successful applications of ML to cybersecurity were in the classification of spam email messages (Cohen 1996). Over the last two decades, these approaches have become remarkably accurate. Today’s ubiquitous spam filters improve their performance through interaction with the humans whose inboxes they protect. When the agent misclassifies a message, the human has an opportunity to correct the error, thus allowing the ML system to learn and improve.
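
A minimal sketch of ML-based spam classification of this kind, using scikit-learn's naive Bayes classifier, is shown below. The handful of training messages is invented for illustration; a production filter would, of course, be trained on far more data and retrained as users correct its mistakes.

```python
# Minimal spam/ham classifier: bag-of-words features feeding a naive Bayes model.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_messages = [
    "Win a free prize now, click here",
    "Cheap meds, limited time offer",
    "Meeting moved to 3 pm, see agenda attached",
    "Can you review the draft report before Friday?",
]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_messages, labels)

print(model.predict(["Click here for a free offer"]))       # expected: ['spam']
print(model.predict(["Please review the meeting agenda"]))  # expected: ['ham']

# When a user corrects a misclassification, the corrected example can be appended
# to the training data and the model refit -- the human feedback loop described above.
```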

Given the role of these agents as first lines of defense for endpoints, much research is needed to identify vulnerabilities to AML in systems such as spam filters or the newer breed of antimalware products that use ML to detect malicious software. Here, one could use machine learning techniques to make inferences about the training set of another machine learning classifier in order to manipulate inputs and generate desired outputs. For example, given an ML system that classifies software as benign or malicious (e.g., an anti-malware application), one could imagine another system that generates multiple variants of malware, each with small perturbations that do not affect its functionality. These variants could be sent to the classifier until it incorrectly decides that the malware sample is benign. Given enough such misclassified samples, the AML system can make inferences about what it takes to fool the defender. This AML-versus-ML assessment could serve to harden network security applications by evaluating the robustness of an already-trained model, particularly when the internal classifier parameters are unknown. Since this sort of assessment requires many thousands or millions of attempts to characterize the system under test, synthetic agents would be well suited to perform it.
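
The following sketch shows the shape such a robustness assessment might take under our own simplifying assumptions: a black-box query loop that perturbs a feature vector until a stand-in classifier's verdict flips, counting the queries required. The classifier here is a fixed linear scorer invented for illustration; a real assessment would query the deployed model, and the perturbations would have to preserve the sample's actual functionality.

```python
import numpy as np

# Black-box robustness assessment sketch. The system under test is a stand-in
# linear scorer whose weights are hidden from the assessor; only its verdict is
# observable, as in the "unknown internal parameters" case described above.

rng = np.random.default_rng(7)
hidden_w = rng.normal(size=32)                # unknown to the assessing agent

def verdict_is_malicious(features):
    """Stand-in for the classifier under test; only the boolean verdict leaks out."""
    return float(features @ hidden_w) > 0.0

sample = rng.normal(size=32)
if not verdict_is_malicious(sample):          # start from a sample flagged as malicious
    sample = -sample

candidate, queries, budget = sample.copy(), 0, 10_000
while verdict_is_malicious(candidate) and queries < budget:
    # Apply small random perturbations until the verdict flips or the budget runs out.
    candidate = candidate + rng.normal(scale=0.05, size=32)
    queries += 1

if verdict_is_malicious(candidate):
    print(f"no evasion found within {budget} queries")
else:
    print(f"verdict flipped to 'benign' after {queries} queries")
    print(f"mean absolute change per feature: {np.mean(np.abs(candidate - sample)):.3f}")
```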

Despite their ability to analyze vast amounts of information, non-symbolic approaches like those in use for spam, malware, and intrusion detection are less effective at reasoning over the context and meaning of cyberspace activities. They are well suited to answering questions of what and even how, but not why. Symbolic approaches, such as rule-based systems, are oftentimes better for this purpose because they model higher-level cognitive processes and human expertise. A promising area of research for more effective synthetic cyberspace actors is the integration of symbolic and non-symbolic approaches to help us identify not just threats, but also their possible implications for our organizations and systems. Such hybrid systems would be more capable in a wider variety of situations. It is at that point that synthetic actors could become real teammates to their human counterparts, significantly enhancing the performance of our workforce.

5 Human Actors

One of the challenges in reviewing the current state of the cyber workforce is that there is a paucity of quantitative assessment regarding the cognitive aptitudes, work roles, or team organization required by cyber professionals to be successful. We argue that the people who operate within the cyber domain need a combination of technical skills, domain specific knowledge, and social intelligence to be successful. They, like the networks they operate, must also be secure, trustworthy, and resilient.

A concern in writing about human actors is that cyber professionals are generally treated as a homogeneous classification. Due to the complexity and rapid evolution of the tasks involved in cyber defense, however, there is substantial heterogeneity between work roles and individual skillsets. By virtue of this complexity in the task environment, cyber professionals need to work in teams. While in the military context cyber teams tend to be teams of diverse talents, in the private sector it is much more likely for smaller teams to be composed of similarly talented individuals rather than a group with diverse work roles and backgrounds (Champion et al. 2012). Recent research has found that cybersecurity teams are better able to solve complex tasks than individual analysts, potentially due to the distribution of expertise across analysts (Rajivan 2014; Rajivan et al. 2013; Rajivan and Cooke, in press). For instance, performance on incident triage was highest for a group of heterogeneous talents as opposed to a team whose members shared similar backgrounds and skills (Rajivan et al. 2013). A limitation of research into cyber teamwork is that it has not examined different organizations or combinations of teams. This future research is essential to determine the correct make-up of the future cyber workforce.

Champion et al. (2014) investigated the contribution of informal education to developing cybersecurity expertise and found that 69 of 82 professionals reported that supplemental informal education was a prerequisite for career success. Furthermore, 40% of professionals felt that job experience was a stronger contributor to performance than degree of knowledge or education (12%). Many professionals anecdotally reported that those receiving supplemental on-the-job training and mentoring exhibited the greatest performance benefits, as measured by future career success. Similarly, Asgharpour et al. (2007) found that operators who subjectively rated themselves as having higher levels of expertise tended to have both more numerous and more diverse competencies than those with less self-professed expertise.

Cognitive task analyses have identified that cyber professionals need to exhibit strong situational awareness (Jajodia et al. 2010), which includes juggling concurrent sources of information about the health of the network and historical and current network activity, as well as continually assessing risk. For recent meta-analyses, see Franke and Brynielsson (2014) and Onwubiko and Owens (2011). Similarly, using structured interviews, Goodall et al. (2009) interviewed twelve cyber professionals and identified that the requirement for situated knowledge (i.e., knowledge of the local environment) made intrusion detection a relatively unique task whose expertise is challenging to transfer to other tasks in the cyber domain. This required triage teams to interface with local workers to understand the topology and peculiarities of the local network in order to determine whether an intrusion had occurred and what remedies were available.

There are numerous tools to process this incoming information (e.g., Bro and Snort for intrusion detection); however, there is simply too much information for a human actor to process successfully, and critical misses are inevitable. A human teamed with a machine, however, has the potential to cover a much wider set of attack vectors because the machine does not have the same attentional limitations and can make sense of large swaths of incoming data far more thoroughly.

Before proceeding to discuss the importance of AI systems that can interact with human actors, it is important to understand how we are training our cyber workforce and to identify any gaps in training. The Department of Homeland Security’s National Initiative for Cybersecurity Careers and Studies (NICCS) developed a Cybersecurity Workforce Framework (Newhouse et al. 2016) to provide a base set of work roles for the cyber workforce. While this ontology was not empirically justified, it represents the most well-documented rostering of work roles in the cyber domain. This collection includes seven work-role categories, 31 specialty areas, and over 1000 types of knowledge, skills, and abilities. Major categories are described in Table 1.

Table 1. Cybersecurity Workforce Framework. Reproduced from (Newhouse et al. 2016, p. 14).

Securely Provision roles revolve around the more traditional information technology field including software developers, computer programmers, and network architects. The Operate and Maintain roles include System Administrators, Knowledge Management, and Security Analysts. The Oversee and Govern roles include managerial roles, Cyber Law, Policy Development, and Education. The Protect and Defend roles include Cyber Analysts (Operators) and Network Defenders. The Analyze, Collect and Operate, and Investigate roles all encompass the broad field of Digital Forensics and will tend to be government or law enforcement positions (Caulkins et al. 2016).

In general, cyber professionals in the Securely Provision, Operate and Maintain, and Protect and Defend work roles must have good mental flexibility and pattern matching abilities (Baker 2016; Ben-Asher and Gonzalez 2015; Champion et al. 2014). They will have to possess significant skill and knowledge about computer operating systems and using analytical tools for such things as network scanning, network mapping, and vulnerability analysis. This task environment involves scanning large numbers of network events and (generally false) alerts across multiple computer screens with the goal of identifying threats while minimizing false alerts (D’Amico and Whitley 2008).

A limitation of the NICCS Workforce Framework is that, of the 1060 types of knowledge, skills, and abilities, fewer than ten describe teamwork or working with AI. This implies that the Framework paints an incomplete picture of workforce proficiency (Cook 2014). Furthermore, the development of any cyber workforce that neglects the social aspect of human behavior on the network neglects a critical component of the cyber domain. For instance, cyber defense would be aided by an understanding of human behavior and how it introduces risk to the network (Asgharpour et al. 2007; Pfleeger and Caputo 2012). We should leverage the respective capabilities of AIs and humans to maximize information exchange so that each processes the right kinds of information to be most effective. Under this view, AIs should process the large swaths of incoming, poorly structured data and distill it into a format that can be readily presented to a human operator. The human operator can then perform high-level strategic inference over this well-structured information from the AI. We now know, though, that human operators will not use this data unless they can understand why the AI makes its recommendations.
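
A minimal sketch of this division of labor might look like the following, in which the machine aggregates a stream of raw alerts into a short, ranked summary for the human operator. The alert fields, severities, and thresholds are hypothetical and chosen only to illustrate the distillation step.

```python
from collections import Counter

# The machine's side of the bargain: condense a large volume of poorly structured
# alerts into a short, well-structured summary for the human operator.

raw_alerts = [
    {"src": "10.0.0.5", "signature": "port scan", "severity": 2},
    {"src": "10.0.0.5", "signature": "port scan", "severity": 2},
    {"src": "10.0.0.9", "signature": "beaconing to known C2", "severity": 5},
    {"src": "10.0.0.5", "signature": "port scan", "severity": 2},
]

def distill(alerts, top_n=3):
    """Aggregate duplicate alerts and surface the highest-severity items first."""
    counts = Counter((a["src"], a["signature"], a["severity"]) for a in alerts)
    summary = [
        {"src": src, "signature": sig, "severity": sev, "occurrences": n}
        for (src, sig, sev), n in counts.items()
    ]
    return sorted(summary, key=lambda item: item["severity"], reverse=True)[:top_n]

for item in distill(raw_alerts):
    print(item)
```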

6 Explainable AI

Most AI systems today are not designed to (nor can they usually) explain to their human users the manner in which they arrived at their conclusions. The reason is that most AI developed to date for cybersecurity applications is non-symbolic. As we explained in our introduction to AI earlier, these approaches, unlike symbolic ones, are not inherently explainable. System designers would have to deliberately develop explanation mechanisms, which is seldom done in the field. Faced with such opacity, many users choose to blindly trust the computer, a phenomenon that has been called the “in screen we trust” effect (Aiken 2017). The alternative is to distrust the computer and ignore its decisions if they seem unreasonable. Some systems, however, might not allow this option if their AI mechanisms are part of closed decision loops that do not allow real-time human intervention.

In order to develop and maintain the trust that is inherent in teaming, AI systems must be able to explain their conclusions to human teammates. In this regard, symbolic AI approaches such as expert systems and cognitive architectures are better suited because they model human knowledge and thought processes, respectively. Their very nature is similar to higher-level human thought constructs, which in most cases makes it simpler for them to present their causal chains to humans. Conversely, this nature also makes it easier for humans to point out errors or omissions in their synthetic teammates. The Soar cognitive architecture, for instance, uses goal graphs to simulate human cognitive processes, which naturally lends itself to explanation by representing synthetic decision-making processes as goal trees that can be shown to people.
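
The following sketch, which is not Soar itself, illustrates how a decision chain represented as a goal tree can be rendered as a human-readable explanation. The goals, subgoals, and rationales are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# A goal tree with a rationale attached to every node; walking the tree yields a
# human-readable trace of why each decision was made.

@dataclass
class Goal:
    name: str
    rationale: str
    subgoals: List["Goal"] = field(default_factory=list)

def explain(goal: Goal, depth: int = 0) -> None:
    """Print an indented, human-readable decision trace for the goal tree."""
    print("  " * depth + f"- {goal.name}: {goal.rationale}")
    for sub in goal.subgoals:
        explain(sub, depth + 1)

plan = Goal(
    "contain suspected intrusion",
    "anomalous outbound traffic matched a known exfiltration pattern",
    [
        Goal("isolate the affected host", "the suspicious traffic originated from it"),
        Goal("preserve volatile evidence", "memory capture is needed before any reboot"),
    ],
)
explain(plan)
```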

Visualizing non-symbolic systems such as ML processes, on the other hand, has traditionally been more difficult because they mostly rely on mathematical models and processes. To this end, the Defense Advanced Research Projects Agency (DARPA) is pursuing its eXplainable AI (XAI) program, to which the authors of this paper are both contributing. One of the projects in this program is XAI for the Veterans’ Transition Assistance Program (XAI-VTAP), which is geared towards matching the resumes of veterans to open job postings. Some of the work being done in this project uses novel techniques to provide an unprecedented level of visibility into how ML algorithms arrive at conclusions. Figure 2 shows how and why the system matched a specific resume to multiple occupational categories. The top part of the figure shows how good a fit the candidate is for each category and provides examples from that person’s resume. The bottom part illustrates how various indicators were ultimately mapped to various categories. One could imagine job seekers using this feedback as a training aid to create better resumes in general, as well as resumes that improve their odds of getting specific jobs.

Fig. 2. User interface prototype for Explainable AI to support Veterans Transition Programs (XAI-VTAP)

While such explainability could lead to better employment opportunities and possibly improve resume-writing skills for veterans and other job seekers, it also enables the threats to AI systems posed by AML. While there are many types of AML, the one most relevant to our discussion is the deliberate manipulation of the data inputs to an ML mechanism so that it fails to function as intended. This could happen if an adversary determines how an ML-based spam filter works and then crafts spam messages that are not identified as such and thus are delivered to a victim’s inbox. It could also happen if the adversary pollutes the training data set for an ML-based product so that it is trained to correctly identify spam messages except those that have a particular set of characteristics that only the adversary knows, thereby allowing only that adversary to bypass the detection mechanism. The knowledge that can be gained through explainable AI facilitates AML techniques.

Still, explainability is crucial to our human-machine teaming efforts for three reasons. Firstly, it allows trust to be developed between humans and their synthetic teammates. Autonomous AI agents are likely to reach some seemingly far-fetched conclusions that may stretch the credulity of their human counterparts. In those situations, it is necessary to be able to walk the human through the thought process. Secondly, an AI system’s conclusions will only be as good as the models it has and the learning it has been able to do on its own. It is entirely possible that some mismatches will occur, in which case the human will be able to detect the error, point it out to the AI system, and allow it to learn from the experience. Thirdly, synthetic teammates have tremendous potential as training tools, which can only be realized if they are able to explain themselves to those who are learning from or with them.

7 Human-Machine Teaming

The notion of shared mental models between humans and machines is a common thread when examining human-centered big data research. Mental models provide a representation of the situation, the various entities involved, their capabilities, and past decisions and actions. These models are dynamic, with analyst and model engaged in a continuous production loop. In addition, from a purely human perspective there is research on teamwork (Baumann and Bonner 2013) and the degree to which teammates from different backgrounds have overlapping shared mental models (Bearman et al. 2010). There is also research on the degree to which multiple agents can recognize a common plan from reading large corpora (Paletz 2014).

Teams of security analysts are, in many instances, a loose association of individuals rather than a functioning team (Champion et al. 2012). A functioning professional team is a “purposive social system” (Hackman and Katz 2010) in which members have diverse backgrounds, are identified by role, and work together in an interdependent manner towards common objectives (Salas et al. 1992). Team effectiveness largely depends upon appropriate leadership, team structure, communication, collaboration, and distribution of tasks. Communication is the key medium by which human teams form relationships, collaborate, and share information (Cooke et al. 2013). It is the conduit that transforms individual expertise and situational awareness into team-level knowledge and situational awareness.

Field studies with security analysts found that communication and collaboration between analysts were integral to effective defense, particularly during a widespread security crisis (Goodall et al. 2009; Jariwala et al. 2012). Lab experiments on collaboration during threat detection have also found evidence that cooperation between security analysts during triage analysis augments signal detection performance, particularly in novel and complex situations (Rajivan et al. 2013). However, during collaborative analyses, analysts may fail to contribute requisite expert knowledge and may demonstrate biases in the way information is pooled, leading to communication losses that affect threat detection performance (Rajivan 2014). Communication across the hierarchy of security analysts has also been observed to be inefficient and largely one-directional (bottom-up). Tools for collaborative threat detection developed using human systems engineering principles would help mitigate such losses in communication between security analysts (Rajivan 2014).

Leadership is also crucial to security defense team development and performance (Buchler et al. 2017). Typically, an individual in a leadership role is expected to develop team capabilities, facilitate problem solving, provide performance expectations, synchronize and integrate team member contributions, clarify team member roles, and engage in meetings and feedback (Salas et al. 1992; Simsarian 2002). Field studies on security leadership showed that leadership is a significant predictor of defense performance. In one such study, two security teams, otherwise equivalent in skills, experience, and knowledge, were observed to demonstrate widely different defense performance, primarily due to differences in leadership approach and amount of collaboration (Jariwala et al. 2012). A subsequent study found that functional specialization and adaptive leadership strategies are important predictors of security defense performance (Buchler et al. 2017). Beyond this handful of studies, the determinants of effective teamwork and leadership among security analysts remain an emerging area of research.

Collaboration, communication, and knowledge integration are necessary for accurate and expeditious correlation analysis. Past team research makes it evident that teams often do not realize their full potential and can fail for a multitude of reasons. Losses in team processes such as communication lead to sub-optimal decision making. For example, collaborative threat detection requires the exchange of expert information between security analysts, yet previous research has demonstrated that teams may not be effective in exchanging novel information. In particular, uneven information distribution biases people toward sharing information that is already known to the majority of the team and prevents them from sharing and associating the unique information available to them (Stasser and Titus 1985). The effect of such team-level biases on security team collaboration is largely unknown.

Experiments on team interactions are ideally conducted in context (through field studies) or using simulation environments. Due to restricted access to real-world cyber protection teams, and because team process metrics currently receive little attention in cyber defense exercises (Granåsen and Andersson 2016), experiments on team interactions in cyber defense can instead be conducted in the lab using simulation systems that recreate realistic team interactions and workflows between study participants, requiring them to exercise some of the same cognitive processes involved in conducting cyber defense in the real world (Cooke and Shope 2004).

We argue that in order to incorporate machines into human teams effectively, they must be natural to use, seamlessly integrate into the task environment, and provide a subjective improvement in effectiveness. Ideally, a single human operator (or small team of operators) would be able to supervise multiple AIs (Chen and Barnes 2014; Pellerin 2015; Trexler 2017). The goal of the AI is to process the massive amount of incoming information, present it efficiently to the human operator, make low-level decisions, and help the human operator make high-level strategic decisions. This AI will be able to make decisions at the speed of cyberspace and adapt to new attack vectors in near real-time, which is orders of magnitude faster than a human operator. We foresee that within the next decade, the war for cyberspace will be fought between nations’ AIs, and the skill of the operators and effectiveness of the AI’s algorithms will be the deciding factor.

As such, it is essential for human operators to trust their AIs. Petraki et al. (2015) argue that mutual predictability and adaptability are important in order to engender trust. As previously discussed, that is one of the main goals of DARPA’s eXplainable AI program. The ability of the AI to adapt to a human operator’s goals, and for the operator to ask why a decision was made, is key to trusting the AI’s automation. One technique is to supplement traditional AI techniques with models that approximate human behavior, such as the Soar and ACT-R cognitive architectures.

In summary, by leveraging AIs to do much of the complex sensemaking required in many cyber operations tasks, we argue that it is possible to maximize a human operator’s ability to conduct strategic operations effectively, even in the face of an overwhelming amount of incoming data. We argue that AIs need to seamlessly integrate with humans, and that they need to be explainable in order for human teammates to trust their output.

8 Conclusions

From the foregoing, we posit that there are three key elements of effective human-machine teaming in cyberspace: effective intra-team communications mechanisms, a sophisticated and diverse cyber workforce, and AI systems that can readily explain the rationales for their decisions to their human teammates.

We have already established that communication is the key medium by which teams form relationships, collaborate and share information. It is a logical extension of this premise to assert that whatever the team composition (e.g., human, synthetic), as long as there is at least one human in the mix, effective communications will be required to build and maintain the team’s effectiveness. Even if there are no humans in a team of cyberspace actors, communications will be key, albeit in a somewhat different form.

It will also be important to ensure that the human actors that are teaming with AI systems are knowledgeable of the capabilities and limitations of the underlying technologies. In other words, to fully leverage the potential of our synthetic teammates, we will need cybersecurity operators with broad knowledge and skills, and who know when to task agents and when to question their reports. There is a dearth of research in this area, so much work needs to be completed before we can quantify the requirements for humans in an effective human-machine cybersecurity team.

Finally, the skills of the human actors will be overtaxed unless their synthetic teammates are able to explain the manner in which they reached a specific decision. This requirement for explainable AI addresses two critical aspects of effective teaming: trust and correctness. An important element of teamwork is trust, which can be eroded by unexpected behaviors, particularly those that could seem to undermine or threaten mission accomplishment. If a synthetic agent is incapable of explaining to its teammates how it arrived at a particular conclusion, it will not engender (and may erode) trust. Furthermore, since it is likely infeasible to develop a perfectly correct AI system, the ability to explain itself will allow its human teammates to identify logical or syntactical errors.

Given that it is likely that AI will play an increasingly important role in the future of cybersecurity, it is imperative that we develop better constructs for human-machine teaming. These should be focused on effective communications, human workforce development, and explainable AI. Though much research is needed in all three areas, we can’t afford to take the risk of not getting this right. Our cybersecurity depends on it.