
1 Introduction

Situation awareness (SA) evaluation has been a central topic in the study of complex operational environments for the past 30 years. Empirical findings show that incorrect SA is a major contributing factor to human error. In aviation, for example, 88% of accidents involving human error could be attributed to SA problems [1]. Over the years, the need to evaluate SA has shifted. From accidentology in the late 80’s [2], SA became a subject of “in-situ” evaluation in training or interface design contexts [3,4,5,6]. Its central place in the decision-making process, individually and in teams, makes its assessment a key element in performance prediction and analysis.

Today, as technology becomes increasingly adaptive, the objectives have evolved towards a real-time evaluation of the user’s state, so that technical systems can react to problem states. In operational settings, on-task measures offer the opportunity to mitigate critical user states through adapting levels of automation [7], modes of interaction, or communications. In adaptive instructional systems, real-time measures can be employed to monitor and ensure desired training states, for example to foster the development of coping strategies. An SA-adaptive training system could, for instance, identify SA problems in teams and provide specific training content that focuses on improving communications processes.

To date, despite the unanimously recognized importance of the SA construct, the mitigation of SA problems has not yet been a focus of such systems. A look at traditional measures of SA reveals a major limitation that may well explain this disregard: designed to detect what is wrong with SA, measures have focused on SA accuracy much more than on when or how fast SA was achieved, often resulting in rather qualitative assessments. Quantitative assessments, however, are important for knowing when to adapt. Particularly in team settings, a temporal evaluation of SA is necessary to identify when team performance is likely to suffer from asynchronous SA. Multiple researchers agree that the development of unobtrusive, objective, online measures of SA is the logical and necessary next step [8, 9]. Despite this consensus, we are not aware of any studies that have looked more closely at what it would mean to measure SA dynamics.

To address this demand, this paper explores the concept of SA synchrony as a measure of shared SA temporal dynamics. We propose the use of indicators of when SA is achieved to identify three intervals with relevant shared SA (SSA) problems. We further suggest that it should be possible to effectively mitigate such SSA problems by optimizing the length of these intervals, possibly through adaptive interventions. After a review of current techniques and an exploration of why these are inherently limited by SA’s nature, we introduce our conceptual approach, then discuss implications, requirements, and possible approaches to measuring SA synchrony.

2 Measuring Situation Awareness - Knowing What

Measuring SA has traditionally focused on assessing what is wrong with the user’s internal representation of a given ground-truth situation. This may originate from SA being commonly defined as the active knowledge one has about the situation one is currently involved in. Notably, Endsley’s three-level model [10] describes SA as a product of the situation assessment process on which decision making is based. Endsley’s information processing approach is structured into three hierarchical levels. Level-1 SA comprises the perception of the elements in the environment, level-2 SA is about their comprehension and interpretation, and level-3 SA represents the projection of their evolution in the near future.

At the group level, SA is necessarily more complex to define and evaluate. Although numerous debates also remain regarding its definition, the two concepts of shared SA and team SA tend to be recognized as describing respectively the individual and the system-level of a group situation awareness [9, 11,12,13,14]. Given this systemic aspect of SA, most classical measurement methods are not well-suited to account for the higher complexity of the sociotechnical system.

In 2018, Meireles et al. [15] referenced no fewer than fifty-four SA measurement techniques across eight application fields (driving, aviation, military, medical, sports, …). This diversity of assessment techniques illustrates the need for context-specific methods. However, the vast majority of techniques rely on discrete evaluations, either during or after the task. Self-rating techniques, performance measurements, probe techniques, and observer ratings each have particular advantages and disadvantages, as described in the following subsections.

2.1 Post-trial Techniques

In post-trial self-rating techniques, participants provide a subjective evaluation of their own SA based on a rating scale immediately after the task. The situation awareness rating technique (SART) [16], the situation awareness rating scales (SARS) [17], the situation awareness subjective workload dominance (SA-SWORD) [18] and the mission awareness rating scale (MARS) [19] are among the most widely used self-rating techniques. Performance measurements have also been explored as a post-trial assessment method: SA is inferred from the subject’s performance on a set of goals across the task [20].

Being non-intrusive to the task, post-trial methods seem well suited for ‘in-the-field’ use and are easily applicable in team settings. However, they suffer from major drawbacks: self-rating techniques are strongly influenced by subjects’ task performance and by their limited ability to evaluate their own SA [21], while performance measures rest on the controversial assumption that performance and SA are reciprocally bound.

2.2 In-Trial Techniques

A second category of SA measurement techniques consists of subjects answering questions about the situation during the trial. Called query or probe techniques, these methods can be categorized into freeze probe and real-time probe techniques. The situation awareness global assessment technique (SAGAT) [22] is the most popular and most used among a variety of freeze probe techniques, which also include SALSA, a technique developed for air traffic control applications [23], and the Situation Awareness Control Room Inventory (SACRI) [24]. The trial is frozen (paused) from time to time while subjects answer questions regarding the current situation. Alternatively, real-time probe techniques push questions to the subject through the interface during the task, without pausing it. The Situation-Present Assessment Method (SPAM) [25], in particular, uses response latency as the primary evaluation metric.

Although these techniques allow for an objective evaluation of what subjects know about the situation at critical points in time, they often cause task interruptions and task-switching issues; they are thus deeply intrusive and less suitable for ‘in-the-field’ use.

2.3 Potential Continuous Techniques

Observer-rating techniques, such as the Situation Awareness Behavioral Rating Scale (SABARS) [19], are designed to infer the quality of subjects’ SA from a set of behavioral indicators. Such direct methods require the participation of subject-matter experts and often result in a highly subjective evaluation.

Physiological and behavioral measures may also serve as indirect but potentially continuous evaluation techniques for SA. However, unlike workload, for which numerous psychophysiological and neuropsychological metrics have proven reliable [26,27,28], a viable objective continuous measure of SA is still lacking [29, 30].

2.4 In Team Settings

To date, the vast majority of existing measures of team SA (TSA) or shared SA (SSA) are variations of individual SA measurement methods, and no measure has been formally validated [13, 31,32,33]. Shared SA can be seen as a matter of both individual knowledge and coordination, differentiating two levels of measurement [34, 35]: (1) the degree of accuracy of an individual’s SA, and (2) the similarity of teammates’ SA. The evaluation of SA accuracy is essentially what most objective techniques (cf. Sects. 2.1 and 2.2) are concerned with. One’s understanding of the situation is compared to the true state of the environment at the time of evaluation, leading to the assessment of SA as a degree of congruence with reality.

The evaluation of SA similarity is usually based on the direct comparison of teammates’ understanding of the situation elements relevant to them. SA on an element is considered shared if they have a similar understanding of it. For example, the Shared Awareness Questionnaire [36] scores teammates’ SSA on the agreement and accuracy of their answers to objective questions regarding the task. Inherently, these methods suffer from the same limitations as individual techniques, and valid and reliable measures are still lacking [37].

2.5 Summary

In summary, methods like post-situational questionnaires or observation are usually unobtrusive but subjective, while objective methods require intrusion in the task to ask the questions (Table 1).

Table 1. Summary of major categories of SA measurements

Although a one-size-fits-all technique is not necessarily pertinent, given the diversity of goals and contexts of evaluation, all measures exhibit a common limitation: they examine SA at certain points in time. As the environment evolves, however, SA has to be built and updated continuously to integrate relevant new events, information, and goals. Thus, SA is inherently dynamic [29, 38] and a continuously evolving construct. The inability to account for this dynamic nature is the main criticism leveled at current evaluation techniques [14].

3 The Dynamics of Situation Awareness – Knowing When

In a context where Human-Autonomy Teaming and adaptive systems used in training and operational settings are in dire need of online assessment of the human state, the question of evaluating SA’s temporal evolution (SA dynamics) has become central [39, 40]. In such operational environments, coherent decision making and team performance rest on a common and accurate understanding of what is going on and the projection of what might happen. Thus, this section addresses why understanding the dynamics of SA is important and what might be relevant for measuring it in a team context.

According to Endsley’s model of Situation Awareness [10], the building of shared SA rests upon the perception and similar integration of the right situational elements by all teammates. In this spirit, shared SA is defined as

“the degree to which team members possess the same SA on shared SA requirements” (Endsley and Jones [43], p. 48)

SA requirements are pieces of information needed by individuals to make decisions and act to fulfil their tasks. They may concern the environment, the system, as well as knowledge of and about other team members. In [41], Cain refers to these information items as Necessary Knowledge Elements (NKE). As previously explored by Ososky et al. [42], we propose to extend the concept to the Necessary Shared Knowledge Element (NSKE), defined as an information item needed by all teammates to fulfil a collaborative part of their tasks; in other words, the “shared SA requirements” from Endsley and Jones’ definition (Fig. 1).

Fig. 1. Illustration of the Necessary Shared Knowledge Element (NSKE).

However, the elements necessary to build this shared understanding of the situation are rarely perceived simultaneously by each individual [41, 43]. Let us assume a hypothetical case where all necessary information is available to two teammates (A and B) and where both have managed to form the same representation of it, effectively achieving SSA. Whenever a new NSKE appears, it invalidates the current SSA until the NSKE is integrated into A’s and B’s individual SA, respectively, to achieve an updated SSA (Fig. 2). In this model, three latencies are of interest for temporal SSA assessment, delimiting four phases (Fig. 2).

Fig. 2. Illustration of SSA temporal evolution and associated Initial Integration Latency (IIL), Team Synchronization Latency (TSL) and Team Integration Latency (TIL).

The Initial Integration Latency (IIL), the first latency, is the time needed by the first teammate to perceive and integrate the new NSKE into his updated SA. The interval between the appearance of the NSKE and its integration into A’s SA (Phase 2 in Fig. 2) represents a situation of shared but inaccurate SA, which comes with an increased probability of inaccurate decision making. During this period, teammates still possess a common representation of the situation: individual decisions are consistent and collective decisions are coherent with the ongoing strategy. However, their representation is no longer up to date, and the growing difference between reality and its representation increases the risk of inappropriate decision making. The duration of this latency is influenced by the same attentional and sensory-perceptual factors that impact Level-1 SA [32]: stress, fatigue, workload, or interface complexity.

The second latency, which we call Team Synchronization Latency (TSL), represents the time the second teammate needs to perceive and integrate the new NSKE into his updated SA after the first teammate did. Together with the first latency, this creates an interval of divergent SA between the two teammates’ integrations of the event (Phase 3 in Fig. 2). During this time span, in addition to SA not being accurate for at least one of the teammates, SA is also not shared, increasing the probability of incoherent decision making. In such a situation, two teammates, one up to date with the situation and the other not, could send conflicting instructions to a third.

Finally, the Team Integration Latency (TIL) is the sum of the first two. It represents the time elapsed between the appearance of the NSKE and its integration by the last team member concerned (Phase 2 + Phase 3 in Fig. 2). It reflects the duration for which not all team members have accurate SA.
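As a minimal illustration of how the three latencies relate, the following sketch computes IIL, TSL, and TIL from the moment an NSKE appears and the moments each of two teammates integrates it. The timestamps and names are purely illustrative, not taken from any study:

```python
from dataclasses import dataclass

@dataclass
class NSKETimestamps:
    """Hypothetical timestamps (in seconds) around a single NSKE."""
    appearance: float     # NSKE appears in the environment
    integration_a: float  # teammate A integrates it into their SA
    integration_b: float  # teammate B integrates it into their SA

def ssa_latencies(t: NSKETimestamps) -> dict:
    first = min(t.integration_a, t.integration_b)  # first teammate to update
    last = max(t.integration_a, t.integration_b)   # last teammate to update
    return {
        "IIL": first - t.appearance,  # Initial Integration Latency
        "TSL": last - first,          # Team Synchronization Latency
        "TIL": last - t.appearance,   # Team Integration Latency = IIL + TSL
    }

# Example: NSKE appears at t = 10 s; A integrates at 12.5 s, B at 16 s.
print(ssa_latencies(NSKETimestamps(10.0, 12.5, 16.0)))
# → {'IIL': 2.5, 'TSL': 3.5, 'TIL': 6.0}
```

Note that the computation is symmetric in A and B: IIL always refers to whichever teammate integrates first, matching the phase structure of Fig. 2.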

Shared SA is re-established once the second teammate acquires the NSKE (i.e., after the second latency). Two modes of NSKE acquisition can be distinguished: independent and collaborative. In the independent method of synchronization (Fig. 3), both teammates autonomously acquire the NSKE directly from the environment. As the perception of the new situation element is accomplished without assistance from the teammate, IIL, TSL, and by extension TIL are subject to similar influencing factors.

Fig. 3. Independent method of acquiring the NSKE to return to shared SA

In contrast, collaborative acquisition of the NSKE (Fig. 4) is based on the active exchange of the NSKE between teammates. Research has shown that verbal or electronic communication is central to the process of building and maintaining shared SA [42, 44]. The first individual to perceive an NSKE (teammate A) communicates it to teammate B some time after having integrated it into his own SA. The communication may comprise the element of the situation itself (e.g. “There is a new unidentified airplane track”; Level-1 SA) or already include higher-level information based on A’s processing and sense-making (e.g. “Identification needed on the new track”; Level-2 SA).

Fig. 4. Collaborative method of acquiring the NSKE to return to shared SA

In this case, B’s acquisition of the NSKE depends on its prior acquisition by A, so that the duration of the second latency is influenced by additional factors. While perception and attention can still be impacted by the previously mentioned factors, factors specific to the exchange of information can also affect the communication and its content: for example, A’s workload, the priority of the NSKE over A’s other tasks, the quality of shared mental models, or B’s ability to receive and process communications. The recognition and effective exchange of NSKEs also requires sufficient knowledge of the teammate’s tasks and information needs, as well as an ongoing estimation of the teammate’s current SA. This knowledge is commonly referred to as part of Team SA [9, 11].

Being inherent to the process of SA updating and sharing, these latencies emphasize the importance of SA’s dynamic properties. As such, we propose their use as a metric to assess SA synchrony. The following section addresses possible approaches for measuring these latencies.

4 Perspectives for SA Synchrony Measurement

Direct objective measures of SA have been extensively validated for a wide range of domains; however, they rely on an objective ground truth against which responses are compared. They use methodological standardization to objectify an otherwise subjective representation of the situation. SA being a cognitive construct, evaluation of the situation model requires the elicitation of its content [4]. Therefore, any direct measure requires the expression or verbalization of an internal construct, making it inevitably intrusive and difficult to apply in field settings. Although in some situations assessing the accuracy of the knowledge possessed can be sufficient, being able to detect and measure the aforementioned latencies through indirect measures would open new ways to quantify, qualify, and respond to shared SA issues in training, in real time.

4.1 Quantifying SA Synchrony

In 1997, Cooke et al. [45] proposed a cognitive engineering approach to individual and team SA measurement including, among others, behavior analysis, think-aloud protocols, and process tracing. Some of these methods find echoes in today’s human-monitoring approaches in cognitive engineering. Monitoring team members’ activities (cognitive, physiological, and behavioral) and comparing them could provide an indirect way to assess SSA that helps overcome the evaluation constraints inherent to SA’s nature.

In this sense, SA Synchrony can be seen as the temporal comparison of SSA-driven reactions and behaviors between two individuals in a collaborative work situation [46].

As illustrated in Sect. 3, the perception of a new situational element defines the three latencies and is central to the shared SA synchronization process. Thus, techniques oriented towards detecting this perception could be suitable solutions for measuring these latencies. Here, some behavioral and physiological measurements have the advantage of being continuous and are already being used to quantify the user’s state and activity [47,48,49]. As continuous measures, their temporal evolution can easily be compared, and latencies between reactions or behaviors may be observed.

In 2019, de Winter et al. [8] built the case for continuous SA assessment by using a metric derived from eye movements as a promising, although highly improvable, continuous measure of SA. More generally, eye-tracking is considered a potentially non-intrusive SA measurement tool. Recent applications in aviation [50,51,52] and driving [53] have made it possible to infer individual attentional focus and SA from patterns of gaze fixation, which can then be compared between teammates. Similarly, reaction times can be recorded through mouse-tracking or the tracking of other interface-related behaviors, as explored in [54,55,56,57].

Given all the techniques already explored and their mixed results, a multi-measurement approach seems to be required to capture complex constructs such as SA [20]. In an exploration of such a usage, we combined eye-tracking and mouse-tracking to extract the perception latency and reaction time of individuals during a simplified airspace monitoring task. The first fixation on newly appeared tracks and the first click on them were recorded, allowing an inference of the moment of individual SA actualization.
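The core of such an inference can be sketched from synchronized event logs. The log format, track identifiers, and timings below are illustrative assumptions, not the actual apparatus of the study:

```python
# Each log entry: (timestamp_s, track_id, event_kind), with event_kind in
# {"fixation", "click"}; one log per teammate, recorded on a shared clock.
def first_event(log, track_id, kind):
    """Timestamp of the first event of `kind` on `track_id`, or None."""
    times = [t for t, tid, k in log if tid == track_id and k == kind]
    return min(times) if times else None

def track_tsl(log_a, log_b, track_id, kind="fixation"):
    """Team Synchronization Latency for one track, inferred from a
    behavioral proxy (first fixation, or first click as a stricter cue)."""
    t_a = first_event(log_a, track_id, kind)
    t_b = first_event(log_b, track_id, kind)
    if t_a is None or t_b is None:
        return None  # at least one teammate never reacted to the track
    return abs(t_a - t_b)

# Hypothetical logs: track "T1" appears; A fixates at 3.2 s, B at 5.1 s.
log_a = [(3.2, "T1", "fixation"), (4.0, "T1", "click")]
log_b = [(5.1, "T1", "fixation"), (6.3, "T1", "click")]
print(round(track_tsl(log_a, log_b, "T1"), 2))  # → 1.9
```

Choosing the fixation or the click as the proxy trades sensitivity against robustness: a fixation may occur without perception (the “look-but-fail-to-see” problem), whereas a click confirms processing but adds motor latency.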

On paper, a multi-dimensional approach to human monitoring provides the basis for continuous and objective real-time assessment. Despite this potential, we recognize that, by focusing on the resulting behavior, the measure could be strongly influenced by factors other than SA [45]. In addition, the methods described above still have limitations for field use, and many issues need to be considered, such as the “look-but-fail-to-see” phenomenon, in which one gazes at an element without necessarily perceiving it. As the techniques are already very sensitive and complex, the need to combine them makes their application all the more difficult. Future research should provide insight into how these methods can complement each other in measuring SA synchrony.

4.2 Qualifying SA Synchrony

While SA synchrony is primarily intended to be a purely quantitative, objective measure, its interpretation and qualification are nonetheless of interest. Importantly, as communications add an inherent latency, perfect synchrony of SA across team members is neither realistic nor necessarily desirable [58,59,60]. When collaborating, the interpretation and prioritization of tasks and the relevance of the NSKE are a function of individual strategies and objectives. Thus, in order to identify SA problems, it may be necessary to evaluate the deviation from an expected latency. The interpretation or qualification of SA synchrony requires an in-depth understanding of individual and team tasks, processes, and communications. As stated by Salas et al. [61], the qualification of behavioral markers must be contextualized to the environment in which they are applied. Similarly to a theoretical optimal SA [62], a theoretical optimal synchronization could be defined based on team task analysis. For such qualification, the link between SA synchrony and performance also needs to be studied.
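One simple way to qualify an observed latency against a task-derived expectation might be a tolerance band around the expected value. The expected latency and margin below are purely illustrative; in practice they would come from team task analysis:

```python
def qualify_latency(observed_s, expected_s, tolerance_s):
    """Classify an observed latency (seconds) against an expected value
    with a tolerated margin; deviation in either direction is flagged,
    since perfect synchrony is not the reference, the expectation is."""
    deviation = observed_s - expected_s
    if abs(deviation) <= tolerance_s:
        return "nominal"
    return "too slow" if deviation > 0 else "too fast"

# Hypothetical expectation: TSL of 3 s (± 1 s) for a given scripted event.
print(qualify_latency(3.5, 3.0, 1.0))  # → nominal
print(qualify_latency(6.0, 3.0, 1.0))  # → too slow
```

Note that the classification is deliberately two-sided: a latency far below expectation can also signal a problem, e.g. an NSKE communicated before it was properly processed.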

4.3 Using SA Synchrony

Like most quantifiable metrics, SA synchrony can be used as a descriptor of collaboration or performance. It is suited both for a posteriori evaluation of overall team behavior during tasks and for real-time assessment by an adaptive system or for team feedback. Optimized states of synchronization between teammates, or with reality, can be defined, aiding the identification of problematic periods during collaboration.

In the specific context of training, scenarios and NSKEs are often known a priori, allowing the definition of anticipated responses against which IIL, TSL, and TIL can be measured. Thus, scripted training settings may allow for an easier assessment of SA synchrony than naturalistic situations. Adaptive interventions could then be designed to reduce problematic latencies.

5 Conclusion

The dynamic nature of SA is unanimously acknowledged and its temporal evolution is the subject of much discussion. However, the assessment of SA dynamics has received little attention compared to SA accuracy.

Thus, as a complementary approach to the accuracy and similarity of SA, we proposed considering the concept of SA synchrony as an indicator of SA dynamics in teams. We hope that pursuing the opportunities presented by this concept may help overcome current limitations and draft novel solutions for assessing and improving non-optimal SA dynamics. As discussed, knowing when and for how long SA synchrony is (not) achieved may be a helpful complement for assessing shared SA and preventing human error in team settings.

We identified three intervals with SA-relevant issues. Future research may focus on measuring the duration of these three latencies as possible quantitative measures of SA synchrony. SA being an internal cognitive construct has directed measurement techniques towards essentially discrete and intrusive methods. In essence, the measurement of SA content necessarily requires some form of verbalization, which does not seem compatible with the continuous measurement techniques required today for online assessment. Considering the limitations of current techniques, we suggest the use of indirect measures. Although these are highly criticized for their inability to capture the content of the representation objectively, their continuous and unobtrusive characteristics and their potential for real-time evaluation make them the best fit for the ‘in-the-field’ applications required today. We intend to identify and evaluate a number of candidate measures in upcoming research.