
1 Introduction

The past decades have introduced a new law in Human-Computer Interaction, beyond Moore's law: Buxton's Law of Promised Functionality [5]. Buxton's law describes the exponential growth of interaction means, including multimodal interaction. This growth raises a new complexity for the user-centered designer, who must seek a tighter coupling between multimodal interaction possibilities and human skills and prior experience. Indeed, the popular HCI human-computer metaphor shows its limitations when the designer aims for multimodal interactions compatible with human skills, also called "natural interactions". This paper therefore presents a summary of the latest multimodal theories in cognitive science and an approach to multimodal interaction in line with these theories. To this end, it proposes a user-centered design process, focused on the design of the future flight deck, dedicated to identifying the multimodal interactions most compatible with the piloting tasks. The paper starts with a state of the art of the latest multimodal theories in cognitive science. It then describes a design and evaluation process for multimodal interaction, called CONTACT (COckpit NaTural interACTion), in line with these theories. Finally, the paper concludes with lessons learned and ways forward.

2 Theoretical Background

Since the 1950s, Cognitive Sciences have been dominated by the cognitivist (or computo-symbolic) approach and the human-computer metaphor. According to this view, mental activity is based on computations over symbols, in step-by-step processing. Representations from modal systems are transduced into amodal symbols that represent knowledge. Moreover, perception (considered as input data) and the motoric response (considered as output) are functionally dissociated. Consequently, the cognitivist approach focuses on the computo-symbolic processing that is supposed to link these two phenomena [11]. This dualist vision dissociates abstract cognition on one hand from the body and the physical and social environment on the other. It thus assumes that cognitive activities are independent of the body and the environment [2, 3, 6, 10]. Applied to ergonomics, these theories concentrated on the cognitive processing occurring between information presentation and the motoric response, rather than on perceptive and motoric activities themselves [7].

For the past couple of decades, new approaches have defended the idea that cognition is not an amodal system but a multimodal one, and that every human activity, however "abstract", relies directly on perception and action. These approaches, generally called embodied and situated, are more adaptive and functionalist: they consider that the sole function of cognition is action, in order to adapt to the world. Action is no longer considered an output of the system, but a central and essential component of cognition. Consequently, the purpose of psychology is no longer centered on mental representations but on the dynamic body-cognition-environment interactions. These interactions are carried by perception and action, which are the interfaces between humans and the world they adapt to [8, 9]. From a Human Factors perspective, we consider that embodied and situated approaches should be fostered to design human-machine interaction. This choice aims to fit the increasingly multimodal technological landscape [14].

2.1 Multimodal Approaches in Cognitive Sciences: Embodied and Situated Theories of Cognition

The terms embodied and situated are fundamentally linked, but they nuance the emphasis placed, for the study of cognition, on bodily and emotional factors on one hand [8] and on environmental, cultural and social factors on the other [12]. In cognitive ergonomics, this distinction allows working at different scales, from the perceptivo-motor details of interaction to the whole context in which it arises [4, 7].

Among recent theories of embodied and situated cognition, Barsalou's simulationist theory [2, 3] holds that the human brain continuously simulates possible interactions with its environment, at the perceptive, motoric and interoceptive (related to internal states such as emotions) levels. These simulations are based on neural activation patterns previously activated during interactions with similar environments. More specifically, every interaction with the world generates a distributed neural activation across the different perceptive, motoric and interoceptive systems. These activations are captured in associative areas in the form of a multimodal state, in a bottom-up process. The simulation process is top-down: the confrontation with a previously experienced event (perceptive, motoric or interoceptive) reactivates a multimodal state and consequently a pattern similar to a previous distributed (multimodal) activation. During each experience in the environment, the brain simulates the possible interactions (including situated actions) on the basis of past interactions. Thus, every situation generates a multimodal perceptivo-motor simulation.

The ideomotor theory [11] shares many assumptions with the simulationist approach. It suggests that actions are represented according to their perceived effects. The initiation of a contextually adapted action would be possible only via the sensorial activation of the endogenous or exogenous effects that this action will produce [18]. Hommel et al. [11] state that the ideomotor phenomenon rests on three postulates. First, perception and action are functionally linked within the same system. Second, perception and action are represented in a distributed, that is, multimodal format. Third, action control is proactive. More specifically, the distributed (or multimodal) characteristics of experienced contexts are integrated into episodic traces in the form of event files. These files contain multimodal perceptive and motoric information. Compared with Barsalou's simulationist approach, the reactivation of an event file is attributed more to individual intentions than to the encounter with the environment. Thus, the early representation of an action's consequences (i.e. the goal to reach), in other words the individual's intention, is sufficient to reactivate multimodal event files. The expected effects prime the action that will produce those effects; thus the action exists before its execution.

More recently, Versace et al. [23] proposed a memory model based on a unique, distributed and multimodal system: the Act-In (activation-integration) model. This model considers that knowledge is composed of the sensory, motoric, emotional and motivational properties of past experiences and that knowledge emerges from the situation. The Act-In model also assumes that memory is an episodic, multimodal and distributed system. Knowledge emerges from the coupling between the present experience and memory traces of past experiences. This emergence rests on two mechanisms: an inter-trace activation mechanism activates different multimodal memory traces containing perceptive, motoric or emotional properties common with the current situation; an intra-trace activation mechanism associates the different properties to form a trace. Together, both mechanisms allow the emergence of knowledge, but also the creation or modification of traces in memory. Thus, the brain is considered a categorization system that develops through trace accumulation, and the emergence of specific knowledge depends on the singularity of memory traces and current situations, in other words their distinctiveness against other traces or situations. Interestingly, Act-In considers that memory traces reflect all the components of past experiences, including perceptive, motoric, emotional and motivational properties, which determine actions to a large extent. Like the simulationist and ideomotor theories, Act-In conceives direct links between perception and action and defends the view that cognition is multimodal. It also supports the idea of an early activation of action in cognitive activities and rejects the idea of action as an output of the system.

According to these approaches, interacting with the world is a phenomenon emerging from the human-environment coupling. It is a contextual phenomenon, spatially and temporally situated, and consequently a dynamic one. This implies that every behavior is influenced by the previous behavior(s). For instance, Smith [19] and Thelen and Smith [20] explain the dynamic functioning of cognitive activities through Piaget's A-not-B error [16]. In Piaget's experiment, children between 8 and 12 months sit in front of two hiding places (A and B). When the experimenter hides a toy behind A (with the child watching), the child retrieves the toy behind A. But when the experimenter then hides the toy behind B (with the child still watching), the child persists in searching behind A. According to dynamic approaches, once the child retrieves the object, a trace of this activity becomes an input for the following trial. Thus, the action of searching behind A emerges from the combination of this trace and the stimulus (a person hiding a toy). In other words, every situation produces a specific neural trace and a reinforcement of this trace. Behaviors are influenced by previously activated traces, that is, by prior behaviors. The most common dynamic effects are intramodal facilitation, whereby a pre-activation in one modality facilitates subsequent processing in the same modality (or combination of modalities) whatever the nature of the processing [21, 22], and the switching cost, whereby a change of modality induces a cost [15].

2.2 Towards Applications for Multimodal Interaction

The embodied and situated approaches give a strong basis for the multimodal nature of human activities. In particular, this description of human behavior indicates that the whole of human behavior is attached to tangible perceptive and motoric experiences. For example, if a pilot intends to change altitude, a perceptivo-motor activation associated with altitude changes will automatically be simulated at the neural level (e.g. the action to perform on the rotary knob; the auditory, visual and tactile feedback of the rotary knob; the proprioceptive feedback of the plane; etc.).

In terms of application to multimodal interaction design, the first issue is to identify the perceptive and motoric modalities most strongly related to the tasks to perform. For instance, the concept of a fuel leak may have a strong relation with the smell of fuel; thus, artificially introducing a smell of fuel in the cockpit could ease failure detection and processing. Second, we have seen that cognition is a dynamic system and that every behavior is influenced by previous behaviors. This implies adjusting the modalities allocation with regard to its integration into the interaction dynamics.

In short, the multimodal interaction design has to be based on (1) the modalities allocation and (2) the interaction dynamics design. In this way, we propose to ground our method on the concept of PMU (Perceptivo-Motor Unit), presented in the following section.

The cognitive processes involved in our approach are largely automatic. Thus we expect that our method may allow quick responses, low human error and low workload. Moreover, an interaction based on previous knowledge will necessarily limit training needs. By analogy, we can say that this approach aims to solicit skill-based behaviors as described by Rasmussen [17].

2.3 Perceptivo-Motor Units (PMU)

The core idea of the embodied and situated theoretical background is that human behavior emerges from the encounter between the individual and the environment. This emergence takes the form of a neural activation comprising perceptive and motoric properties associated with the situation. On one hand, the individual has intentions and perceptivo-motor experiences; in addition, we consider that the individual has perceptivo-motor capacities varying according to internal and environmental factors. On the other hand, the environment affords stimuli and action means. We call this emerging behavior a Perceptivo-Motor Unit, or PMU. The PMU is defined as a division of the user's activity that gives rise to a behavior based on a neural trace comprising perceptive and motoric properties associated with the situation (Fig. 1).

Fig. 1. Perceptivo-Motor Unit or PMU

Applying the PMU concept consists in matching the elements of the individual side and the environment side in order to bring about a behavior adapted to the situation. Furthermore, the modalities allocation must take care to enable intramodal PMU sequences rather than switching costs (Fig. 2).

Fig. 2. Types of PMU sequences

3 CONTACT

3.1 Overview

CONTACT (COckpit NaTural interACTion) is a design and evaluation method for multimodal interaction projects in aeronautics environments. It is a user-centered approach based on the embodied and situated theories of cognition presented in the previous sections. The method is still under development, and the following sections propose a first presentation of CONTACT.

The CONTACT method has three main steps: NEEDS, CONCEPT and SOLUTION. It aims to (1) gather the needs related to the study; (2) define a multimodal interaction concept focused on the perceptive and motoric aspects of interaction (the human and user-centered aspects); and (3) translate the concept into a technical solution (the technical aspects of interaction). These steps are built around the PMU concept derived from the literature (Fig. 3).

Fig. 3. CONTACT method

Each subpart of the method is presented as a document to fill in, so the method's application is guided throughout the process. Except for the Needs document, all the documents share the same template: necessary inputs, method description, example of application, expected output and evaluation to conduct. The following sections describe each step of the CONTACT method.

3.2 Needs

The Needs document aims to gather all the necessary inputs for the study's scope. The elements to collect concern general data (e.g. general needs, starting hypotheses, interaction technology assumptions, prototyping and test means), conditions of use (system, users, environment and activity description) and organizational information (e.g. project stakeholders, planning). The Needs document offers an exhaustive view of all the inputs and documents associated with the project.

3.3 Concept

3.3.1 Intentions

Once the project needs are collected, the CONTACT method proposes to determine a multimodal interaction concept centered on the human aspects. As our approach is based on use cases, the first step consists in selecting relevant use cases and adapting them in the form of intentions. The notion of intention designates the expected consequences of a behavior, the goal to achieve in order to produce a perceptible effect [11]. It is the reason for which an individual performs an action. An intention thus includes perceptive and motoric properties associated with a goal and a situation. For example, "retract the landing gear" or "add the waypoint FJR to the flight plan" are intentions. This notion is close to the notion of task, but its formulation is adapted for working on the perceptive and motoric properties associated with the situation.

The Intentions document contains criteria to select the relevant use cases and guidelines to reword tasks into intentions. Furthermore, the method provides examples of use cases, examples of intentions and associated templates to fill.

3.3.2 Perceptivo-Motor Experience

Perceptivo-motor experience refers to what future users would experience in terms of perception and action for each intention. This part concerns exclusively what they would experience "ideally": it does not take into account the constraints associated with the activity, such as turbulence, weather events, incapacitation, abnormal situations, etc. This step aims to gather as much information as possible about future users' knowledge (in the field of aeronautics, mainstream technologies or everyday life in the physical world) in order to transpose this existing perceptivo-motor experience into the interaction concept. The purpose of this approach is to limit training needs, errors and workload and to improve human performance in future cockpits.

To this end, the CONTACT method provides a set of the strongest perceptive and motoric modalities based on 32 helicopter piloting tasks [13]. To create this norm, 45 participants (experts and non-experts in helicopter piloting) assessed to what extent the tasks involved 14 motoric modalities (e.g. head, left hand, left fingers, left foot, language) and 5 perceptive modalities (vision, hearing, touch, proprioception, smell) on Likert scales (from 0 to 5). An example of results is presented in Table 1. Using this norm consists in transposing the 0-to-5 ratings to related or similar intentions. According to the project needs, this set of tasks can be extended (the protocol is provided to extend the set to further tasks or to different types of aircraft such as airplanes or UAVs). More globally, the perceptivo-motor experience document contains all the guidelines and templates to achieve this step. To conclude, the perceptivo-motor experience step gives the "ideal" modalities related to each intention, rated from 0 to 5 (Table 1).

Table 1. Example of perceptivo-motor experience for the intention “set the radio frequency 103.40”.
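The norm-transposition step above can be sketched as follows. This is a hypothetical illustration, not the actual CONTACT tooling: the task name, modality names and ratings are invented, and the transposition is reduced to inheriting the profile of the most similar normed task.

```python
# Hypothetical sketch of the modality-strength norm (assumed data and names).
# Participant ratings (0-5) are averaged per task and modality; an intention
# then inherits the profile of a related normed task.
from statistics import mean

# Invented example norm: per task, per modality, individual participant ratings.
NORM = {
    "set radio frequency": {
        "vision": [4, 5, 4],
        "hearing": [3, 3, 2],
        "right fingers": [5, 4, 5],
    },
}

def modality_profile(task: str) -> dict[str, float]:
    """Average the participant ratings into a 0-5 profile for one task."""
    return {m: mean(ratings) for m, ratings in NORM[task].items()}

def transpose(intention: str, similar_task: str) -> dict[str, float]:
    """Transpose a normed task's profile onto a related intention.
    Simplification: the intention simply inherits the task's profile."""
    return modality_profile(similar_task)

profile = transpose("set the radio frequency 103.40", "set radio frequency")
```

In a real application the norm would come from the published data set [13] rather than from a hard-coded dictionary.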

3.3.3 Perceptivo-Motor Capacities

Perceptivo-motor capacities refer to the user's capacities to act and perceive, given the situation's constraints, for each intention, following a use-case approach. This part concerns what the users could experience "realistically". To determine these capacities, the CONTACT method provides 21 criteria that directly impact perception and action in piloting environments. The criteria are grouped into 4 categories corresponding to mission, environment, human and cockpit characteristics. This list of criteria has been defined by expert committees and optimized both to cover all the possible impacts on perceptivo-motor capacities and to ease its use. Nonetheless, other criteria can be added according to the project's needs (for example, the list could be reviewed for ground control station contexts, which differ substantially from cockpits).

For each intention, the criteria are instantiated according to objective data. As far as possible, the criteria instantiations are graduated from 0 to 5, and this scale corresponds to precise definitions and objective data. For instance, the turbulence criterion includes different levels of turbulence defined and graduated from 0 to 5 (e.g. 5 corresponds to extreme turbulence). Once the use-case criteria are instantiated, the method allows translating the results into perceptive and motoric modality availability (also rated from 0 to 5). Following the previous example, high turbulence will degrade the touch modality and fine motor modalities such as the fingers, which could be rated 1, for example. The method guides the rating of perceptivo-motor capacities, but it does not yet provide generic rules to determine them. Thus, we recommend involving domain experts for this step. As for the perceptivo-motor experience, this part gives the "realistic" modalities related to each intention, rated from 0 to 5 (Table 2).

Table 2. Example of perceptivo-motor capacities for the intention “set the radio frequency 103.40” under turbulence conditions.
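The criteria-to-capacities translation described above could take the form of explicit rules, as in the "high turbulence degrades touch" example. The sketch below is a hypothetical illustration under assumed thresholds and modality names; the method itself does not yet provide such generic rules.

```python
# Hypothetical rule sketch: derive perceptivo-motor capacities (0-5) from
# instantiated use-case criteria. Thresholds and values are invented examples.
def capacities_from_criteria(criteria: dict[str, int]) -> dict[str, int]:
    """Start from full availability (5) and degrade modalities per rule."""
    capacities = {"touch": 5, "fingers": 5, "vision": 5, "hearing": 5}
    turbulence = criteria.get("turbulence", 0)
    if turbulence >= 4:            # severe-to-extreme turbulence
        capacities["touch"] = 1    # fine haptics heavily degraded
        capacities["fingers"] = 1
    elif turbulence >= 2:          # moderate turbulence
        capacities["touch"] = 3
        capacities["fingers"] = 3
    noise = criteria.get("cockpit_noise", 0)
    if noise >= 4:                 # very noisy cockpit
        capacities["hearing"] = 2
    return capacities

caps = capacities_from_criteria({"turbulence": 4, "cockpit_noise": 1})
```

Encoding the rules this way would make the expert judgment explicit and reviewable, which is in line with the partial automation envisaged for the method.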

3.3.4 Modalities Selection

The modalities selection consists in confronting the perceptivo-motor experience ("what I prefer doing") and the perceptivo-motor capacities ("what I can do"). The selection of human modalities depends on choices between preferred modalities and available modalities. The 0-to-5 rating used for both experience and capacities eases this choice: a strong overlap between the two scores indicates a relevant modality. Although the modalities selection is currently made by experts, we are looking at automating this step, at least to highlight the strong overlaps (Table 3).

Table 3. Example of modalities selection for the intention “set the radio frequency 103.40”.
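The overlap idea can be sketched in a few lines. This is a hypothetical interpretation of the selection step, assuming that the overlap of the two 0-5 scales is captured by the minimum of the experience and capacity scores; the modality names, scores and threshold are invented.

```python
# Hypothetical sketch of the modalities-selection step: a modality is
# relevant when both its "ideal" experience score and its "realistic"
# capacity score are high; min() captures the overlap of the two 0-5 scales.
def select_modalities(experience: dict[str, int],
                      capacities: dict[str, int],
                      threshold: int = 3) -> list[str]:
    scores = {m: min(experience[m], capacities.get(m, 0)) for m in experience}
    return sorted((m for m, s in scores.items() if s >= threshold),
                  key=lambda m: -scores[m])

# Invented example: under turbulence, the preferred fingers drop out
# and vision remains the best candidate.
experience = {"right fingers": 5, "vision": 4, "language": 2}
capacities = {"right fingers": 1, "vision": 4, "language": 5}
selected = select_modalities(experience, capacities)
```

Such a scoring function would be one possible basis for the automated highlighting of strong overlaps mentioned above.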

Once a first modalities selection is done, the method proposes to check the fluency of the modality sequences in order to foster intramodal facilitation and to limit switching costs. To this end, the method provides guidelines for modality choices according to Allen's temporal intervals [1], that is, the temporal relations between the tasks. Metaphorically, this step consists in "choreographing" the interaction from a broader perspective (not focused on the intention level).

The modalities selection takes the form of an Excel document to fill in, which represents in parallel the perceptivo-motor experience, the perceptivo-motor capacities and the dynamic relations between the tasks. As a way forward, we plan to partially automate this task with a dedicated tool. Finally, the modalities selection step allows selecting one or several relevant modalities, taking into account both embodied knowledge and situation constraints, from a human-centered point of view.
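The fluency check can be illustrated as follows. This hypothetical sketch simplifies Allen's temporal relations to plain sequential ordering and counts modality switches as potential switching costs; the intention names and modalities are invented examples.

```python
# Hypothetical sequence-fluency sketch: given the modality chosen for each
# intention in temporal order, count the modality switches, which the method
# treats as potential switching costs to minimize. (Allen's full interval
# relations are simplified here to a strict sequential ordering.)
def switching_costs(sequence: list[tuple[str, str]]) -> int:
    """sequence: ordered (intention, modality) pairs; count modality changes."""
    switches = 0
    for (_, prev), (_, cur) in zip(sequence, sequence[1:]):
        if cur != prev:
            switches += 1
    return switches

plan = [("select frequency", "fingers"),
        ("confirm frequency", "fingers"),   # intramodal: no cost
        ("acknowledge ATC", "language")]    # modality switch: one cost
```

A dedicated tool could use such a count to compare alternative allocations and favor those with the fewest switches.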

3.4 Solution

3.4.1 Design

The design step consists in translating the concept (human aspects) into a design solution. This iterative step involves several stakeholders (domain experts, developers, Human Factors specialists, designers, users, etc.) in design sessions. The design sessions' inputs are the needs and the concept (the human modalities selected). All the previous results are presented to the stakeholders, who are guided to propose several designs. The workshops are organized to first generate a maximum of solutions and then converge towards a few designs. Several iterative sessions may be held to choose the most efficient design(s).

These designs can be tested during design exposures, for example by confronting future users with mockups. To this end, the CONTACT method provides an evaluation guide and usability criteria to assess the designs (e.g. error rate, task duration, number of actions, workload). The results can lead to revising or validating the designs.

3.4.2 Solution

Finally, the selected designs are converted into technical solutions. Four levels of description have been defined to specify the technical needs: device (precise definition of the device), device hardware and software settings (e.g. mouse resolution and acceleration), interaction technique (definition of the interaction effects, e.g. a click on the button starts a video and highlights the icon) and fine-tuning (definition of the interaction fine-tuning). This step can be carried out in collaboration with developers. If prototyping and/or simulation means are available in the project, the solution can be implemented and tested in simulation conditions. The CONTACT method also provides an evaluation guide and usability criteria to assess the solution(s). Again, the results can lead to revising or validating the solution(s).

4 CONTACT Applications: Lessons Learned and Way Forward

The first applications of the CONTACT method (on future commercial aircraft and future helicopter cockpits) are in progress. These applications allow us to evaluate and improve the process and the associated tools. So far, we have observed that using the method requires a strong involvement of Human Factors specialists; in particular, the perceptivo-motor capacities and modalities selection steps necessitate their expertise. As a way forward, we are considering a partial automation of the method. First, we are establishing rules to facilitate the definition of perceptivo-motor capacities from the set of criteria (for example, if the turbulence criterion = 4, then the touch capacity = 1). Second, we are studying a dedicated tool to ease the modalities selection. Such a tool would suggest modalities to select, so that its users would only have to check the results obtained. These improvements will accelerate the overall process and also open the method to a wider range of users (i.e. not only Human Factors specialists). Despite these future enhancements, we still recommend convening a multidisciplinary group of specialists to optimize the benefits of deploying the method (e.g. Human Factors specialists specialized in Cognitive Sciences and physiology, engineers, UX designers, final users, developers, domain experts).

Beyond the stakeholders to involve in the project, we observed that the CONTACT method requires appropriate means. These resources include documentation (technical documents about the targeted system, its context of use, the user population, the missions to perform, etc.), but also test or simulation means to perform evaluations. Depending on the test and simulation means available, the CONTACT method remains usable on a subset of its steps; for example, it is possible to use it without performing evaluations.

Finally, the CONTACT method can be integrated at different maturity levels of a project, upstream or during the project (e.g. from TRL 1 to TRL 6). The method is also applicable incrementally in research and development projects; for example, it is compatible with agile methods. The estimated duration of its application varies from a few weeks to a few months according to the project scope.

5 Conclusion

The interest of the CONTACT method lies in the integration of embodied and situated approaches to cognition (a human-centered approach), context and interaction dynamics into an ergonomics process. From a Human Factors point of view, the expected benefits are to limit training needs, errors and workload and to improve human performance in future cockpits. From an industrial point of view, the expected benefits are a better integration of Human Factors and safety impacts, and thus an optimization of design and development cycles for complex multimodal aeronautics systems. Applying this method also ensures a consistent modalities allocation philosophy, as modalities are studied at the intention level and also at the inter-intention level. To conclude, the CONTACT method is an original approach for designing multimodal aeronautics workstations.