1 Introduction

In recent years, multimodal systems have gained considerable interest, and research in this area is expanding rapidly. Most research in the area of human-computer interaction has treated system input and output as separate domains. As a result of this division, two major groups of interfaces have emerged: multimodal input (e.g., touch, gestures, voice, and conventional input devices) and multimedia output (e.g., computer-based interactive audio-video presentations) [1, 6]. Furthermore, a knowledge gap still exists between the domain of system output research (i.e., users’ performance with multimedia presentations) and research on user performance with multimodal input devices in the context of human-computer interaction [18]. Specifically, while conventional input devices are ubiquitous, familiar, and well established, they can quickly become an interaction impediment, especially when users need to interact with environments rich in sophisticated multimedia output. Therefore, to successfully integrate multimodal control input devices into a complete and well-balanced system (i.e., one with equivalent input-output capabilities), a clearly defined conceptual framework is required. The intent behind the framework presented here is twofold. First, a conceptual framework is essential to identifying the theoretical constructs that could provide insight into the natural integration patterns characterizing people’s use of different input modalities. Second, such a framework is also critical to identifying the means of successfully engineering these patterns into a system.

Pilots’ awareness of the flight deck as an interaction space is intrinsic. However, the pilot-aircraft interaction flow, especially with respect to aircraft system control inputs, consists of a series of unintuitive, discrete actions pilots need to perform using conventional physical controls (e.g., knobs, buttons, and cursor control devices). This applies even to the most sophisticated flight decks to date. Furthermore, these pilot-aircraft control input interactions are largely confined to the numerous “flat” surface areas in the flight deck where the majority of pilot interfaces are currently located (e.g., instrument panels, displays, side consoles, and overhead panels). However, the inherent spatial characteristics of this interaction environment support the notion of expanding the interface into what could effectively become one continuous interaction space: a virtual, multilayered “bubble,” or interspace. In the most general sense, the interspace can be an environment where people interact with technology freely and naturally in a multimodal fashion so that the actions in one modality complement, collaborate, and corroborate the input from the others, producing a well-choreographed and more organic user experience. Furthermore, the interspace can be an environment that is flexible, where an optimal blend of cooperating modalities can be used to overcome the weaknesses and capitalize upon the strengths of each individual modality. Hence, an effective paradigm shift in designing multimodal control systems is contingent on the successful integration of the naturally occurring modality communication and cooperation patterns [14] within the intended interspace.

2 Background

The need to optimize pilot-aircraft interactions motivated the aviation research community to focus on researching new and novel control input technologies [10, 16, 19]. For example, Leger [10] identified a need for alternative control input technologies (e.g., voice, gaze, and gesture controls), as these would allow the integration of more features into aircraft while pilots’ need to keep their eyes focused on flying remains high. He further highlighted multimodal integration of two or more of these novel technologies as a method that would allow the user to operate the system under natural logic (as opposed to system-imposed logic). Implementing modality redundancy and complementarity would reduce the risk of error due to short-term memory failure and would also simplify the flight deck layout, offering more space for displays and important information. Finally, the author concluded that capitalizing on multimodal interactions with aircraft systems may require less training than complex conventional physical input devices.

Merchant and Schnell [16] reported on the development of a simulator that combined voice and gaze control. The researchers deemed combining the two modalities the most appropriate way to overcome the limitations of either method when used alone.

Rood [19] introduced some general considerations related to the integration of alternative control technologies into flight decks from both a human factors and an engineering perspective. The author argued that, from a human factors perspective, it was important to avoid a bottleneck, ensuring that the potential for human input and system capability was not hindered by the interface; alternative control technologies could reduce this bottleneck. Rood [19] also identified important recommendations for designing a multimodal interface, such as task modeling, prototyping, context of task, and task loading, with modeling error and error correction being especially important.

Furthermore, considerable research has been conducted in the aviation domain regarding speech recognition and voice control; touch screens; touchless gesture recognizers and controllers; and eye tracking and gaze control systems [4, 5, 7, 13]. However, no unifying conceptual framework exists for the synergistic integration of conventional [2] and nonconventional [15] control input modalities into the flight deck. The conceptual framework outlined in this paper is inspired by four theoretical constructs:

  • Communication [20],

  • Complementarity of people and technology [11],

  • Distributed cognition [8, 9], and

  • Modality cooperation [14].

Furthermore, this framework is motivated in part by the recent significant maturation of control input technologies including touch screens, voice recognition, eye tracking, and touchless gesture recognition, among others.

3 Communication

The most well-known and influential model of communication [20] consists of five basic elements: an information source, a transmitter, a channel, a receiver, and a destination. The model also includes a sixth element, noise, a factor that may cause the signal received to differ from the one sent. Shannon and Weaver’s [20] seminal work led to valuable research on redundancy in language and on making information measurable. It also gave rise to the mathematical study of information theory. The model may seem more information-centered than meaning-centered. However, the opening paragraph of Weaver’s introductory essay, “The Mathematical Theory of Communication,” suggests a very broad application of the fundamental principles of communication theory [20]. Namely, communication is described as including all of the ways by which one mind may affect another. This includes not only written and oral speech, but all human behavior. An even broader definition of communication also includes the means by which one mechanism affects another. Additionally, Shannon and Weaver [20] defined the following terms:

  • Information entropy as the measure for the uncertainty in a message,

  • Redundancy as the degree to which information is not unique in the system,

  • Noise as the measure of information not related to the message, and

  • Channel capacity as the measure of the maximum amount of information a channel could carry.

They also addressed three main challenges to communication:

  • Technical: how accurately a message can be transmitted,

  • Semantic: how precisely the desired meaning is conveyed, and

  • Effectiveness: how the conveyed meaning affects behavior in a predictable way.

In the context of this framework, the meaning of the sixth element (noise) of Shannon and Weaver’s [20] model is expanded to include all the factors that affect the received signal and make it different from the one sent. That is, all the factors (e.g., context) that implicitly modify the message and reduce the information entropy, as well as those factors (e.g., noise) that explicitly increase the information entropy in a message, are also included.
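To make the quantities defined above more concrete, the short Python sketch below computes the entropy and redundancy of a simple symbolic message. It is an illustrative example only, assuming a discrete symbol alphabet and observed symbol frequencies; it is not drawn from Shannon and Weaver’s original treatment [20].

```python
# Illustrative (hypothetical) computation of two quantities defined above:
# information entropy of a message and its redundancy relative to the
# maximum entropy achievable over a given symbol alphabet.
import math
from collections import Counter


def entropy_bits(message: str) -> float:
    """Shannon entropy H = -sum(p_i * log2 p_i) over observed symbol frequencies."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def redundancy(message: str, alphabet_size: int) -> float:
    """Redundancy = 1 - H / H_max, where H_max = log2(alphabet_size)."""
    return 1.0 - entropy_bits(message) / math.log2(alphabet_size)


# A highly repetitive message carries little information per symbol
# (low entropy, high redundancy) compared with a maximally varied one.
print(entropy_bits("aaaaaaab"), redundancy("aaaaaaab", 8))  # ~0.54 bits, ~0.82
print(entropy_bits("abcdefgh"), redundancy("abcdefgh", 8))  # 3.00 bits, 0.00
```

In this toy reading, the expanded notion of noise adopted by the framework corresponds to any factor that shifts a message’s entropy away from what the sender intended, whether by adding uncertainty or by implicitly resolving it through context.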

In summary, the first theoretical construct of this conceptual framework encompasses the broadest view of communication, which entails all the means by which people and technologies interact. While communication theory is the most abstract of the four elements considered, it establishes the outermost boundaries of the conceptual framework. Its applicability to (a) identifying the natural communication patterns that characterize flight crews’ interactions with technology on the flight deck, and (b) successfully engineering those interactions into a system by emulating their temporal, spatial, and semantic features becomes more distinct with the introduction of the remaining framework elements.

4 Complementarity of People and Technology

Jordan [11] postulated that people and technology are not comparable but complementary: because they possess different capabilities, limitations, strengths, and weaknesses, people and technology depend on each other to successfully accomplish a task. Jordan identified several basic guidelines for implementing complementarity. One is that technology serves people in two ways: as tools and as production machines. The key notion behind the complementarity of tools is that people perform best under conditions of optimum difficulty (i.e., if the job is too easy, people get bored, and if the job is too hard, they get fatigued). Therefore, tools should be used to bring the perceptual, cognitive, and motor requirements of a task to the levels optimal for human performance. The author reiterated that it is in managing contingencies that people are irreplaceable by technology, and that people degrade gracefully, whereas machines either do the job or fail. Jordan [11] challenged the human factors engineering community to develop systems where the motivation for the human element is embedded within the task itself. That is, unless there is a challenge to the human operators in every task, activity, and responsibility assigned to them, they will not complement the machines. They will quickly realize that they are being used unproductively and will resist and rebel against it. Nothing could be more wasteful than developing systems that cause the human elements to rebel against the system. Consequently, in the context of a flight deck, if the safety and efficiency of flight are to be maximized, the focus must be on ways to develop and support the complementary nature of the flight crew and flight deck systems, especially controls.

In summary, the value of Jordan’s work to this conceptual framework is in its positive affirmation that control functions, or functions in general, are not to be allocated to either humans or technology [11]. Rather, the interactions of humans and technology are to be carefully tailored, taking into account the mutual dependencies between task components and the ways they are performed by humans and technology.

5 Distributed Cognition

The original theory of distributed cognition was developed by Hutchins [8] and further advanced by Hutchins and Klausen [9]. The notion that cognition is fundamentally distributed is the underpinning of the theory, and its unit of analysis is a functional group rather than an individual mind. This unit of analysis was termed “a system of distributed cognition” [9]. Specifically, to provide insight into the performance of the flight deck as a system, the authors discussed a much larger unit of cognitive analysis that included both the crew and the information environment (e.g., aircraft systems, voice communications with air traffic control and between air traffic control and other aircraft in the vicinity). This approach afforded a more thorough description of the cognitive processes by tracing the movement of information through the system and the mechanisms that carry out performance, both of the individual and of the system as a whole. The analyses revealed a pattern of cooperation and coordination of actions among crew members which, on one level, could be seen as a structure for propagating and processing information within the crew and, on another, as a system in which shared cognition evolved as a system-level property. The relationship between the cognitive properties of the system, as determined by the movement of representations, and the cognitive properties of the individual components identified a set of possible pathways for information to take through the system. Some of the observed pathways were anticipated by the design, while others, not intended in the design, nonetheless contributed to its performance characteristics. Redundant pathways contributed to the robustness of the system.

In summary, according to Hutchins and Klausen [9], the system’s cognitive properties are determined in part by:

  • The cognitive properties of the individual pilots,

  • The properties of the representational structures through which a task-relevant representational state was transmitted,

  • The specific organization of the representations supported in those structures,

  • The interactions of the higher level representations held by the members of the crew, and

  • The shared characteristics of knowledge and access to task-relevant information between the crew members.

In the context of this conceptual framework, the central theme of distributed cognition theory (i.e., the rejection of the traditional assumption that cognitive processes are limited to the internal mental states of an individual) offers a more holistic view of cognitive processes on a flight deck. Specifically, these processes are distributed not only across individual crew members engaged in collaborative tasks, but also between the crew and the artifacts they employ, and between the crew and the features of the environment surrounding them. The distributed cognition approach aims to demonstrate how intelligent processes in human activity surpass the boundaries of the individual and extend into the realm of multiple human contributors using multiple modalities to interact with each other and with multiple technological devices in order to reduce information entropy. This, in turn, enables them to cooperate and ultimately complete a given task successfully.
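The idea of information pathways through a system of distributed cognition can be given a structural reading. The sketch below is a purely hypothetical toy model (the nodes, edges, and robustness proxy are assumptions, not taken from Hutchins and Klausen [9]) in which the flight deck is a directed graph of crew members and artifacts, and the number of independent pathways between a source and a destination serves as a crude proxy for the redundancy that contributes to robustness.

```python
# Hypothetical toy model of information pathways on a flight deck.
# Nodes are crew members and artifacts; edges are representational pathways.
PATHWAYS = {
    "ATC": ["captain", "first_officer"],              # radio heard by both pilots
    "captain": ["mode_control_panel", "first_officer"],
    "first_officer": ["mode_control_panel", "captain"],
    "mode_control_panel": ["flight_management_system"],
    "flight_management_system": [],
}


def count_paths(graph, src, dst, visited=None) -> int:
    """Count distinct simple paths from src to dst; more independent paths is a
    crude proxy for pathway redundancy, and hence system robustness."""
    visited = set() if visited is None else visited
    if src == dst:
        return 1
    return sum(count_paths(graph, nxt, dst, visited | {src})
               for nxt in graph.get(src, []) if nxt not in visited)


# Example: a clearance from ATC can reach the flight management system
# along more than one pathway through the crew.
print(count_paths(PATHWAYS, "ATC", "flight_management_system"))  # 4
```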

6 Modality Cooperation

Martin, Veldman, and Béroule’s [14] theoretical framework was premised on a question regarding the appropriateness of using multimodality. They suggested that multimodality should be used only if it helped achieve usability criteria including (a) fast interaction; (b) robustness to system recognition errors, unexpected events, and user errors; (c) intuitiveness; (d) ease of linking presented information to more inclusive contextual knowledge; and (e) good transfer of information from one modality to another. To accomplish this, Martin et al. proposed six basic types of cooperation between modalities:

  • Equivalence: information is processed by any available modality best suited at that moment for a specific task;

  • Specialization: a specific kind of information is always processed by the same modality;

  • Redundancy: the same information is processed by all modalities;

  • Complementarity: different chunks of information are processed by different modalities and then merged;

  • Transfer: information is produced by one modality and used by another (e.g., transfer between two input modalities, two output modalities, or an input and an output modality); and

  • Concurrency: different chunks of information are processed by several modalities at the same time but not merged (parallel use of several modalities).

In summary, Martin, Veldman, and Béroule [14] identified the usability goals of multimodal systems and further elaborated on how six basic types of modality cooperation could be used to best meet those goals. Here, the understanding of how to combine modalities and why a specific combination of modalities may improve the pilot–aircraft interactions is vital for the successful integration of multimodal control input devices into the flight deck.
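To suggest how these cooperation types might be carried into an implementation, the sketch below encodes them as an enumeration and applies a toy fusion rule. The class and function names, and the fusion logic itself, are illustrative assumptions made here; they are not part of Martin et al.’s framework [14] or of any existing system.

```python
# Hypothetical sketch of the six modality cooperation types [14] as they might
# appear in a multimodal input manager; names and logic are assumptions.
from dataclasses import dataclass
from enum import Enum, auto


class Cooperation(Enum):
    EQUIVALENCE = auto()      # any available modality may carry the information
    SPECIALIZATION = auto()   # one designated modality always carries it
    REDUNDANCY = auto()       # all modalities carry the same information
    COMPLEMENTARITY = auto()  # different chunks are merged across modalities
    TRANSFER = auto()         # information produced by one modality feeds another
    CONCURRENCY = auto()      # parallel chunks, processed but not merged


@dataclass
class InputEvent:
    modality: str             # e.g., "voice", "gaze", "touch", "gesture"
    payload: str              # recognized command fragment or referent


def fuse(events: list[InputEvent], mode: Cooperation) -> list[str]:
    """Toy fusion rule: merge or keep events separate depending on cooperation type."""
    if mode is Cooperation.COMPLEMENTARITY:
        # e.g., voice supplies the action, gaze supplies the referent
        return [" ".join(e.payload for e in events)]
    if mode is Cooperation.REDUNDANCY:
        # identical content on every channel; keep a single copy
        return [events[0].payload] if events else []
    # equivalence, specialization, transfer, concurrency: pass through unmerged
    return [e.payload for e in events]


# Example: a spoken command complemented by the pilot's gaze dwell on a target
print(fuse([InputEvent("voice", "set heading 270"),
            InputEvent("gaze", "heading window")], Cooperation.COMPLEMENTARITY))
```

In such a scheme, the choice of cooperation type per task becomes an explicit design decision, which is precisely the kind of decision the usability criteria above are meant to guide.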

7 Synthesis

The theoretical constructs described so far, which can provide insight into the natural integration patterns characterizing people’s use of different input modalities and help identify the means of successfully engineering these patterns into an interspace system, are:

  • Communication as defined by Shannon and Weaver [20];

  • Complementarity of people and technology as recommended by Jordan [11];

  • Distributed cognition as defined by Hutchins [8]; and

  • Modality cooperation as proposed by Martin, Veldman, and Béroule [14].

These components echo one shared notion: in today’s information society, people interact with and through technology, and therefore people and technology can be seen as elements of one entity (e.g., a system). These elements continuously communicate and cooperate throughout the lifecycle of the system. Cooperation requires communication in order to successfully exchange information and to convey and interpret the intent of present and future actions. Communication, in turn, requires cooperation in order to ensure a successful outcome, including the integration of shared knowledge coming through the different communication channels and from all of the distributed elements of the system. However, successful continuous communication and cooperation within a system is a multifaceted phenomenon that depends on how prudently the results of research into the natural temporal synchronization, spatial organization, and semantic cooperation between those elements are reflected in the design process.

One system that could benefit from optimizing the temporal, spatial, and semantic facets of the interactions within it is the flight deck. For example, in establishing the spatial aspects of the interspace, several interaction strata, or zones, could be defined within the interspace by using an approach similar to the creation of reach isorating surfaces [21]. Reed et al.’s integrated approach to measuring and modeling reach difficulty and capability was based on the assumptions that maximum reach is a probabilistic concept and should be modeled as such, and that a maximum reach is a maximum-difficulty reach. Figure 1 provides a notional illustration of a crew interspace including:

Fig. 1. Notional control input modality allocation into the crew interspace

  • Inner interspace stratum: the pilot’s immediate surroundings that are within easy reach without torso movement (e.g., the location of physical controls such as the control yoke or side stick);

  • Intermediate interspace stratum: the zone with medium- to high-difficulty reach without torso movement (e.g., the location of heads-down visual displays and instrument panels); and

  • Outer interspace stratum: the area confined by the perimeter of the physical flight deck enclosure that is either not reachable without torso movement or reachable only with a higher degree of difficulty and with torso movement (e.g., the windshield).

Furthermore, a set of control input modalities is “mapped” to each stratum as a hypothetical interspace modality allocation. That is, based on their inherent spatial properties (e.g., reach envelope), conventional physical controls [2, 11] and touch-based controls [3] are allocated to the Inner interspace stratum, while voice, eye gaze, and touchless gesture controls [4, 5, 7, 13] are allocated to all three strata of the interspace. The ultimate goal of this notional arrangement is that the actions in one modality complement, collaborate, and corroborate the input from the others, producing a well-choreographed and more organic interspace on the flight deck.
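As a concrete, purely illustrative rendering of this hypothetical allocation, the sketch below expresses the mapping from interspace strata to candidate input modalities as a simple lookup. The stratum labels and the allocation follow the notional arrangement above; the data structure and the helper function (including the idea of excluding temporarily degraded modalities) are assumptions made for illustration.

```python
# Hypothetical encoding of the notional modality allocation in Fig. 1.
INTERSPACE_ALLOCATION = {
    "inner":        {"physical", "touch", "voice", "gaze", "touchless_gesture"},
    "intermediate": {"voice", "gaze", "touchless_gesture"},
    "outer":        {"voice", "gaze", "touchless_gesture"},
}


def available_modalities(stratum: str,
                         degraded: frozenset[str] = frozenset()) -> set[str]:
    """Return candidate input modalities for a target located in a given stratum,
    excluding any modalities currently degraded (e.g., voice during a high-noise
    phase of flight)."""
    return INTERSPACE_ALLOCATION[stratum] - degraded


# Example: a target on the windshield (outer stratum) while the voice channel
# is unavailable leaves gaze and touchless gesture as candidates.
print(available_modalities("outer", degraded=frozenset({"voice"})))
```

In a fuller design, such a lookup would be only the spatial facet; the temporal and semantic facets discussed above would further constrain which of the candidate modalities is actually engaged at a given moment.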

8 Discussion and Conclusion

Control inputs on today’s aircraft flight decks are mostly made using a multitude of rotary knobs, push buttons, cursor control devices, and the like. The need to optimize the pilot-aircraft interaction flow motivated the aviation research community to focus on researching new and novel control input technologies. Considerable research has also been conducted in the aviation domain regarding the implementation of speech recognition and voice control, touch screens, touchless gesture recognizers and controllers, and eye tracking and gaze control systems. Such nonconventional control input technologies have been researched both as standalone methods and in combination with conventional physical interfaces. However, only recently have some of these control technologies reached the level of maturity required for aviation applications. They now have the potential to help simplify flight deck operations and allow for more direct and intuitive interactions with aircraft systems via a properly engineered and contextually suitable selection of a control modality or a combination of modalities [13, 17].

In the context of the conceptual framework presented here, and founded on rigorous analyses of pilot tasks across all phases of flight, research conducted within the framework will focus on the examination of new and novel control input modality combinations within the flight deck interspace. Furthermore, in the research and development process, the naturally occurring mutual disambiguation between modalities will be leveraged to mimic the more intuitive collaboration found in human-to-human interactions. Such an approach is deemed essential to a successful shift toward well-balanced system input-output capabilities and better management of uncertainty in interpreting users’ intent. The research goals include crew workload optimization, minimization of the potential for human error, and aiding error detection and recovery, thereby improving the user experience and, ultimately, the safety of flight.