1 Introduction

The goal of multimodal communication (MMC) is to facilitate the conveyance of information through multiple modalities, such as auditory, visual, and tactile channels [1]. MMC has become a research focus within the domain of human-robot interaction (HRI) over the past few decades [24]. The appeal of MMC in robotics stems largely from the natural, intuitive, and flexible modes in which humans can communicate with robots. Dumas, Lalanne, and Oviatt [5] identified two main objectives for multimodal designs:

  1. “support and accommodate users’ perceptual and communicative capabilities”

  2. “integrate computational skills of computers in the real world, by offering more natural ways of interactions to humans”

For these reasons, MMC is employed to facilitate more efficient human-robot teaming, but the current technological state of robots must be expanded to support more fluid and dynamic interaction. Systematic evaluation of robot hardware and software capabilities is therefore essential to direct functional requirements. However, prior to development, a clear understanding of which robot capabilities are needed to enable efficient human-robot team (HRT) interaction is required. In other words, the end-user’s abilities and preferences must be analyzed to determine how future robot functionality should meet those demands and limitations. The ideal experimental environment for such evaluations is an interactive simulated context that allows controlled investigation within an applicable setting.

It is often the case in robotics that the current state of the art limits the evaluation of collaborative mixed-initiative teams. Interactive simulations overcome this limitation and enable research in areas closely related to desired applications, especially with regard to HRI [6]. Because the results of such investigations will drive the technological requirements of future robots, it is imperative that this line of research be conducted early in the design life-cycle to provide developers with end-user recommendations and avoid the costs associated with ineffective prototypes [7]. The use of interactive simulation environments is a fitting solution for early exploration and experimentation with MMC within HRTs for a number of reasons.

2 MMC Interactive Simulation

Laboratory studies are conducted in controlled settings, with little to no variation introduced that is irrelevant to the independent variables of interest. In contrast, field studies are conducted in natural settings, usually the setting of applicable interest, with careful effort made to ensure that only the independent variables of interest are manipulated wherever possible. Both approaches are necessary to fully understand the impact of the independent variables on the dependent variables of interest, but pure laboratory experiments tend to reduce ecological and external validity, while field studies reduce internal validity. Simulation-based approaches bridge the gap between laboratory and field experiments.

Interactive simulations combine the stringent control of laboratory experiments with the representative settings of field experiments to create an environment in which more detailed measurements can be used to investigate the underlying concepts of real-world tasks. This is important for MMC research within dismounted HRTs because it is not only pertinent that the hardware performs accurately and reliably, but also critical to determine the best MMC methods for enhancing HRTs. Unlike other forms of simulation, interactive simulations instill a greater sense of meaningful consequences for successful performance, increase the level of physical interaction with robots, can introduce intermittent variables that are perceived as unexpected but are fully controlled, and expand the field of view and situation awareness of the user beyond a single computer monitor. Moreover, interactive simulations can control details such as weather and time of day, and ensure safety while still maintaining contextual validity, providing the best outlet for consistent tasking.

Another major benefit of using interactive simulations to evaluate the HRI components of HRTs is that the robot does not actually have to possess full functionality. Through what is referred to as the “Wizard of Oz (WoZ)” method [8, 9], the robot is actually controlled, unbeknownst to participants, by a second or third party, usually the researcher [for a review see 7]. Steinfeld et al. [9] developed a framework for this method in which human-centered, robot-centered, or mixed approaches can be implemented to research HRI; the focus here is on human-centered approaches. The method has been shown to be effective for evaluating both individual [10] and group HRI [11]. In the human-centered approach, the robot’s behavior is often preprogrammed and only activated when the researcher observes that participants have correctly conveyed the communication sequence intended to produce a given robot response. The researcher can simply trigger the robot to respond accordingly, while participants believe they are controlling or directing the robot’s behavior. Through this approach, usability and preference can be evaluated for current and future human-robot MMC technologies, and the functionality of MMC hardware and software to capture and interpret the user’s responses can be assessed [6, 9].
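As a rough illustration of this human-centered WoZ pattern, the sketch below shows how preprogrammed robot behaviors might be triggered only on the researcher's judgment. All names (WoZController, PRELOADED_BEHAVIORS, the robot interface) are hypothetical placeholders and do not represent the software used in the studies cited above.

```python
# Minimal sketch of a human-centered WoZ trigger loop (hypothetical names throughout).
PRELOADED_BEHAVIORS = {
    "report_obstacles": ["scan_path", "send_obstacle_report"],
    "move_to_location": ["navigate_preprogrammed_route"],
    "screen_location": ["hold_position", "monitor_area"],
}


class WoZController:
    """The researcher, not the recognition software, decides when a behavior fires."""

    def __init__(self, robot):
        self.robot = robot  # any object exposing execute(step); assumed interface

    def on_researcher_judgment(self, command_correct: bool, command_name: str) -> None:
        # The participant believes the robot understood the command; in reality the
        # researcher triggers a preprogrammed behavior only when the communication
        # sequence was conveyed correctly. Otherwise the robot gives no response.
        if not command_correct:
            return
        for step in PRELOADED_BEHAVIORS.get(command_name, []):
            self.robot.execute(step)
```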

3 Defining the Operational Environment

One area in which MMC for HRTs is most relevant is the military domain, specifically dismounted Soldiers. The U.S. Army has shifted its focus from developing platforms to investing more in technology for dismounted Soldiers, as historically most effort has been allocated to developing expensive artillery rather than to front-line combatants [12]. Recognition of this limitation by Army officials has helped direct resources back to Soldiers, with the proposed solution of fielding robot teammates [12]. These robots will relieve Soldiers by carrying heavy gear, storing supplies for extended mission durations, and assisting in intelligence, surveillance, and reconnaissance (ISR) tasks. However, before these capabilities and scenarios are delivered, an important HRI aspect must be resolved first: effective human-robot communication. MMC is the prescribed solution for HRTs, but less is known about the most effective means of communication under varying operational conditions. Prior to deploying robot teammates, extensive and rigorous tests of both input and output devices, as well as user preferences and abilities, must be conducted. Generating interactive simulations for HRI will allow varying MMC methods to be tested under a multitude of contexts, exposing the optimal combination of communication modalities for HRTs in the operational environment of interest.

In 2010, the U.S. Army Research Laboratory (ARL) established the Robotics Collaborative Technology Alliance (RCTA) with the prospective vision that “…advanced autonomy-enabled technologies will play an even greater role in keeping our Soldiers safe” [13]. Within that program, one of the major focus areas is HRI, and more specifically communication between Soldiers and robot teammates. Adopting the philosophy “from tools to teammates” [14], the Army’s goal is to equip squads with autonomous robots capable of carrying out orders without the need for a dedicated technician to teleoperate them. In fact, robots may eventually outnumber Soldiers [15], making teleoperation nearly impossible and emphasizing the inherent need to develop an efficient and effective means of human-robot communication among single and multi-robot teammates.

Additionally, the Man Transportable Robotic System (MTRS) is a U.S. Army program charged with converting the assortment of unmanned ground vehicles (UGVs) into a single configuration [16]. Part of that transition will involve a common communication architecture. Just as humans are able to communicate with other humans, the same metaphoric principle will apply to Soldiers and robot teammates, especially when they all use the same language. This suggests that a positive transfer of training will occur for HRI once Soldiers learn how to communicate with their robot teammates. The question remains: how should a Soldier communicate with a robot teammate? The answer: it depends.

Dismounted Soldiers are taxed by a myriad of tasking conditions that can constantly fluctuate within the continuum of military operations. Soldiers are often exposed to environments that inhibit clear communication, deprive sensory perception, or reduce mobility. Factors such as noise from air and ground strikes, limited visibility due to time of day or air quality, and physical restrictions within confined areas all degrade the quality of communication. Further, Soldiers wear extensive amounts of gear, including camouflage fatigues, about 40 pounds (18 kg) of body armor, an additional 80 pounds (36 kg) of supplies, and usually carry a personal weapon [12], all of which affect the types of MMC that Soldiers are physically capable of producing and receiving.

It is also necessary to assess the effects of those same factors on the performance of the MMC hardware and software. Speech recognition systems convert human verbal responses into signals a robot can interpret, while cameras or inertial measurement units (IMUs) capture arm and hand signals. In a controlled environment these systems may perform accurately and reliably, but they may not when exposed to the conditions typical of the Soldier’s operational environment. Since interactive simulation experimentation is the leading option available for research and training before Soldiers enter real-world environments, it is important to consider how each factor hinders clear communication between a Soldier and a robot, and how best to assess the effects on HRT performance.

4 Experimental Approach and Overview

A series of experiments has been proposed to investigate MMC for dismounted HRTs, one of which is presented here. In order to successfully implement bidirectional MMC for dismounted HRTs, each unidirectional interaction must first be understood. That is, transmitting a message to a robot and receiving information from a robot should be investigated separately prior to experimentation on the full transaction, in order to accurately identify the psychological, cognitive, and physical impact MMC has on dismounted HRTs. The modalities of interest for the present experiment were auditory and visual, specifically speech and gesture communication from a human to a robot.

Additionally, the same approach must be followed to test the hardware and software capabilities of MMC input and output devices to ensure that system functionality meets set criteria [17]. If a speech recognition system is unable to correctly classify the audible response of a human teammate performing a task in a quiet laboratory, then fielded applications of the hardware will likely fail. Similarly, if a gesture recognition system cannot classify the gesture a human transmits while performing that task alone, then the system is sure to fail when the human must perform more than one task simultaneously.

This experiment simulated a surveillance operation in which a novice population communicated commands to a robot teammate using speech, gestures, and a combination thereof. The goal for the task was threefold: command the robot to 1) report obstacles occluding the robot’s navigation path (the orange cones), 2) travel to a specified location (WoZ controlled), and 3) screen the targeted location until further notice. Table 1 describes the options available for each type of command, and Fig. 1 shows the controlled experimental tasking environment.

Table 1. The commands the human communicated to the robot using speech, gestures, and a combination thereof.
Fig. 1. (Left) The command sequence the human communicated to the robot. (Right) The simulated experimental setup with the command sequence displayed on the screen; the same screen also presented robot responses.
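To make the structure of such a command set concrete, the following hypothetical sketch encodes the three command types as a small lexicon that recognition software could match against. The speech phrases and gesture labels shown here are placeholders; the actual options are those listed in Table 1.

```python
# Hypothetical encoding of the three command types as a recognition lexicon.
# Phrases and gesture labels are placeholders; the actual options appear in Table 1.
COMMAND_LEXICON = {
    "report_obstacles": {"speech": ["report obstacles"], "gesture": ["point_to_path"]},
    "move_to_location": {"speech": ["move to location"], "gesture": ["directional_arm_sweep"]},
    "screen_location": {"speech": ["screen the area"], "gesture": ["halt_and_hold"]},
}


def classify(modality: str, token: str):
    """Map a recognized speech phrase or gesture label to a command, if any."""
    for command, variants in COMMAND_LEXICON.items():
        if token in variants.get(modality, []):
            return command
    return None  # unrecognized; under the WoZ protocol the robot would not respond


# Example: classify("speech", "move to location") -> "move_to_location"
```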

Depending on the modality used for the task, participants were given a description of the scenario to justify the use of each modality and to provide an operational context. Table 2 illustrates the scenario description for each modality.

Table 2. The description of the scenarios presented to participants for each modality.

It is important to note that the use of the WoZ method was extensively controlled and warranted for this specific stage in the progression of MMC for dismounted HRTs, and that it adheres to the recommendations for WoZ experiments described by Green, Hüttenrauch, and Eklundh [18]. In this experiment, the role of the participant was simply to send a command to a robot, essentially a simple navigation task. No task required participants to intervene if the robot performed the commanded task inaccurately. Therefore, to ensure the robot successfully navigated to the desired locations and behaved consistently, the WoZ approach was used to preprogram the navigation routes the robot followed. The role of the researcher was merely to observe whether participants had correctly conveyed the intended command and to initiate the robot’s movement through a wireless device that solely progressed the robot’s navigation to preprogrammed waypoints using an indoor tracking system. During training, participants were told that if they did not convey the correct command to the robot, the robot would simply not respond at all (i.e., the researcher would not initiate robot movement and no text response would appear on the screen). It was up to the participants to inquire whether they were making a communication error. Since social interaction was not a concern for this experiment, the argument that a WoZ-controlled robot can actually be considered a proxy through which humans interact with other humans is dismissed [19]. Additionally, the level of deception was minimal, if present at all, implying that users’ perceived usability and preferences were not affected by this approach.
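The sketch below illustrates, under stated assumptions, the kind of waypoint-progression logic this WoZ setup implies: routes are preprogrammed, and each researcher trigger simply advances the robot to its next waypoint using position estimates from an indoor tracking system. The drive and tracker interfaces are assumed placeholders, not the actual control software.

```python
# Sketch of waypoint progression under WoZ control (assumed drive/tracker interfaces).
class WaypointRunner:
    def __init__(self, waypoints, drive, tracker):
        self.waypoints = list(waypoints)  # preprogrammed (x, y) route
        self.drive = drive                # assumed robot drive interface
        self.tracker = tracker            # assumed indoor tracking interface
        self.index = 0

    def advance(self) -> bool:
        """One researcher trigger moves the robot to exactly one more waypoint."""
        if self.index >= len(self.waypoints):
            return False  # route already complete
        target = self.waypoints[self.index]
        while not self.tracker.at(target):
            self.drive.step_toward(target, self.tracker.current_position())
        self.index += 1
        return True
```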

Assessment was conducted in a pre- and post-test manner to capture the impact each communication modality had on participants and HRT performance before and after exposure to each modality type. To better understand individual differences, participants first completed a spatial orientation test and an attentional control assessment, and rated their expectations and the perceived importance of a robot’s behaviors. Spatial orientation ability is crucial when communicating with a non-collocated teammate to scout and patrol specific locations. Attentional control begins to account for the effect that the environment and individual differences in attention could have on MMC by assessing how easily participants are distracted when exposed to attention-grabbing factors, such as music, other people, or even internal feelings such as hunger. Expectation ratings are also important to assess prior to experimentation because participants can carry preconceived notions about how their interactions will take place, which may bias their behavior and post-task ratings. Preference for modality type was assessed using a system usability scale after exposure to each modality and again after completing all scenarios, to determine whether perceptions changed by the end of the experiment. It is also important to gather data on the level of perceived workload each modality elicits from each participant. The NASA Task Load Index (NASA-TLX) was used to determine how much workload was elicited by each modality and which type of workload contributed the most (e.g., mental, temporal, or physical demand).
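For clarity, the standard scoring procedures for these two questionnaires are sketched below; the administration details in this experiment may differ, and only the raw (unweighted) NASA-TLX composite is shown.

```python
# Standard questionnaire scoring, shown for reference.
def sus_score(responses):
    """System Usability Scale: ten items rated 1-5; returns a 0-100 score.
    Odd-numbered items contribute (rating - 1); even-numbered items (5 - rating)."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based, so i == 0 is item 1
                for i, r in enumerate(responses))
    return total * 2.5


def raw_tlx(mental, physical, temporal, performance, effort, frustration):
    """Raw (unweighted) NASA-TLX: mean of the six subscale ratings (each 0-100)."""
    return (mental + physical + temporal + performance + effort + frustration) / 6.0
```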

The speech and gesture recognition hardware were assessed by their ability to recognize participants’ verbal and hand signals, respectively, and correctly classify them against a pre-programmed lexicon. Because of the WoZ approach, scenario completion did not depend on hardware and software performance: the researcher determined whether participants were correctly communicating the information to their robot teammate and controlled the robot’s responses accordingly. In this manner, the human-centered approach allowed assessment of the user to be conducted separately from the MMC input and output devices, which meant that various devices could be tested as a between-subjects factor without affecting the users’ ratings of their interaction, or users even being aware that hardware tests were being conducted.
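Offline, recognizer performance could be summarized independently of task completion (which the WoZ setup guarantees), for example with a simple accuracy and confusion tally as sketched below. The function and data layout are illustrative assumptions, not the analysis code used in this work.

```python
# Illustrative summary of recognizer performance, independent of task completion.
from collections import Counter


def recognition_accuracy(trials):
    """trials: iterable of (intended_command, recognized_command) pairs.
    Returns overall accuracy and a per-pair confusion tally."""
    confusion = Counter()
    correct = 0
    total = 0
    for intended, recognized in trials:
        confusion[(intended, recognized)] += 1
        correct += int(intended == recognized)
        total += 1
    return (correct / total if total else 0.0), confusion


# Example:
# accuracy, confusion = recognition_accuracy([
#     ("move_to_location", "move_to_location"),
#     ("screen_location", None),  # recognizer failed to classify the gesture
# ])
```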

5 Discussion and Future Direction

This experiment lays foundational work for understanding the best methods for MMC within dismounted HRTs. Human behavior can be affected by the environment in which tasks are performed; interactive simulations therefore provide a host of benefits from both the laboratory and field approaches, supporting more natural tasking environments while employing laboratory control for more accurate and precise assessment. Interactive simulations are an ideal solution for investigating MMC within dismounted HRTs. These controlled and consistent environments allow detailed assessment while approaching a level of task and environmental complexity otherwise achievable only in real-world scenarios. The research presented here illustrates the first steps in utilizing interactive simulations to understand best practices for MMC.

The approach taken for this experiment was to employ WoZ techniques to assess the usability and preference of various MMC methods while simultaneously evaluating the functional capabilities of MMC hardware and software devices. By assessing each modality separately and then in combination, each communication method was analyzed individually to determine whether either mode, or their combination, was preferred and more effective for HRT performance. This also allowed the hardware and software for each modality to be assessed separately to establish baseline performance within a controlled environment.

The theoretical contributions of this line of research shed light on the cognitive implications of MMC within HRTs. This work also expands upon current theoretical frameworks of human information processing by providing use cases in which to apply and assess the concepts of resource allocation and time-sharing efficiency. Practical applications of this research inform MMC developers and robot designers about the limitations and preferences of end-users interacting in HRTs. It also provides an experimental approach for assessing the optimal combination of MMC within dismounted Soldier-robot teams.

Future research should incorporate more factors that increase the fidelity of the simulations. This simulation experiment is not intended to be the final stage of investigation before recommendations are generated for MMC designers, but a step in a systematic and iterative process of gradually removing researcher control of the robot and employing more robot autonomy as technology progresses. Tests under varying noise and lighting conditions and with secondary tasks will allow a better representation of the operational environment and, ideally, greater generalization of the results to support a positive transfer of training for communication across all HRT tasks. Additionally, methods of MMC should be investigated for communicating with more than a single robot teammate. Investigations should also be expanded to assess how varying characteristics of the robots (e.g., gender of voice) and return modes of communication (e.g., auditory vs. visual feedback) affect the efficiency of HRTs. The more reflective the experimental setting is of the operational environment, the better suited and more effective HRTs will be when deployed.