1 Introduction

To this day, pilots of large commercial aircraft – including those produced by Airbus and Boeing – still interact with on-board avionic systems using conventional interfaces such as the Flight Control Unit (FCU) for autopilot control, the Control and Display Unit (CDU) for flight management, and the Radio Management Panel (RMP) for communication and navigation frequency selection. Information about the current state of the aircraft and its various systems is shown on a number of instrument displays such as the Primary Flight Display (PFD) and Navigation Display (ND).

Considerable work has been carried out over the last decade to introduce new modes of interaction, such as touchscreen technology, to the flight deck. This technology offers several advantages, such as the ability to manage avionic systems in a more intuitive manner and to control systems and view their status from the same display. Touchscreen solutions are already beginning to appear on business jets [1] and will soon be introduced to commercial passenger aircraft as well [2].

A number of research projects have explored the application of touchscreen technology to the flight deck [3,4,5,6]. In [6], the University of Malta developed a concept for tactical and strategic flight control based on a touchscreen device that can be placed on the table in front of each pilot. With this device, each pilot can interact with multiple avionic systems through a single interface without having to reach out to different controls located around the flight deck. This can be particularly advantageous when flying through turbulence. An image of the touchscreen interface is shown in Fig. 1.

Fig. 1. The touchscreen interface concept

Another mode of interaction which can be introduced to the flight deck is voice control (also known as Direct Voice Input (DVI)). Voice control has many potential benefits; for instance, pilots can issue commands ‘hands-free’, thus allowing them to use their hands for other tasks. Also, pilots do not need to look down at a particular screen and can therefore spend more time looking outside for other aircraft and potential obstacles. On the other hand, voice control presents a number of challenges, including the ability to cope with voice differences (such as accents and intonations) and noise in the cockpit. DVI has been used on military fighter jets (such as the Eurofighter Typhoon) for several years [7] but is yet to be introduced on commercial aircraft. In [8] the authors explore the use of multiple modes of interaction – including touch and voice control – for the purpose of flight management.

This paper focuses on an ongoing research project which builds on the results of [6] by extending pilot interaction with cockpit avionics through the use of DVI. This will enable pilots to use voice commands and/or touchscreen gestures to control the aircraft. One of the advantages of DVI that is exploited in this project is the possibility of recognizing commands that are issued by the pilot when acknowledging (repeating) instructions or clearances issued by Air Traffic Control (ATC). For instance, if ATC requests the crew to fly on a certain heading, the pilot will repeat the heading instruction back to ATC and the DVI application will recognize the pilot’s heading command. The pilot then only needs to execute the command, which has the potential to reduce crew workload.

The rest of this paper is organized as follows. Section 2 highlights the main requirements associated with DVI. Section 3 presents the voice commands which were defined for tactical flight control. Section 4 describes the main components of the DVI application. Sections 5 and 6 discuss the preliminary evaluation of the DVI application and Sect. 7 presents the main conclusions of this work.

2 DVI Requirements

In order to ensure that voice control can be applied to the cockpit environment in a way that is acceptable to pilots, while still being compatible with current flight deck procedures and operations, the DVI application needs to meet a number of challenging requirements, including:

  • Standard phraseology – The wording and structure of the voice commands should be based on standard phraseology that is used between pilots in the cockpit, as well as between the crew and ATC. This reduces the amount of training that would be required for pilots to use the DVI application, and makes it easier to recall specific instructions.

  • Flexibility – The application should be able to cope with common variations of the same word or command. For example, the number 250 in the command ‘SET SPEED 250’ can be expressed in different ways such as ‘TWO HUNDRED FIFTY’, ‘TWO FIFTY’ and ‘TWO FIVE ZERO’ (a minimal normalization sketch is given after this list). Similarly, certain words may be omitted from a command or the order of words may differ, without affecting the meaning of the command.

  • Robustness and recognition accuracy – The application should have a high speech recognition rate (of at least 98%) and should be able to cope with the noise levels of a typical flight deck environment. Ideally, the application should be speaker independent such that it can recognize different voices and cope with variations in accent, pitch, intonation and other speech characteristics. This is important since flight crew members change between flights. Alternatively, if a speaker dependent solution is used, users need to log onto the system in order to identify themselves and ensure that the correct voice profile is used by the application.

  • Response time – The DVI application should recognize commands in a timely manner (ideally within 200 ms) and provide adequate visual and/or aural feedback to the user. This is especially important in critical situations and when the pilot needs to react quickly to ATC instructions.

  • Safety – Depending on the nature of the command and the criticality of the situation at hand, a voice command may need to be confirmed by the pilot prior to execution. This confirmation introduces a delay but may be necessary to ensure that the correct command will be executed. The research project described in this work is investigating whether command confirmation should be mandatory in all cases or not. In order to enhance safety, the DVI application should also validate the voice commands and check that they are consistent with the current state of the aircraft. The user should be informed if an invalid command is detected.
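
As a purely illustrative example of the flexibility requirement, the following minimal sketch shows one way in which spoken number variations such as ‘TWO HUNDRED FIFTY’, ‘TWO FIFTY’ and ‘TWO FIVE ZERO’ could be normalized to the same value before command matching. The code is not part of the DVI application described in this paper; the function name, token sets and handling rules are assumptions.

```python
# Illustrative sketch only: normalize spoken number variations to one integer.
# The token sets and handling rules are assumptions, not the project's code.

DIGITS = {"ZERO": 0, "ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4,
          "FIVE": 5, "SIX": 6, "SEVEN": 7, "EIGHT": 8, "NINE": 9}
TENS = {"TEN": 10, "TWENTY": 20, "THIRTY": 30, "FORTY": 40, "FIFTY": 50,
        "SIXTY": 60, "SEVENTY": 70, "EIGHTY": 80, "NINETY": 90}

def normalize_number(tokens):
    """Return the integer value of a spoken number, or None if unrecognized.
    'TWO FIVE ZERO', 'TWO FIFTY' and 'TWO HUNDRED FIFTY' all yield 250."""
    # Form 1: digits read positionally, e.g. 'TWO FIVE ZERO' -> 250
    if tokens and all(t in DIGITS for t in tokens):
        return int("".join(str(DIGITS[t]) for t in tokens))

    # Form 2: hundreds form, with or without the word 'HUNDRED',
    # e.g. 'TWO HUNDRED FIFTY' or the clipped 'TWO FIFTY' -> 250
    value, hundreds = 0, 0
    for tok in tokens:
        if tok == "HUNDRED":
            hundreds, value = value * 100, 0
        elif tok in TENS:
            hundreds = hundreds or value * 100   # 'TWO FIFTY' implies 2 x 100
            value = TENS[tok]
        elif tok in DIGITS:
            value += DIGITS[tok]
        else:
            return None                          # unknown word: reject the number
    return hundreds + value

# Example: all three spoken variations map to the same target value.
assert normalize_number("TWO FIVE ZERO".split()) == 250
assert normalize_number("TWO FIFTY".split()) == 250
assert normalize_number("TWO HUNDRED FIFTY".split()) == 250
```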

3 Voice Commands for Tactical Control

Voice commands were defined for various aspects of tactical flight control (i.e. autopilot control via the FCU), including: switching the autopilot ON or OFF, engaging different autopilot modes, and selecting target values for speed, heading, altitude and vertical speed. A non-exhaustive list of voice commands for autopilot control is presented in Table 1 together with some examples which demonstrate the command syntax used.

Table 1. Voice commands for autopilot control

From Table 1 it can be observed that there are various instances where the exact same function can be performed using different voice commands. For instance, when setting an aircraft speed of 250 knots, one of (at least) four different commands can be used. This is representative of the variations that can be found in practice. It can also be observed that, apart from setting the value of an autopilot parameter, the user can also engage autopilot flight modes. For instance, in the case of heading and speed, the user can switch between Selected mode (in which case the autopilot follows a user-selected heading or speed) and Managed mode (in which case the autopilot follows the FMS plan). In the case of altitude, the user can engage an Open or Managed climb/descent mode. For a Managed climb or descent, the autopilot follows the FMS plan. For an Open climb, the auto-throttle is set to full climb thrust whereas, for an Open descent, the auto-throttle is set to idle thrust. Furthermore, any constraints on the FMS plan are disregarded during an Open climb or descent.
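
The following minimal sketch, which is not taken from the DVI application and uses hypothetical names, merely illustrates the mode distinctions described above:

```python
# Illustrative sketch only: a simple representation of the autopilot guidance
# modes described above. Class and field names are assumptions, not project code.

from dataclasses import dataclass
from enum import Enum

class GuidanceMode(Enum):
    SELECTED = "selected"   # autopilot follows a pilot-selected target value
    MANAGED = "managed"     # autopilot follows the FMS flight plan
    OPEN = "open"           # climb/descent only: FMS constraints are disregarded

@dataclass
class VerticalGuidance:
    mode: GuidanceMode
    climbing: bool

    def autothrust_target(self) -> str:
        """Open climb -> full climb thrust; Open descent -> idle thrust;
        Managed climbs and descents follow the FMS plan."""
        if self.mode is GuidanceMode.OPEN:
            return "climb thrust" if self.climbing else "idle thrust"
        return "as per FMS plan"
```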

The ‘EXECUTE’ command is issued by the pilot after each of the voice commands shown in Table 1. This enables the pilot to confirm that the voice command has been correctly identified by the DVI application.

4 DVI Application

The DVI application is composed of two main modules, the first of which is a speech recognition module. For this project, the commercially available Dragon NaturallySpeaking Premium 13.0 voice recognition engine by Nuance® was used. This is a speaker dependent speech recognition system which uses advanced machine learning algorithms (such as Deep Neural Networks (DNN)) to recognize speech and is able to adapt and learn in order to improve its recognition accuracy.

The speech recognition software comes with a default dictionary that can be updated by the user. It also has a training sub-module that can be used to train the software to recognize individual words. In fact, during the development of the DVI application, the speech recognition software was trained to recognize each of the words used in the voice commands. Furthermore, the dictionary was modified by removing words which sounded similar to those used in the voice commands. This was done in an attempt to improve the recognition accuracy of the software.

The second main module of the DVI application is a custom-built software package which processes the output of the speech recognition module in order to identify specific voice commands. This is essentially done by checking whether the words, format and syntax of phrases spoken by the user match one of the predefined voice commands for autopilot control. If a match is found, the software checks the validity of the command, such as by checking that any numerical values are within a certain range, depending on the current aircraft configuration. For instance, the upper and lower limits of target speed vary dynamically with flap setting. The software also checks for inconsistencies in the commands. For example, if the user issues the command ‘DESCEND FLIGHT LEVEL 100’ but the aircraft is at flight level 50, an inconsistency is detected.
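
The following sketch gives a rough idea of the kind of validity and consistency checks described above. It is only an illustration: the flap-dependent speed limits and the function names are assumptions and do not reflect the actual values or code used by the DVI application.

```python
# Illustrative sketch only: validity and consistency checks on a recognized
# command. The speed limits below are placeholders, not real A320neo values.

FLAP_SPEED_LIMITS_KT = {0: (210, 350), 1: (180, 230), 2: (160, 200), 3: (150, 185)}

def check_speed(target_kt, flap_setting):
    """Return an error message if the target speed is outside the range
    allowed for the current flap setting, otherwise None."""
    lower, upper = FLAP_SPEED_LIMITS_KT[flap_setting]
    if not lower <= target_kt <= upper:
        return "INVALID RANGE"
    return None

def check_level_change(verb, target_fl, current_fl):
    """Flag inconsistent commands, e.g. 'DESCEND FLIGHT LEVEL 100'
    while the aircraft is at flight level 50."""
    if verb == "DESCEND" and target_fl >= current_fl:
        return "INVALID COMMAND"
    if verb == "CLIMB" and target_fl <= current_fl:
        return "INVALID COMMAND"
    return None

# Example: descending to FL100 from FL50 is rejected as inconsistent.
assert check_level_change("DESCEND", 100, 50) == "INVALID COMMAND"
```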

The DVI module is activated (and therefore listens to voice commands) only while the user presses the Push-to-talk (PTT) button. The speech that is detected by the application is displayed to the user. If the voice command is invalid, the user is informed by means of the error message ‘INVALID COMMAND’. If the command is valid but the parameter being set (e.g. speed) exceeds certain limits, the error message ‘INVALID RANGE’ is shown. On the other hand, if the command is recognized by the system and is valid, it is read back to the user via the headset. The user can then confirm the command by saying ‘EXECUTE’. This will trigger the DVI application to transmit the command to the flight simulation platform. In this research, X-Plane is used as the flight simulation environment and the aircraft model is the Airbus A320 New Engine Option (A320neo).
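
To summarize this interaction sequence, the sketch below outlines the listen, validate, read-back, confirm and execute steps. It is an illustration only: the class and method names are assumptions, and the simulator link is shown as an injected placeholder rather than an actual X-Plane interface.

```python
# Illustrative sketch only: the PTT -> recognize -> validate -> read back ->
# 'EXECUTE' -> transmit flow described above. All names are placeholders.

class CommandHandler:
    def __init__(self, validate, read_back, send_to_simulator):
        self.validate = validate                    # range/consistency checks
        self.read_back = read_back                  # aural feedback via the headset
        self.send_to_simulator = send_to_simulator  # e.g. sets FCU targets in X-Plane
        self.pending = None                         # command awaiting confirmation

    def on_ptt_command(self, command, aircraft_state):
        """Handle a phrase recognized while the PTT button was held down."""
        if command is None:
            return "INVALID COMMAND"
        error = self.validate(command, aircraft_state)
        if error:
            return error                            # e.g. "INVALID RANGE"
        self.read_back(command)                     # read the command back to the pilot
        self.pending = command                      # hold until the pilot says 'EXECUTE'
        return "AWAITING EXECUTE"

    def on_execute(self):
        """Handle the confirmation word 'EXECUTE'."""
        if self.pending is not None:
            self.send_to_simulator(self.pending)
            self.pending = None
```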

The voice commands are defined in a script file, together with the associated actions and any preconditions and/or constraints. The user can easily modify this script file without having to recompile the source code. The following is a small section of the script file which defines commands for setting an absolute heading:

In this sample of the script file, four variations of the same heading command are defined between curly brackets. The target heading is represented by ‘#x#’ and is constrained to integers between 0 and 360. If the command is valid and is confirmed by the user, the target heading is loaded into the variable Target Heading which is transmitted to the A320neo model in X-Plane. The target heading will then appear on the simulator’s FCU and PFD and the aircraft will start turning towards that heading.
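
Since the exact syntax of the script file is not reproduced here, the following Python rendering is offered purely as an illustration of the information such a definition carries for the absolute-heading command; the specific spoken variations, field names and constraint shown are assumptions based on the description above.

```python
# Illustrative rendering only: the project's script file uses its own text
# format; this Python structure merely mirrors the information it holds.
# The spoken variations listed here are assumptions.

HEADING_COMMAND = {
    # four variations of the same absolute-heading command, where '#x#'
    # stands for the numeric heading value spoken by the pilot
    "patterns": [
        "SET HEADING #x#",
        "HEADING #x#",
        "TURN LEFT HEADING #x#",
        "TURN RIGHT HEADING #x#",
    ],
    # the value substituted for '#x#' must be an integer between 0 and 360
    "constraint": lambda x: isinstance(x, int) and 0 <= x <= 360,
    # on confirmation, the value is loaded into this variable and transmitted
    # to the A320neo model in X-Plane
    "target_variable": "Target Heading",
}
```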

5 Evaluation

In order to get user feedback on the DVI application before extending it to other functional areas (including the FMS and communication system), a preliminary evaluation was carried out with a number of commercial airline pilots, with one pilot per evaluation session. Each evaluation consisted of a briefing, an acclimatization session, some test scenarios, and a debriefing.

During the briefing, the pilots were first given an overview of the scope and objectives of the project and the evaluations. Then they were asked to complete a short questionnaire about their flying experience. After the briefing, the pilots were shown how to use the DVI application; then, they were provided with a headset and asked to train the speech recognition software by reading aloud a list of words which formed part of the voice commands. Following that, they were allocated some time to practice issuing voice commands to the application.

The main part of the evaluation session consisted of a number of test scenarios where the evaluation pilot was given instructions by one of the researchers who acted as a pseudo Air Traffic Controller (ATCo). These included instructions to adjust the aircraft speed, heading, altitude and vertical speed. After each instruction, the evaluation pilot had to read the instruction back to the ATCo (according to standard procedure) while pressing the PTT button, thereby activating the DVI application. If the application recognized the instruction correctly and considered it to be valid, the pilot had to confirm it and then the command was executed.

Following the test scenarios described above, the evaluation pilot was asked to close their eyes while the aircraft was initialized in an unusual attitude. The pilot was then asked to open their eyes and recover the aircraft using voice commands. This test was repeated twice. When the tests were complete, a debriefing was carried out and the evaluation pilot was asked to complete a questionnaire related to various aspects of the DVI application.

During the evaluations, qualitative data was gathered by means of questionnaires, video recordings, semi-structured interviews and direct observation.

6 Results and Discussion

Three professional civil air transport pilots flying Part 25 certified aircraft (Airbus A319/A320/A321) participated in the evaluations. Their flying experience ranged from six to over 20 years and their ages ranged from 33 to over 40 years. Two of the pilots were male first officers while the third was a female captain. Only one of the pilots had prior experience of using applications based on voice recognition. A photo showing the pseudo ATCo with one of the pilots is presented in Fig. 2.

Fig. 2. The pseudo ATCo (left) and evaluation pilot (right) during one of the evaluation sessions

In general, the pilots agreed with the idea of having voice control in the cockpit as an additional mode of interaction. Two of the pilots said that this would be particularly useful in abnormal or critical situations, such as in the event of smoke in the cockpit. When asked to rate the applicability of voice control to particular avionic systems and functions, the pilots assigned the ratings given in Table 2. From this table it can be observed that the pilots felt that voice control is most suited to the FMS and the communications system, and least suited to the manipulation of flight controls.

Table 2. Ratings assigned by the evaluation pilots for the application of voice control for different systems or functions (1 = totally disagree, 5 = fully agree)

In the case of the FMS, one of the reasons for the high score could be the fact that, with current flight deck interface technology, flight plan modifications (such as the addition of a waypoint) require the pilot to navigate through various menus and press multiple buttons on the CDU. In contrast, with voice control, the same operation could potentially be performed with a single voice command. One of the pilots suggested that voice control could also be used to transmit messages via the Aircraft Communications Addressing and Reporting System (ACARS).

The pilots were also asked to rate various aspects of the DVI application (Table 3). As can be observed from this table, the pilots assigned a score of 3 or more to the majority of the aspects of the DVI application, with the recognition accuracy and responsiveness of the application scoring the lowest. The following paragraphs explain some of the reasons behind these scores and discuss potential solutions to improve the performance of the DVI application.

Table 3. Ratings given by the evaluation pilots for various aspects of the DVI application (1 = poor, 3 = acceptable, 5 = excellent)

As explained previously, the speech recognition software had to be trained to recognize the voice of the evaluation pilot before the beginning of the test scenarios. This was done by reading aloud each of the words used in the voice commands and checking that they were correctly identified by the system. As expected, in each case there were a number of words which were either recognized intermittently or not at all. When this happened, the first step was to train the speech recognition module by repeating each of these words individually a number of times. If, after the training, the software was still unable to recognize a particular word (e.g. maintain), the dictionary was modified manually by removing any similar-sounding words (e.g. maintenance) which were not being used in any of the voice commands.

The steps described above improved the recognition accuracy of the application during the acclimatization phase; however, during the actual test scenarios, a number of words were still identified incorrectly. This suggests that the training phase of the speech recognition software was not sufficient and that more time was required to enable the software to adapt to (and learn) the user’s voice.

Another solution to improve the performance of the DVI application is to add context to the speech and command recognition process. For instance, certain groups of words are always spoken together in a particular command (such as ‘OPEN CLIMB’ or ‘FLIGHT LEVEL’). In this case, it is possible to train the speech recognition software to recognize a whole phrase (group of words) rather than individual words. Also, in the case of similar-sounding words, the DVI application can correctly identify an ambiguous word by taking into account the words spoken before or after it. For example, if the speech recognition software outputs ‘10 LEFT HEADING 1-3-5’ (where ‘TURN’ is incorrectly recognized as ‘10’ due to the similarity between the two words), the DVI application can examine the whole command and determine that ‘10’ is out of context and consequently replace it with ‘TURN’.
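
A minimal sketch of such a context-based correction is given below. It is only an illustration: the confusable-word table, keyword set and function name are assumptions rather than part of the DVI application.

```python
# Illustrative sketch only: replace an out-of-context token when the
# surrounding words indicate that a known confusable keyword was intended.

CONFUSABLE = {"10": "TURN"}   # e.g. 'TURN' misrecognized as '10', as in the text
COMMAND_KEYWORDS = {"TURN", "LEFT", "RIGHT", "HEADING", "SET", "SPEED",
                    "CLIMB", "DESCEND", "FLIGHT", "LEVEL"}

def repair_with_context(tokens):
    """E.g. ['10', 'LEFT', 'HEADING', '135'] -> ['TURN', 'LEFT', 'HEADING', '135']."""
    repaired = list(tokens)
    for i, tok in enumerate(repaired):
        nxt = repaired[i + 1] if i + 1 < len(repaired) else None
        if tok in CONFUSABLE and nxt in COMMAND_KEYWORDS:
            repaired[i] = CONFUSABLE[tok]
    return repaired
```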

Another possible solution to improve the accuracy of speech recognition is simply to use a higher quality microphone with noise-cancelling properties. This would reduce the impact of any background noise on the speech recognition process.

The DVI application is currently designed to wait for pilot confirmation (via the ‘EXECUTE’ command) before executing a voice command. Although this adds a level of safety to the system, all of the pilots agreed that this feature is not desirable or justified in time-critical situations where quick pilot reactions are essential. For instance, when recovering from a stall (or any other upset condition), the ‘STOP’ command would be sufficient and should be executed immediately. The pilots also suggested that the wording could be changed from ‘STOP’ to ‘RECOVER’ or ‘LEVEL OFF’ since the intention of either of these commands was clear and unambiguous.

The DVI application was rated quite well in terms of phraseology; however, the pilots felt that this aspect could be improved by allowing for more flexibility during read-back of ATC instructions. One of the pilots also suggested that, when the command is read back to the pilot by the DVI application prior to execution, the phraseology used could be similar to that of the Flight Mode Annunciator (FMA). This would make it easier for the pilot to confirm that the correct command will be executed.

An issue that was occasionally observed during each of the evaluation sessions was that pilots either forgot to press the PTT button before issuing a voice command, or released the PTT button too early (i.e. before the text corresponding to the command appeared on the display). One possible solution to this problem is to replace the functionality of the PTT button with a dedicated voice command.

7 Conclusion

This paper presented a prototype application based on DVI technology which enables pilots to interact with the autopilot by means of voice commands. The application was evaluated with the participation of commercial airline pilots and the overall feedback was positive. Several suggestions for improvement were made and a number of potential solutions were identified.

The DVI application is part of a bigger solution that is designed to provide pilots with multiple modes of interaction with avionic systems. The next steps will focus on the application of DVI to other functional areas (including flight management, communication, and checklist execution), the integration of voice control with the touchscreen interface developed in [6], and the evaluation of the complete integrated solution in a representative cockpit environment.