1 Introduction

1.1 Problem Summary

To support a complex array of current and envisioned missions, the U.S. Air Force trains and educates a large uniformed workforce across diverse Air Force Specialty Codes (AFSCs). Early-stage training in many AFSCs presents Airmen and officer trainees with large corpora of foundational knowledge to master, delivered through a variety of modalities. This content is taught at technical training or flight schools in courses lasting up to several months. Because motivation is a necessary element for learning retention and force readiness, Air Force education and training stakeholders continue to look for ways to engage their constituencies, through gamified interactive learning, simulation, and the use of mixed-reality immersive environments.

To maintain an engaged, motivated cadre of Airmen, OMEGA, via a service-oriented, general-purpose appliance operating in concert with the learning environment, will recommend interventions when end-users are showing signs of a lapse in engagement. These services will be accessible to any AF instructional system, which can automatically or through a human instructor locally effect OMEGA’s recommendations. OMEGA will enable AF and DoD learning environments to identify and adapt to detected lapses in engagement, promoting greater motivation, retention, and readiness.

1.2 Project Context: Pilot Training Next

The test case for OMEGA, selected by the Air Force, is Pilot Training Next (PTN). PTN, now training its third iteration of pilots, aims to reduce the time and cost of undergraduate pilot training (UPT) by leveraging virtual and augmented reality, biometrics, AI, and data analytics. While PTN student pilots fly the same hours and sorties in the T-6 as do their legacy pilot training counterparts, the ground-based training is done using an immersive PC-based flight simulator (Lockheed Martin’s Prepar3D®), VR headset (HTC’s VIVE™ Pro), stick, throttle and rudder pedals, and a syllabus of PTN-specific scenarios (Fig. 1).

Fig. 1. PTN station: VR headset, controls, displays. (U.S. Air Force photo by Sean M. Worrell)

PTN has maintained a heavy emphasis on data collection and analysis. Reliable and predictive metrics are needed not only to assure instructors and higher command that student pilots are achieving the same skills as their legacy training counterparts, but also because PTN students qualify for graduation on the basis of achievements rather than on calendar time. This fundamental change in how students progress requires an abundance of data that supplement instructor evaluations to ensure skill mastery.

During scenarios flown in the simulator, objective data is readily captured for every time interval, such as aircraft state (position, attitude, airspeed) and configuration (aileron, rudder and elevator deflections; flap and gear positions). Instructors can also monitor how the scenario is progressing overall and can provide verbal feedback in real-time.

Some important metrics, though, are less directly observable or measurable. Student engagement is widely accepted as a critical mediating factor in both learning retention [1] and learning outcomes [2]. The importance of engagement is not lost on instructors (though labels like attention and focus are more common in pilot training), and is often the basis for, or at least an element of, scoring situational awareness (SA).

Two additional factors make engagement even more salient for PTN. First, instructors have less visibility into student engagement (due to the VR headset) than in conventional simulators. Second, a vision for PTN is for one instructor to be monitoring multiple students simultaneously. Indirectly-observable measures such as engagement will thus require some level of automation support to cue instructors when lapses are detected.

2 Modelling Engagement

2.1 Conceptual Model of Engagement

In training sorties flown in the aircraft or in conventional simulators, an instructor pilot (IP) monitors the student pilot’s performance. Maneuvers are evaluated by observing air speed, vertical speed, attitude, angle of attack, and so on. Instructors are also interested in situational awareness (SA), which they can assess by observing the student’s ability to “stay ahead of the airplane”: anticipating upcoming changes in heading, airspeed, or altitude; applying smooth control inputs to adjust bank angle, pitch and power; and maintaining proper scan of the flight instruments.

Several theoretical frameworks for characterizing engagement informed our model of engagement. We created an inventory of nine relevant engagement and disengagement models from the literature that emphasize behavioral indicators (e.g., data from log files or from direct queries to the user) [3]. These included Intrinsic vs. Extrinsic [4]; Two Factor Hygiene-Motivator Theory [5]; Motivators from Maslow’s Hierarchy (Ibid); Achievement Goal Theory [6]; D’Mello & Graesser’s Engagement model [7]; and Baker’s indicators of passive vs. active disengagement [8]. From this we synthesized a multi-timescale engagement and motivation model [9].

More recently, we refined the model to reflect the aviation focus of this project, preferring metrics associated with event response tasks (e.g., maneuvering to avoid a new hazard) and monitoring tasks (e.g., maintaining straight and level attitude). We incorporated significant research conducted to identify indicators of distraction and disengagement for accidents attributed to loss of control and airplane state awareness [10, 11]. A subset of these states is relevant to flight tasks performed in simulated environments:

  • Attention vs. Distraction: Situational awareness was particularly reduced by the induction of diverted attention. Channelized attention or attentional tunneling also indicated loss of situational awareness [10];

  • Boredom and Distraction: Distraction can be characterized by any time without interaction with the system; engaged pilots interacted with the system to optimize performance, even when this was not required to meet performance requirements [12];

  • Attentional Tunneling: Attentional tunneling is indicated by lack of interaction with one or more system elements, coupled with strong interaction with another element [11];

  • Vigilance: Lapses in vigilance, such as distraction and daydreaming, lead to failures in practical monitoring tasks [13].

The model resulting from this additional analysis is shown in Fig. 2. This expert model is based on eight input metrics, used to compute three mid-level features: Performance, Efficiency, and Responsiveness. An overall composite measure of current engagement within a given data window is derived from these mid-level features. The relative weights and derivation methods shown in Fig. 2 represent the initial trial conditions. We anticipate that these will be refined through additional testing.
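
The two-stage aggregation described above can be sketched as follows. The metric names, weights, and derivation method (simple weighted averages) are illustrative placeholders, not the values shown in Fig. 2:

```python
# Illustrative sketch of the two-stage aggregation: eight input metrics
# feed three mid-level features, which combine into a composite score.
# All names and weights here are placeholders, not the tuned values.

MID_LEVEL = {
    "performance": {"task_success": 0.6, "error_rate": 0.4},
    "efficiency": {"path_deviation": 0.5, "control_smoothness": 0.3,
                   "completion_time": 0.2},
    "responsiveness": {"event_latency": 0.5, "input_rate": 0.3,
                       "correction_direction": 0.2},
}
COMPOSITE_WEIGHTS = {"performance": 0.4, "efficiency": 0.3,
                     "responsiveness": 0.3}

def mid_level_features(metrics):
    """Weighted average of normalized (0-1) input metrics per feature."""
    return {
        feature: sum(w * metrics[name] for name, w in inputs.items())
        for feature, inputs in MID_LEVEL.items()
    }

def engagement_score(metrics):
    """Composite engagement for one data window, in [0, 1]."""
    feats = mid_level_features(metrics)
    return sum(COMPOSITE_WEIGHTS[f] * v for f, v in feats.items())
```

Because both stages are convex combinations, the composite stays in [0, 1] whenever the normalized inputs do, which keeps the score directly comparable across data windows.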

Fig. 2. Engagement model

2.2 Engagement Metrics and Virtual Reality

A desktop flight simulator generates a rich set of data, including aircraft position, attitude and configuration. From such data, objective performance metrics can be calculated with some reliability. For instance, detecting when a student pilot lowers the gear while the airspeed exceeds the maximum gear-down speed is straightforward. Monitoring engagement, however, requires aggregating observable measures to generate an indirect estimate of engagement. Our model, for instance, specifies eight such indirect measures.
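
A direct check of the kind described above can be stated in a few lines. The speed limit and log format here are hypothetical, not the actual T-6 limitation:

```python
# Minimal sketch of an objective performance check: flag any sample where
# the gear is down while airspeed exceeds the maximum gear-extended speed.
# The 150-knot limit and the (time, airspeed, gear) log layout are
# illustrative placeholders, not actual aircraft or Prepar3D values.

MAX_GEAR_DOWN_KNOTS = 150.0

def gear_overspeed_events(samples):
    """samples: iterable of (time_s, airspeed_knots, gear_down) tuples.
    Returns the timestamps at which the gear-down speed was exceeded."""
    return [t for t, airspeed, gear_down in samples
            if gear_down and airspeed > MAX_GEAR_DOWN_KNOTS]
```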

The addition of a VR head-mounted display (HMD) contributes additional data points that could be incorporated into a suite of metrics for monitoring engagement levels. A typical VR headset and its sensors can capture head position and movement; higher-end devices, such as the VIVE Pro Eye, can capture eye tracking data. The VR environment thus adds to the already rich data stream available from the simulator. This apparent abundance of data, however, does not solve the problem of developing reliable measures of engagement. Several challenges for interpreting the data remain, including, non-exhaustively:

  1. Understanding which data points are relevant to engagement;

  2. Setting proper coefficients representing how each data point should be weighted;

  3. Distinguishing between, and properly applying, a single data point x observed at time t versus a trend of how x behaves over some interval (e.g., from t − 5 s to t + 5 s);

  4. Incorporating the velocity of the change in a data point, for instance, how abrupt an aileron deflection or throttle movement the student applied.
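
The distinction between a point value, a trend, and a velocity (items 3 and 4 above) can be sketched for a single logged variable. The log layout and window width are illustrative assumptions:

```python
# Sketch of three ways one logged variable x can feed the model: its
# instantaneous value at time t, its trend over a window around t, and
# the velocity (rate of change) near t, e.g. how abrupt a control input
# was. The (time_s, value) log layout is a placeholder, not a real
# Prepar3D field format.

def value_at(log, t):
    """Instantaneous value: the sample closest to time t."""
    return min(log, key=lambda s: abs(s[0] - t))[1]

def trend(log, t, half_window=5.0):
    """Mean over [t - half_window, t + half_window] (e.g., t - 5 s to t + 5 s)."""
    window = [x for (ts, x) in log if t - half_window <= ts <= t + half_window]
    return sum(window) / len(window)

def velocity(log, t):
    """Rate of change between the two samples nearest t."""
    (t0, x0), (t1, x1) = sorted(log, key=lambda s: abs(s[0] - t))[:2]
    return (x1 - x0) / (t1 - t0)
```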

A principal emphasis of this work is to explore the role that machine learning models could play in interpreting simulator and VR device data in order to develop measures to drive our conceptual model of engagement. This machine learning approach is summarized in the next section.

3 Machine Learning Approach

3.1 Machine Learning Model

We employ machine learning to allow OMEGA to develop more accurate predictive associations between raw data inputs and higher-level aggregated engagement metrics. This section describes the techniques and architecture of the OMEGA machine learning component. Our design leverages the underlying data streams available from Prepar3D to provide better predictive power in situations where there is limited access to interpreted data (e.g. when interpreted metrics of event occurrence, event success/failure, and efficiency are not available). To achieve this, we employ three methods in combination:

  1. We use standard machine learning techniques to attempt to accurately predict engagement and disengagement in input metric sequences. These approaches are attractive because they enjoy fast estimation methods with low run-time, and therefore can provide near-instant feedback to instructors. Based on the features and data available in Prepar3D, we have selected Support Vector Machine (SVM) and Binomial Regression as the most applicable approaches. These techniques are most powerful in cases where sequence classification is not strongly context-dependent. For OMEGA, however, we expect some context-dependence in the data. For example, rapid adjustments of heading, altitude and airspeed may represent recovery from a period of inattention if these maneuvers occur between waypoints, but may represent an attentive reaction if observed during an event requiring active response (e.g. a heading change when passing a waypoint). To mitigate this risk and to improve the model, we deploy two additional “deep learning” layers of machine learning that are more robust to sequence classification in context-dependent data.

  2. We use a form of deep learning called bidirectional long short-term memory (BD-LSTM), a type of recurrent neural network, to produce improved results in sequence classification problems that are heavily context-dependent. In this case, determining whether a given sequence of composite low-level metric readings represents disengagement is likely to be highly context-dependent, for instance a climb at full power versus a climb at normal cruise speed. We use bidirectional LSTMs, which consider the ‘context’ of both the preceding and following time slice data when predicting disengagement, to reduce the incidence of false-positive detection of disengagement in this environment. The tradeoff for improved disengagement and inattention detection is the high resource and time cost of maintaining bidirectional LSTM in OMEGA. Recurrent Neural Networks (RNNs) like LSTM can be difficult to train due to memory-bandwidth-bound computation limitations. For this reason, we have selected a second deep learning approach in case the performance requirements present too much risk.

  3. We use an alternative deep learning approach called Attention-based Modeling as the third layer in OMEGA’s machine learning stack. Attention-based models are sequence-to-sequence models designed to improve performance of RNN-based approaches. This third layer provides an alternative mechanism in cases where bidirectional LSTM is too resource-intensive to be effective in real-time.
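
As a rough sketch of the second layer, the following shows how a BD-LSTM sequence classifier of the kind described above might be framed in PyTorch. The metric count, hidden size, and mean-pooling readout are assumptions for illustration, not the deployed configuration:

```python
import torch
import torch.nn as nn

class DisengagementBDLSTM(nn.Module):
    """Sketch of a bidirectional LSTM window classifier.
    Input: (batch, time, n_metrics) windows of low-level metrics.
    Output: per-window probability of disengagement.
    Hyperparameters are placeholders, not tuned values."""

    def __init__(self, n_metrics=8, hidden=64):
        super().__init__()
        # bidirectional=True lets the model use both preceding and
        # following time-slice context when classifying a window
        self.lstm = nn.LSTM(n_metrics, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)          # (batch, time, 2 * hidden)
        pooled = out.mean(dim=1)       # average features over the window
        return torch.sigmoid(self.head(pooled)).squeeze(-1)
```

The bidirectional readout is what distinguishes this from the faster point-wise classifiers in the first layer: the same control input is scored differently depending on what surrounds it in time.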

3.2 Training the Model

Data for training the machine learning components derives from experimental subjects who fly a pre-selected set of PTN scenarios in a data collection station that mirrors most of a PTN simulator, namely, the simulation software, stick and throttle, and VIVE Pro HMD and sensors. The data collection station also includes a dedicated application for the experimenter to monitor each scenario, interact with the subject, and time-stamp relevant events. Figure 3 shows an experimenter and subject during a data collection session.

Fig. 3. Data collection (background), experimenter (foreground) stations. Photo by the authors.

For purposes of creating training data for the machine learning models, experimenters are trained in a protocol to (1) time-stamp lower-intensity and higher-intensity segments of a scenario, to help the models account for workload in processing measures of user activity; and (2) engage the subject in conversation at specific points during a scenario. Conversing with the subject acts as a surrogate for disengagement. We posit that loss of attention, or distraction, will be statistically detectable in the simulation log files. Specifically, we anticipate three possible types of deviations:

  1. Response Time: Most dominantly, we anticipate that subjects’ time to respond to changes in the environment will be slower and/or less precise when engaging in conversation. Specifically, we anticipate a longer duration with no response after an event that requires a maneuver (e.g. heading change), followed in some cases by an initial control input that is more abrupt, more prone to overcorrect, or may even be in the wrong direction.

  2. Performance: We anticipate more likely failure to accomplish scenario goals (e.g., missing required waypoints).

  3. Efficiency: We anticipate that periods of distraction will tend to be less efficient, due to the above issues and due to less precise control over the aircraft (e.g., slower damping of over-correcting heading changes).
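
The Response Time deviation above can be illustrated with a minimal latency check over a control log. The deflection threshold and log format are hypothetical:

```python
# Sketch of the Response Time measure: time from a scripted event (e.g. a
# commanded heading change) to the first meaningful control input, plus
# the direction/magnitude of that input. The 0.05 threshold and the
# (time_s, deflection) log layout are illustrative assumptions.

INPUT_THRESHOLD = 0.05  # minimum stick deflection counted as a response

def response_latency(event_time, controls):
    """controls: time-ordered (time_s, deflection) samples.
    Returns (latency_s, initial_deflection), or None if no response."""
    for t, deflection in controls:
        if t >= event_time and abs(deflection) > INPUT_THRESHOLD:
            return (t - event_time, deflection)
    return None
```

Comparing latencies between conversation and no-conversation segments of the same scenario gives the labeled contrast the models are trained on.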

Subjects were recruited from flying clubs in the Corvallis, OR and Los Angeles regions. Subjects qualified for the study through meeting either flying hour criteria or flight simulator experience criteria. Each subject was given a practice period with the PTN station and then asked to complete six PTN scenarios. The collected data are being used to train the machine learning model, comparing both the predictive power and the latency and resource requirements for each of our deep learning modeling techniques.

4 Adaptive Instruction

4.1 Adaptive Recommendations Model

OMEGA processes detected engagement levels to generate adaptive recommendations that help an instructor restore lapsed engagement. During a scenario, based on combinations of different state signals, OMEGA will generate a set of intermediate inferences. These inferences include, for example, whether poor performance is due to consistently bad results or to irregular, inconsistent behavior (e.g., carelessness). The model employs both the basic state model and the aggregated inferences as inputs to calculate a ranking score for different adaptive interventions.

Our model considers three levels of outcomes: performance, responsiveness, and efficiency, each representing a distinct dimension of quality. Performance represents the basic ability to complete the assigned tasks, based on the performance criteria for those tasks (e.g., following a set of waypoints). Responsiveness represents the speed and effectiveness for a learner to adjust to new tasks or requirements (e.g., if a waypoint is moved, how quickly does the user adjust heading). Efficiency represents lean and strategic use of resources to complete a scenario (e.g., faster completion times).

These quality criteria build upon one another: a learner must adjust heading to a new waypoint or else there is no way to determine responsiveness. Likewise, efficiency is impossible if the user is not responsive enough to stay on course. This means that only some factors should be addressed with certain types of learners (e.g., high vs. low expertise). For example, if a user is failing to master proper take-off procedures, critiquing fuel efficiency would add no training value. On the other hand, an otherwise high-performing student pilot who is drifting off-course or leaving assigned altitudes may benefit from a prompt noting the need for improved in-flight checks.
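
The layered gating described above can be sketched as a simple priority rule. The threshold and the returned labels are illustrative, not the actual PTN policy:

```python
# Sketch of the layered quality criteria: address performance first,
# then responsiveness, and critique efficiency only when the other two
# are already adequate. The 0.7 threshold and dimension labels are
# illustrative placeholders.

def intervention_focus(performance, responsiveness, efficiency,
                       threshold=0.7):
    """Each score is in [0, 1]. Returns the dimension to address first,
    or None when no intervention is warranted."""
    if performance < threshold:
        return "performance"      # e.g. basic procedures not yet mastered
    if responsiveness < threshold:
        return "responsiveness"   # e.g. slow reaction to a moved waypoint
    if efficiency < threshold:
        return "efficiency"       # e.g. prompt better in-flight checks
    return None
```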

The policy for adaptive interventions is depicted schematically in Fig. 4. The right-hand side of Fig. 4 shows the interventions proposed for PTN. These comprise three distinct types: Messaging (information about the task), Motivation (context about the task and learning goals), and Recommendations (suggestions for different tasks or breaks to improve learning). The left-hand side of Fig. 4 outlines a high-level policy for when specific interventions are expected to be appropriate for users with different skill levels and in different states. These connections between intervention types and student states do not represent the actual model; rather, they represent key dynamics that the model should produce. Because the full state space for calculating an effective intervention policy is too large to render as a compact graph, this schematic captures the key behaviors against which the intervention model will be tested, to ensure it behaves in accordance with theoretical frameworks for engagement and for responding to disengagement. The intervention types are outlined in Table 1.

Fig. 4. High-level policy for adaptive interventions in PTN

Table 1.  Intervention types for PTN

4.2 Generating Adaptive Recommendations

We propose two distinct methods for generating adaptation recommendations based on the internal state of the models used to measure engagement. The first approach is much less computationally-intensive and will produce recommendations with lower latency. However, given the highly contextualized nature of the input data, we expect the second approach to produce more accurate results. Work is currently in-progress for testing the trade-offs between timeliness and quality under different simulation conditions.

In the first approach, we use the calculated values from the engagement model (performance, efficiency, and responsiveness) as inputs to a machine learning classification model. Using the labeled data set produced during the data collection trials discussed above, we apply several traditional machine learning techniques to the classification task, where the outputs are the set of adaptations available in the PTN training environment. We use both Naive Bayes and Support Vector Machine (SVM) approaches. Since the input metrics include variables that are highly interdependent, we expect that SVM will yield superior results. SVMs have been demonstrated to predict the likelihood of learner withdrawal from online courses, for example [14]. These techniques do not account for the contextual nature of the data, instead analyzing each time slice as a separate case. As was the case for detecting engagement levels, the determination of an appropriate adaptation will depend on the context in which the disengagement event occurs, as well as on the environmental conditions being simulated.
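
A minimal sketch of this first, time-slice-based approach follows, using scikit-learn's SVM classifier. The tiny hand-made data set and the adaptation labels are illustrative only; the real model trains on the labeled data from the collection trials:

```python
# Sketch of per-time-slice adaptation classification: each slice's
# (performance, efficiency, responsiveness) triple maps to a recommended
# adaptation. Data rows and labels are fabricated for illustration.

from sklearn.svm import SVC

# Each row: [performance, efficiency, responsiveness] in [0, 1].
X = [[0.9, 0.8, 0.9], [0.3, 0.4, 0.3], [0.8, 0.3, 0.8],
     [0.2, 0.5, 0.4], [0.9, 0.9, 0.8], [0.4, 0.3, 0.2]]
y = ["none", "recommendation", "messaging",
     "recommendation", "none", "recommendation"]

clf = SVC(kernel="rbf")  # rbf handles the interdependent inputs
clf.fit(X, y)
```

Note that each slice is classified independently here, which is exactly the limitation the second, context-aware approach is designed to overcome.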

We define context as the data stream of pilot behaviors and actions preceding and following a time slice, and environment as the set of conditions that obtain for that particular segment of the simulation (e.g. aircraft attitude, airspeed, status of systems). In order to fully account for the contextual nature of both disengagement detection and the recommendation of an appropriate adaptation, we are developing a second, more powerful model for capturing temporal information and learning high-level representations hidden in the metric data stream based on Artificial Neural Networks (ANN).

Much of the research in applying ANN to the interpretation of sensor data streams has been focused on traditional neural network approaches, such as feed-forward neural networks (FFNN) and deep convolutional neural networks (CNN). Al-Shabandar, et al. [15] employed a range of machine learning models including ANNs to investigate factors driving student motivation in massively-online open courses (MOOCs). Recent success of recurrent neural networks (RNN) with long short-term memory (LSTM) in other applications has led to promising trials of this approach in using sensor data to predict highly contextual operational states. RNNs have been used for, among other applications, associating student engagement with outcomes in MOOCs [16].

Variations of this approach have recently explored incorporating operational conditions into the predictive model. These approaches use several BD-LSTM models and a final FFNN layer to integrate both the contextual information encoded in the data stream (representing the sensor data) and the available operating context and environmental data. The model we have designed adapts this approach to the interpretation of Prepar3D data for (1) predicting pilot engagement and detecting disengagement; and (2) using context about the nature of the disengagement to predict the most effective adaptation to recommend to the human instructor in the context of PTN training.

The model is composed of several stacked layers of ANNs. The first BD-LSTM network extracts latent features from the multiple metric data streams describing pilot behavior. The second BD-LSTM network extracts latent features from the metric data stream describing aircraft movement, and the third BD-LSTM network extracts higher level features describing the operational environment. These layers are stacked with a final neural network layer to predict pilot disengagement level and events, depicted schematically in Fig. 5. The states of the internal layers of these BD-LSTM networks are then used as the input into a separate recommendation model to predict the most appropriate adaptation in a given context. The recommendation model layer is a CNN network, which will be trained using the labeled data set from the pilot trials.
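
The stacked architecture described above might be framed in PyTorch as follows. The per-stream input widths, hidden sizes, and mean-pooling fusion are assumptions for illustration, not the actual Fig. 5 configuration (and the final CNN recommendation layer is omitted for brevity):

```python
import torch
import torch.nn as nn

class StackedEngagementModel(nn.Module):
    """Sketch of the stacked design: three BD-LSTMs over the
    pilot-behavior, aircraft-movement, and environment streams, fused by
    a feed-forward layer that predicts disengagement probability.
    Stream widths and hidden sizes are placeholders."""

    def __init__(self, dims=(6, 6, 4), hidden=32):
        super().__init__()
        self.streams = nn.ModuleList(
            nn.LSTM(d, hidden, batch_first=True, bidirectional=True)
            for d in dims)
        self.fuse = nn.Sequential(
            nn.Linear(3 * 2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, behavior, aircraft, environment):
        feats = []
        for lstm, x in zip(self.streams, (behavior, aircraft, environment)):
            out, _ = lstm(x)              # (batch, time, 2 * hidden)
            feats.append(out.mean(dim=1)) # pool each stream over time
        return torch.sigmoid(self.fuse(torch.cat(feats, dim=1))).squeeze(-1)
```

In the full design, the internal states of the three BD-LSTM streams would also be exposed as inputs to the separate recommendation model rather than discarded after pooling.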

Fig. 5. Adaptation recommendation stacked neural network architecture

5 Conclusions

We have concluded data collection and will present our results from the machine learning model development during the conference. A formative evaluation using Air Force instructor pilots to provide feedback on OMEGA’s recommendations will immediately follow the model development. Simulations enhanced with VR provide immersive training that promises to advance learning outcomes and retention. A key factor in achieving positive results is learner engagement, which is more challenging to assess than directly observable or objectively measurable factors. In some instances, the VR environment itself can obscure cues relevant to learning engagement from instructor view. OMEGA addresses this gap by using machine learning models to develop predictive associations between simulation events and learner actions on the one hand, and learner engagement on the other. OMEGA also incorporates a model of adaptive interventions to remedy engagement lapses, and employs machine learning to develop associations between the context and environment of the engagement lapse and the optimal intervention to recommend.

Our results will provide concept validation toward establishing a more general-purpose, service-oriented appliance that client learning applications can employ for detecting lapses in engagement and motivation, and for recommending adaptive interventions. OMEGA can thus address a need, across the service branches, to ensure that simulation-based training, and training incorporating VR, results in engaged and motivated warriors, using adaptive instruction and providing data to help training managers track the efficacy of new technologies and paradigms.