
1 Introduction

Simulation-based learning for the acquisition and maintenance of skills plays an increasingly integral role in most medical training curricula. Reasons for this shift in teaching and assessment include changes in health care delivery resulting in academic environments where patient-based teaching is limited, a desire to reduce the likelihood of medical errors and enhance patient safety, and a paradigm shift to competence-based demonstrations of performance, for which simulation technology enables objective performance measurement (Scalese et al. 2007).

Fortunately, momentum has grown in recent years to establish proficiency-based criteria for specific medical tasks. This is a good first step towards quantifying the return on investment of simulation use for training acquisition purposes, and it ensures that each individual is competent to pass the program when specific criteria are met. Unfortunately, little guidance exists regarding the duration of acquired skills or the appropriate dosing of simulation experiences, meaning how much simulation is optimal, what type of simulation would best achieve goals, and when those training experiences should optimally be delivered. It is our argument that sustainment should also be proficiency-driven, meaning that individual training schedules should be determined on the basis of when individuals’ skills are predicted to decay below acceptable criteria. This paper presents several case studies demonstrating the ability to personalize and tailor medical simulation training around individual learning needs.

Within the Air Force, medical sustainment requirements are arguably even more acute. A primary emphasis of the Air Force Medical Service (AFMS) is to maintain the readiness of personnel to perform wartime healthcare. Over the past decade, the AFMS has been successful in achieving these goals because clinicians routinely care for combat casualties in a wartime environment. Of considerable concern is the impact that the recent decreased military operations in Iraq and Afghanistan will have on sustaining readiness skills. Further, because skill retention across an array of medical tasks tends to decay between three and six months if not practiced (Stefanidis et al. 2006; Oermann et al. 2011; Tuttle et al. 2007), it is likely the case that Air Force nurses attempting to demonstrate competency every 24 months will find that their skills not only have decayed below reasonable levels, but will need to be reacquired.

1.1 The Simulation Environment

Effective simulation is key to helping nurses both acquire and sustain performance effectiveness when real-world exposures and opportunities are lacking. For these reasons, a number of simulators have been validated as training tools and simulation centers have been established in many educational and training institutions and environments (Fried et al. 2004; Korndorffer et al. 2006). Simulation provides participants with the opportunity to practice clinical judgment and apply problem solving skills in a risk-free, replicable clinical environment (Jha et al. 2001; Rosen 2008; Ziv et al. 2000; Prion 2008). Further, medical simulation technology provides a platform for trainees to practice and acquire relevant clinical skills that will later translate to clinical outcomes in patient care (Dunn et al. 2004; Tekian et al. 1999).

In the specific case of laparoscopic surgery, evidence for benefits of using simulation technology is even more compelling. Laparoscopic surgery imposes specific hindrances such as the loss of 3-dimensional visualization, loss of tactile feedback, and counterintuitive instrument movement (Ahlberg 2007). Given these challenges, a conventional apprenticeship model of skill development does not fit well. Further, during the period before technical proficiency has been reached, the risk for complications is greatly increased (Moore et al. 1995; Deziel et al. 1993; Joice et al. 1998). Thus, it is critically important to objectively assess the performance levels of trainees as they acquire laparoscopic surgical skills, to mitigate risk and maximize patient safety.

Despite advances in objective performance measurement, simulation technology, and proficiency-based curricula, few guidelines exist for how simulation may be used most effectively, what type of simulation platform would be most beneficial for specific individuals, whether specific skills trained will transfer to the clinical environment, and how long specific skills trained using simulation will last (Stefanidis et al. 2005). Further, no evidence exists regarding how to use proficiency-based data to predict the future performance effectiveness of a learner or to determine when refresher training should optimally occur to mitigate skill decay. Consequently, medical simulation curricula could benefit greatly from such affordances. Providing instructors with a principled approach towards tracking individual performance and prescribing tailored training schedules around individual needs could transform the current state-of-the-art and effectively and efficiently ensure that trainees acquire or maintain proficiency at specific points in time. The current research assesses the extent to which the application of a cognitive model could provide accurate predictions of future performance at the individual learner level of analysis.

1.2 Cognitive Model of Learning and Decay

Briefly stated, a cognitive model’s purpose is to scientifically and formally translate a conceptual theoretical framework of a basic cognitive process (e.g., learning, perceiving, remembering, problem solving, decision making) by reformulating its assumptions into a more rigorous mathematical or computer language (Busemeyer and Diederich 2009). Such models are derived from basic principles of cognition (Anderson and Lebiere 1998) and produce precise, quantitative predictions of performance that may be empirically tested. The cognitive model we will now describe has its roots in this type of mathematical foundation.

Over a hundred years of research in the field of learning and forgetting has robustly demonstrated that the temporal spacing of training has a dramatic effect on the individual’s ability to retain the knowledge or skill (Ebbinghaus 1885). If training is scheduled in such a way that knowledge and skills are more susceptible to decay, then additional training resources and time must be put forth to ensure that the individual reacquires the knowledge and skills. This results in higher training costs, more training hours, and may ultimately risk patient safety.

Conversely, maintaining knowledge and skills efficiently leads to the graduation of short-term memory to long-term memory, which is much less susceptible to decay. When the knowledge and skills are committed to long-term memory, the learner does not need sustainment training as frequently. Therefore, prescribing deliberate and individualized training may result in greater knowledge and skill retention (less decay), faster knowledge and skill accessibility, and reduced chance of error.

Predictive Performance Equation.

Recently, several members of our research team who work within the Cognitive Models and Agents Branch of the Air Force Research Laboratory have developed a cognitive computational model to explain how the spacing of practice and other factors affect knowledge and skill acquisition and retention. The model is an extension of the general performance equation (Anderson and Schunn 2000) called the predictive performance equation, or PPE (cf. Jastrzembski et al. 2009).

In PPE, three factors impact the acquisition and retention of knowledge: (1) amount of practice (frequency effect); (2) elapsed time since practice occurred (recency effect); and (3) the temporal distribution of practice over time (otherwise known as the spacing effect).

The spacing effect is a robust phenomenon of human memory revealing that separating practice repetitions by a delay slows learning but enhances retention (for review, see Benjamin and Tullis 2010; Cepeda et al. 2006; Delaney et al. 2010). The spacing effect is extremely general. It occurs in tasks that involve declarative knowledge (Hintzman and Rogers 1973; Janiszewski et al. 2003), procedural skills (Lee and Genovese 1988; Moulton et al. 2006), and academic competencies (Rohrer and Taylor 2006; Seabrook et al. 2005). The spacing effect has been replicated in laboratory studies (Cepeda et al. 2006), and in ecologically-valid educational settings (Carpenter et al. 2012). Children and adults show spacing effects, as do different animal species (Delaney et al. 2010).

PPE captures the spacing effect by using a multiplicative effect of practice and elapsed time, meaning that more recent training experiences are weighted more heavily than early ones, and weights associated with training experiences decrease exponentially with time. These mechanistic details allow PPE to capture learning trends across a depth and breadth of empirical data sets where spacing effects are present.
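The multiplicative combination of practice and elapsed time described above can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the power-law forms, the parameter names `c`, `d`, and `x`, their default values, and the use of days as the time unit are all hypothetical, and the sketch omits any mechanism by which decay itself depends on the spacing of practice.

```python
import math
from typing import Sequence


def weighted_elapsed_time(practice_days: Sequence[float], now: float, x: float = 0.5) -> float:
    """Weighted average of the time elapsed since each practice event.

    Weights fall off exponentially with elapsed time, so recent practice
    dominates. The rate `x` is an illustrative assumption.
    """
    elapsed = [now - day for day in practice_days]
    if not elapsed or any(e <= 0 for e in elapsed):
        raise ValueError("need at least one practice event, all before `now`")
    shortest = min(elapsed)
    # Subtract the shortest elapsed time before exponentiating so the
    # largest raw weight is exactly 1.0 and the sum cannot underflow to zero.
    raw = [math.exp(-x * (e - shortest)) for e in elapsed]
    total = sum(raw)
    return sum((w / total) * e for w, e in zip(raw, elapsed))


def activation(practice_days: Sequence[float], now: float,
               c: float = 0.3, d: float = 0.2, x: float = 0.5) -> float:
    """Multiplicative practice-by-time term: N**c * T**(-d).

    N is the number of practice events (frequency) and T is the weighted
    elapsed time (recency). `c` and `d` are hypothetical learning and
    decay rates chosen only for illustration.
    """
    n = len(practice_days)
    t = weighted_elapsed_time(practice_days, now, x)
    return (n ** c) * (t ** -d)


if __name__ == "__main__":
    three_sessions = [0.0, 1.0, 2.0]
    print(activation(three_sessions, now=3.0))    # shortly after training
    print(activation(three_sessions, now=30.0))   # after a longer delay
    print(activation([2.0], now=3.0))             # a single practice event
```

In this sketch, adding practice events raises the value and letting time pass lowers it, which mirrors the frequency and recency effects listed earlier; individual differences would enter through learner-specific values of the rate parameters.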

In PPE, as in other cognitive models, the psychological parameters that control cognitive processes vary across individuals. We have developed techniques to estimate these parameters at the level of the individual in order to quantify latent characteristics of the learner (i.e., decay rate, susceptibility to spacing, and retrieval variability). To the extent that these parameters and the psychological processes they represent are engaged in different and unique tasks and contexts, they provide a novel way to predict generalization and future performance. In the research presented here, we examine the predictive validity of PPE for longitudinal, repeated measures assessments of performance across laparoscopic surgery, CPR, and trauma assessment domains.

The model accounts for numerous empirical phenomena in the memory literature, demonstrating its sufficiency as a theoretical account of the spacing effect (Walsh et al. submitted). It has been validated using data from more than a dozen experiments that involve the acquisition of factual knowledge and procedural skill across timescales ranging from minutes to years. For these reasons, we are assessing PPE’s validity in more applied and complex training domains.

2 Case Examples

PPE is currently being tested in a nationwide, multi-site field study with the American Heart Association and Laerdal to assess the validity of personalizing cardiopulmonary resuscitation skills training for medical care providers. PPE is also being used to prescribe refresher training for nurses performing trauma assessment on higher fidelity mannequin training platforms, and has been used to track and predict performance in virtual reality laparoscopic surgery domains. Plans to assess PPE using virtual and virtual reality training environments for trauma assessment, and intracranial pressure monitoring skill decay of critical care nurses (a collaboration with the United States Air Force School of Aerospace Medicine) are funded and underway as well.

2.1 Cardiopulmonary Resuscitation Skills Acquisition and Retention

Purpose.

The current standard for hospital employees to maintain certification for cardiopulmonary resuscitation (CPR) skills is taking the American Heart Association course every 2 years. The literature reveals, however, that most medical skills, including CPR, have decayed below proficient levels somewhere between 3 and 6 months post-acquisition. As such, the American Heart Association is using that evidence to make a potential policy shift, mandating that hospital employees increase their training 8-fold and complete CPR training every 3 months, rather than every 2 years. This policy shift also changes the nature of training from a curriculum that is subjectively assessed to one that is objectively measured using more intelligent manikin training systems. The problem is that the potential policy shift seeks to fix the undertraining of some trainees by potentially overtraining individuals who do not need the additional training. For this reason, researchers at AFRL are collaborating with the American Heart Association and Laerdal to determine whether CPR training could be personalized around individual learning needs, both saving training time and reducing patient risk through proficiency-based training scheduling.

Participants.

We recruited nursing students from a total of 9 nursing schools around the United States of America, and have enrolled approximately 400 students in a 2 year, repeated measures field study thus far.

Materials.

CPR was assessed using the Resusci Anne Simulator designed by Laerdal. This system provides real-time, dynamic, visual feedback to trainees regarding compression depth, compression rate, compression hand placement, compression hand release, ventilation volume, and ventilation rate. The simulation system produces a score ranging from 0–100% regarding the quality of both compressions and ventilations. These scores are used in the American Heart Association’s Resuscitation Quality Index (RQI) program. Proficiency was set at 75% in concordance with the RQI program. All data were recorded using Laerdal’s learning management system (LMS), so that individuals could easily log in, examine their performance profile and history, and, for those enrolled in the PPE-prescribed condition (as described in the Design section below), receive real-time training prescriptions immediately following the current training session.

Design.

The empirical design consisted of a 24 month, repeated measures training and assessment schedule. We employed a pre-test (no feedback), training with real-time, dynamic feedback, post-test (no feedback) design across sessions. Compressions and ventilations were each performed for approximately 1 min for each assessment (i.e., pre-test, training with feedback, and post-test). We included a pre-test/post-test design in order to provide the cognitive model with as much information as possible to adequately estimate its parameters in a very short amount of time. In this way, we could quantify the efficacy of the training itself and how much trainees learned within a session (between the pre- and post-test), and estimate the degree of decay that occurred between sessions.
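The two quantities that the pre-test/post-test design makes observable can be sketched as follows. This is a minimal illustration only: the `Session` record, its field names, and the scores are hypothetical, and simple score differences stand in for whatever estimation the cognitive model actually performs.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Session:
    """One training session (hypothetical record layout)."""
    day: float        # day of the session, counted from enrollment
    pre_test: float   # score before training, 0-100, no feedback
    post_test: float  # score after training, 0-100, no feedback


def within_session_gains(sessions: Sequence[Session]) -> List[float]:
    """Learning within each session: post-test minus pre-test."""
    return [s.post_test - s.pre_test for s in sessions]


def between_session_losses(sessions: Sequence[Session]) -> List[float]:
    """Decay between consecutive sessions: previous post-test minus next pre-test."""
    return [prev.post_test - curr.pre_test
            for prev, curr in zip(sessions, sessions[1:])]


if __name__ == "__main__":
    # Invented scores for a trainee on a weekly calibration schedule.
    history = [
        Session(day=0.0, pre_test=40.0, post_test=70.0),
        Session(day=7.0, pre_test=60.0, post_test=85.0),
        Session(day=14.0, pre_test=80.0, post_test=90.0),
    ]
    print(within_session_gains(history))
    print(between_session_losses(history))
```

Each session therefore contributes two data points rather than one, which is why a small number of sessions can still constrain both learning and decay.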

We included multiple conditions in this large-scale research design. Firstly, we manipulated the training calibration schedule, meaning we had participants come in for initial acquisition training for 4 sessions spread either daily, weekly, monthly, or quarterly (every 3 months). The purpose of having a difference in our initial acquisition sessions was to assess the validity of PPE model prescriptions as a function of how quickly those individual learning and decay parameters were estimated.

Next, we compared performance of trainee groups across 2 fixed retention intervals of either 3 or 6 months to that of individuals assigned to a PPE-prescribed training schedule. Those in the fixed-interval condition completed 2 reassessments. Those in the PPE-prescribed training condition could complete up to a maximum of 10 training sessions or up to 2 years of training time in total. In the PPE-prescribed training condition, a subsequent training session was scheduled either for the point at which that individual’s performance was predicted to decay below 75%, or to help them acquire 75% proficiency in the first place.
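The scheduling rule for the PPE-prescribed condition can be sketched as a forward scan over a predicted retention curve. This is an illustrative sketch under assumptions: `next_session_day` and its arguments are hypothetical names, the daily step size is arbitrary, and the toy power-law curve in the usage example stands in for a calibrated model prediction. The session cap of 10 is not modelled here.

```python
from typing import Callable, Optional


def next_session_day(predict: Callable[[float], float],
                     criterion: float = 75.0,
                     horizon_days: float = 730.0,
                     step: float = 1.0) -> Optional[float]:
    """Return the first day on which predicted performance is below criterion.

    `predict(day)` gives the predicted score (0-100) `day` days after the
    most recent session. If the learner is predicted to stay proficient for
    the whole horizon (2 years by default), return None. A learner who has
    not yet reached the criterion is sent back at the earliest step.
    """
    day = step
    while day <= horizon_days:
        if predict(day) < criterion:
            return day
        day += step
    return None


if __name__ == "__main__":
    # Toy retention curve with invented parameters, not a fitted model.
    def toy_curve(day: float) -> float:
        return 95.0 * (1.0 + day) ** -0.07

    print(next_session_day(toy_curve))
```

A fixed 3 or 6 month schedule corresponds to ignoring `predict` entirely, which is the contrast the design is meant to test.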

Results from the full field study are not yet available, as the study is set to complete about 9 months from now. As such, we present results from a smaller pilot group to demonstrate how the model functions.

Model Application.

Data from the pre- and post-tests from the first 4 training sessions were used to estimate unique learning and decay parameters for individual trainees. The model produces empirical predictions of human performance for subsequent training sessions using a one-step-look-ahead procedure, and iteratively updates the learning and decay parameter estimates as new data become available.
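The one-step-look-ahead procedure with iterative recalibration can be sketched generically. In this sketch a least-squares line is used as a deliberately simple stand-in for the cognitive model, so `fit_line` and `predict_line` are placeholders rather than PPE, and all names and data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Observation = Tuple[float, float]  # (day of assessment, score in percent)
Params = Tuple[float, float]       # (intercept, slope) for the stand-in model


def fit_line(points: Sequence[Observation]) -> Params:
    """Least-squares fit of score = intercept + slope * day (stand-in, not PPE)."""
    n = len(points)
    mean_day = sum(day for day, _ in points) / n
    mean_score = sum(score for _, score in points) / n
    spread = sum((day - mean_day) ** 2 for day, _ in points)
    if spread == 0:
        raise ValueError("need at least two distinct assessment days")
    slope = sum((day - mean_day) * (score - mean_score)
                for day, score in points) / spread
    return mean_score - slope * mean_day, slope


def predict_line(params: Params, day: float) -> float:
    intercept, slope = params
    return intercept + slope * day


def one_step_look_ahead(observations: Sequence[Observation],
                        fit: Callable[[Sequence[Observation]], Params],
                        predict: Callable[[Params, float], float],
                        n_calibration: int = 4) -> List[Tuple[float, float, float]]:
    """Predict each observation from all earlier ones, refitting every step.

    Returns (day, predicted, observed) for every observation after the
    first `n_calibration`, which are used only for the initial calibration.
    """
    results = []
    for k in range(n_calibration, len(observations)):
        params = fit(observations[:k])      # recalibrate on all data seen so far
        day, observed = observations[k]
        results.append((day, predict(params, day), observed))
    return results


if __name__ == "__main__":
    data = [(0.0, 52.0), (1.0, 60.0), (2.0, 66.0), (3.0, 71.0), (10.0, 74.0), (40.0, 70.0)]
    for day, predicted, observed in one_step_look_ahead(data, fit_line, predict_line):
        print(f"day {day:5.1f}: predicted {predicted:6.1f}, observed {observed:6.1f}")
```

The key property is that no prediction ever uses the observation it is predicting; swapping in a different `fit` and `predict` pair leaves the evaluation loop unchanged.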

Figure 1, below, reveals empirical data from a portion of our pilot sample, examining performance differences as a function of either daily or weekly initial acquisition of skill, and returning for reassessment 3 months post-acquisition. We will focus our analyses on compressions only at this time.

Fig. 1. Empirical data for 8 participants with PPE model predictions, based on calibration to initial 4 sessions. Right panel of each graph reveals projected decay curve for each participant.

Pilot study results reveal two key findings: (1) acquisition is faster when participants train more quickly, in line with effects of recency, and (2) PPE may be used to successfully track and predict CPR skills performance at the individual learner level. Extrapolations of decay curves indicate that PPE correctly classified 7/8 participants with respect to who would and who would not be proficient at the 3 month retention assessment. In the larger field study, we are doing more than simply classifying who needs training at fixed intervals. We are prescribing precisely when individuals need to return for retraining, either (1) to acquire skills initially (as would be the case for S24 in Fig. 1, above), or (2) to sustain skills before they are predicted to dip below the 75% criterion, using the decay curves to determine the return date (e.g., S17 in Fig. 1, above, would be required to return approximately 3.5 weeks post-training acquisition, as that is the point at which the decay curve dips below proficiency). Interestingly, in this very small sample we did not see a trend toward enhanced retention as a function of more distributed training upfront. It will be interesting to see whether true effects of spacing are present in the larger field study.

At a finer level of detail, Fig. 2, below, reveals how the model functions at the individual learner level of analysis. This figure takes pilot data from S24 and extrapolates forward in time to reveal how this individual would likely perform under the current standard training cycle, which is every 2 years. In this scenario, this participant would not achieve proficiency until nearly 12 years into their career, meaning they would be performing “at risk” for approximately 43% of their career.

Fig. 2. PPE model predictions of performance for S24 over a 20 year horizon, based on calibration to initial 4 sessions, compared to PPE model predictions of performance based on a PPE-prescribed training schedule over the same timeframe.

By contrast, Fig. 2 also reveals a PPE-prescribed training regimen, demonstrating that more training would be required upfront to help this individual first attain proficiency. After proficiency is attained, then training refreshers are principally spaced farther and farther apart temporally, as knowledge and skills become more and more stable. This simple comparison revealed a 45% decrease in training time, and a 99% reduction in risk over the same 20 year horizon.

2.2 Trauma Assessment Skills Acquisition and Retention

Purpose.

Air Force nurses must acquire and sustain a high level of trauma assessment skills to manage patient wounds from current and future conflicts and to be prepared for mass casualty events. Due to drawdowns from Iraq and Afghanistan, decreased deployments have resulted in reduced exposure to trauma management and care, and Air Force nurses do not have frequent enough opportunities to care for trauma patients stateside to maintain currency. This case study sought to determine whether:

  1. Trauma assessment skills may be objectively assessed using simulation

  2. A trauma assessment curriculum could be established to help nurses establish and sustain proficiency

  3. The application of PPE may capture and predict learning and decay so that principled prescriptions may be assessed in an a priori fashion in a follow-on study.

There is also a vacuum in trauma assessment training, as performance is subjectively measured, self-assessed, and trained only once every four years in the Trauma Nurse Core Course (TNCC).

Participants.

Active duty United States Air Force nurses stationed at Wright-Patterson Air Force Base were targeted for inclusion in this 12 month, repeated measures pilot study. Any type of nurse (e.g., medical-surgical, critical care, emergency department) was deemed acceptable to participate, as all types have the core requirement to perform trauma assessment. A total of 5 active duty nurses were recruited; because of deployments, 3 were able to complete the project.

Materials.

Trauma assessment performance was assessed using a moderate to high-fidelity Advanced Life Support Patient Simulator designed by Laerdal, and individual-level objective performance metrics were developed and validated by Lt Col Dufour and Jastrzembski (2015), using an adaptation of a previously validated trauma assessment tool for teams (Holcomb et al. 2002). This measurement tool allowed for a detailed quantitative examination of performance across specific portions of the trauma assessment task. Proficiency was set at 70% in concordance with the Trauma Nurse Core Course criterion.

Design.

The empirical design consisted of a 12 month, repeated measures training and assessment schedule. We employed a pre-test (~20 min), didactic training (45 min at initial training, 15 min for subsequent refreshers), post-test (~20 min + 40 min debriefing) design across sessions. The rationale for including both a pre-test and post-test within a session mirrored that of the CPR study: we wished to determine how effective training was within a session, and we wished to assess how rapidly skills decayed between sessions by establishing a baseline. Participants came in for a total of 5 sessions, occurring at onset of the study, 1 month, 3 months, 6 months, and 12 months. At the 12 month session, only an assessment occurred (no didactics or post-test was administered).

Model Application.

The first 3 training points (pre- and post-tests at baseline, and reassessment at 1 month) were used to estimate unique learning and decay parameters. The model produced empirical predictions of human performance for subsequent training sessions using a one-step-look-ahead procedure, and iteratively updated the learning and decay parameter estimates as new data became available. Not surprisingly, use of additional data for model calibration produced better predictions for future events, as shown in Table 1 below. Calibrating with more than 3 training points produced diminishing returns, however, suggesting that for this skill set, at this cadence of training, PPE makes valid predictions out to 12 months based on only 1 month of calibration data.

Table 1. Correlation and mean-squared error values between model and empirical data based on the number of data points calibrated with.
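The two fit statistics named in the Table 1 caption, correlation and mean-squared error between model predictions and empirical data, can be computed as in the sketch below. The function names and the example numbers are hypothetical and are not the study's values.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation between model predictions and observed scores."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_p = sum(predicted) / len(predicted)
    mean_o = sum(observed) / len(observed)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


def mean_squared_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average squared difference between predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


if __name__ == "__main__":
    # Invented scores (percent) purely to exercise the functions.
    model = [62.0, 71.0, 78.0, 74.0]
    data = [60.0, 73.0, 80.0, 71.0]
    print(pearson_r(model, data), mean_squared_error(model, data))
```

Reporting both is useful because correlation captures whether the model tracks the shape of the learning curve, while mean-squared error captures how far off the predicted scores are in absolute terms.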

The model was able to track performance extremely well when compared against empirical data, as shown in Fig. 3, below.

Fig. 3. Empirical data for 3 participants with PPE model predictions, based on calibration to initial 3 data points. Right panel of each graph reveals projected decay curve for each participant.

Based on this small sample, we demonstrated that PPE may be used to successfully track and predict trauma assessment skills performance at the individual learner level. Based on our extrapolations of projected sustainment, we also argue that restructuring the way trauma assessment is taught could produce a 75% reduction in training time with the added benefit of more prolonged skills sustainment. A follow-on study testing the model prescriptions with a larger sample is currently underway.

2.3 Laparoscopic Surgery Skills Acquisition and Retention

Purpose.

Laparoscopic surgical technique is not a skill that is amenable to the typical apprenticeship training model, as specific hindrances including loss of 3-dimensional visualization, loss of tactile feedback, and counterintuitive instrument movement are inherent to the procedure (Ahlberg et al. 2005). Additionally, during the period before proficiency has been attained, the risk for complications is dramatically higher (Moore and Bennet 1995; Deziel et al. 1993; Joice et al. 1998). Thus, great care has been placed into developing proficiency-based simulation training environments, so that students may practice and hone their craft without risking patient safety. Though this represents a huge shift in the right direction, the question remains how long skills trained in simulation will last. As such, researchers at the Carolinas Simulation Center collaborated with AFRL to assess whether PPE could be applied to the laparoscopic surgery domain. We used archival data as a starting point, to assess how well PPE could track and predict performance in a 6 month longitudinal study.

Participants.

Our data analysis examined performance of seventeen second-year medical students, who were trained to proficiency in a laparoscopic suturing task (see Stefanidis et al. 2005, for details).

Materials.

Students trained using the Fundamentals of Laparoscopic Surgery (FLS) training model, and performance was assessed using an objective performance measurement scale based on time and accuracy.

Design.

The empirical design consisted of a 6 month, repeated measures training and assessment schedule. All students were initially trained to proficiency at session 1 (mean of 54 ± 22 repetitions and 5.6 ± 1.4 h). Students were then split into 2 conditions: a proficiency control group or a proficiency + maintenance-based training group. Reassessments occurred at 2 weeks, 1 month, 3 months, and 6 months. The maintenance-based training group received additional training at the 1 and 3 month sessions in order to maintain criterion performance. At 6 months, a single reassessment was performed.

Model Application.

Data up to and including the 2-week reassessment were used to calibrate the model’s learning and decay parameters. The model extrapolated learning trajectories for each individual to generate predictions for suturing performance at the 1, 3, and 6 month follow-up sessions.

The model was able to track performance extremely well when compared against empirical data, as shown in Fig. 4, below.

Fig. 4. Empirical data for 17 participants with PPE model predictions, based on calibration to initial 2 weeks of data.

These results provide a proof-of-concept that PPE may be used to successfully track and predict laparoscopic suturing skills performance at the individual learner level. In the next phase of studies, we seek to use PPE to drive principled training prescriptions around individual learning needs to determine when students should return for simulation training to sustain competencies.

3 Summary

It is evident that calendar-based, subjectively rated training schedules and programs are antiquated training methodologies that carry huge costs in terms of training time, training dollars, and increased performance risk. We argue that, by incorporating our state-of-the-art cognitive modeling approach, personalization of training is not only possible but also affordable and feasible, and that it can dramatically reduce risk. Appropriate care must be given to laying a foundation of objective performance measurement systems so that 21st century approaches to personalized learning may be validly applied. We are enthusiastic about the momentum we have gained thus far and are encouraged by the government, academic, and industry investments being made to help fundamentally change the training status quo.