
1 Introduction

Almost as soon as general-purpose computers came to life, researchers began to consider how they could use these powerful machines to improve education and training [22]. These researchers were drawn by the transformative promise of being able to deliver high-quality educational services universally. No longer would the quality of a student’s educational experience be limited by their proximity to excellent teachers. Students everywhere could benefit from a state-of-the-art education. Moreover, these electronic teachers would never tire, never lose patience, and never have a bad day.

Soon, however, these researchers began to encounter significant scientific difficulties. They had to wrestle with how best to use this new power to attain educational outcomes. At the same time, the researchers had to develop a language that they could use to communicate their recommendations so that they could be evaluated. These were significant theoretical challenges.

However, in some ways, the theoretical challenges paled in comparison to the practical ones. Relatively few individuals were competent to develop computer-based courseware. Further, the process of handcrafting this computer-based courseware was frightfully slow. Given the costs and timelines involved, it was impossible for these few pioneers to create a collection of courseware that could meaningfully impact society, or even local enterprises.

Therefore, at the same time that they were developing a “design language” [7] for communicating, testing, and implementing their innovations with respect to computer-based courseware, these early innovators had to address the practical imperatives of developing the courseware faster, cheaper, and with a broader workforce. In the next section, we will review some of the strategies they employed and some of the consequences of those strategies.

2 Cost Reduction Strategies over the Decades

In the early days of computer-assisted instruction (CAI), the load was carried by a few men and women of extraordinary talent. These pioneers needed to be experts in a new and rapidly evolving technology and, at the same time, master the not-yet-developed field of instructional systems design. Much of the early courseware developed in these labs was created on minicomputers like the DEC PDP-7 or PDP-9 using assembler language (W. Judd, personal communication, December 20, 2019).

Not surprisingly, these early efforts were “exceedingly time consuming and frightfully expensive” (W. Judd, personal communication, December 20, 2019). Almost immediately, the leaders of the field began to look for ways to reduce cost and increase efficiency. More specifically, the hunt was on for approaches that would reduce courseware development timelines and the required expertise. These remain priorities today. Moreover, additional cost drivers that were less important in the early days of the field have come under increased scrutiny. These include reducing the costs associated with media development and hardware procurement and sustainment.

In this section, we will review some of the strategies that they employed and summarize their results.

2.1 The Starting Point: Early Courseware Development Efforts

The first step toward reducing timelines involved switching from assembler languages to higher-order languages such as FORTRAN, COBOL, and others. While these languages reduced programming time, they still required skilled programmers. Further, they often lacked features needed for instructional programming.

Researchers addressed the latter issue by creating extensions of these languages that made them a better fit for CAI. For example, languages like CATO and FOIL were extensions of FORTRAN (W. Judd, personal communication, December 20, 2019). The former issue, however, remained a challenge. By requiring the expertise of skilled programmers, use of these tools limited the pool of potential authors and increased development costs.

To address this limitation, the next step in the evolution of courseware development was the introduction of macros [7]. Macros “rolled up” more atomic commands into higher-order, easy-to-remember functions. A less experienced developer could run a macro, provide some arguments, and the system would convert the supplied content into the specific lower-level commands that would be executed to create the learner experience.

2.2 The Rise of Authoring Languages

The need for better tools, the advent of macros, and a few other factors led to the development of specialized courseware authoring languages. One of the earliest such languages was TUTOR, developed specifically for the Programmed Logic for Automatic Teaching Operations (PLATO) system. PLATO used a centralized computer serving a number of “dumb” terminals to present lessons, illustrations, drills, etc. to learners [1]. In 1967, Paul Tenczar recognized that a more efficient method of creating PLATO lessons was needed and possible. His efforts led to the development of the TUTOR language [26].

While an advance, TUTOR was still laborious. Sherwood [26] provides a useful example. Imagine that we want to create an exercise that asks the student to name a geometric figure, in this case a triangle. The student will type a response and the system will reply with an indication that the answer is right or wrong. To produce this very simple question, TUTOR (which was unquestionably an advance!) required the series of commands shown in Table 1.

Table 1. Sample TUTOR programming sequence

While still cumbersome, TUTOR signaled the start of a wave of emerging authoring languages. Frye [6] identified 22 authoring languages. In his analysis, Frye apportioned the languages into the four categories shown in Table 2.

Table 2. Categories of authoring languages

An argument could be made that Frye could have included at least two more categories in his analysis – table-based systems that separated content from programming logic, and question generators (E. Schneider, personal communication, 2020).

These languages made it easier to develop CAI and lowered the barriers to the development task. In doing so, they were a significant first step toward achieving the first two efficiency goals that we introduced above: reducing courseware development timelines and reducing the required level of expertise. However, they were still (unapologetically) programming languages. While they had features that made them well suited to instructional development, they required a level of dedication to learn, use, debug, etc. As computers became more sophisticated and the range of input, output, and media options exploded, the barriers to entry and time commitment once again began to increase. However, at the same time, those same capabilities made graphical user interfaces practical and opened the door for the next step in the evolution of authoring tools – the development of authoring systems.

2.3 Continuing Efficiency: The Emergence of Authoring Systems

In some ways, the basis of many authoring systems brings us back to the early days when systems like PLATO were emerging. A contemporary of PLATO, the Time-shared, Interactive, Computer-Controlled Information/Instructional Television (TICCIT) system was designed to test the ability to deliver instruction to homes. TICCIT linked minicomputers to home televisions via coaxial cables [10]. If this wasn’t ambitious enough, the TICCIT developers wanted to give learners a significant degree of control over their instructional experience. To achieve their goals, the team had to develop a strategy that would allow instructional experts and software experts to communicate effectively. They also needed a strategy that would make it practical to deliver the content and provide learners with the desired level of control. To meet these challenges, the TICCIT team developed the concept of the “base frame” [10].

For TICCIT, a base frame was a logical unit of instruction. It described a single strategic interaction with the student. Each base frame comprised a series of elements. Within a given frame, some elements remained static; others changed over time in response to the instructional programming or student actions. Not only did this concept provide a foundation for communication and development, it also provided the basis for a range of modern authoring systems such as Authorware, IconAuthor, Quest, etc. Developing instruction came to consist of specifying a series of frame types and then describing how to populate the various elements of each frame.

Most authoring tools were essentially “code” generators that hid the programming requirements behind user-friendly interfaces. These tools then produced the detailed computer code that separate “runtime environments” would ingest to create the programmed instructional experiences. Different runtime environments (e.g., computers using different operating systems) would require different programs, so various translators had to be present.

Essentially, these authoring systems provided a more graphical front end on the tried-and-true macros used by earlier developers. Rather than using verbal macros (like a “write” command), these tools used menus, forms, and draggable objects to allow designers to specify their intent and to gather the data needed to populate macros. For this generation of authoring systems, the goal was to hide the complexity of instructional programming behind a graphical user interface. As noted by Gibbons [8], these tools made it easier for non-programmers to outline a series of instructional experience types and then to specify the details of those experiences (including learner-sensitive variations). By eliminating the need for coding (or even pseudo-coding), these tools did a tremendous job of opening the field of instructional development to a much wider range of users. They have significantly reduced the level of expertise required to develop courseware. Further, they have made that development process much faster. However, in doing so, these tools have had some unintended consequences. We’ll discuss these consequences in Sect. 2.5.

2.4 More Recent Trends: Using Standards and Technology to Further Reduce Costs

Earlier developers were not particularly concerned with media development or hardware costs. In the former case, the computers of the day did not process and display “media” as we use the term today. As we noted earlier, if a designer wanted a triangle on the screen, he/she did not display a picture of a triangle; rather, the designer used programming tools to draw a triangle. Similarly, because fewer computers were involved and their costs, while high, were swamped by development costs, the price of hardware wasn’t of much concern.

More recently, these concerns have become more prominent. Just as CAI tools have evolved over the years to reduce “programming” costs, so too have media tools like Photoshop matured, becoming both more powerful and easier to use. To further reduce costs, customers and producers have become much more interested in reducing cost through re-use. If I want a picture of a triangle, why should I have to hire a graphic artist to create it? Why can’t I use one that has already been produced? Similarly, why can’t I build one lesson on a topic (e.g., Ohm’s Law) and reuse that lesson whenever it is needed? Concerns like these have given rise to various initiatives such as the Sharable Content Object Reference Model (SCORM) and its constituent standards to promote reuse, content portability, and other cost-reducing benefits.

Hardware costs have been trending downward on a nearly continuous basis. The concern now isn’t so much the cost of the device itself as the cost of maintaining it. As the number of personal computers and operating systems has proliferated, it has become difficult to manage the software present on each machine. This is especially true as cybersecurity concerns have increased. To reduce maintenance costs while increasing courseware currency, many users are opting for web-based delivery of content. To accommodate this, authoring systems are migrating away from runtime environments and toward plugins, the use of HTML5, and other tools designed to maximize the efficiency of delivery within the available bandwidth.

2.5 The Downside of Efficiency

Let’s now return to the notion of unintended consequences that we introduced at the end of Sect. 2.3. Authoring languages and, to a greater extent, authoring systems have made it very simple to create courseware. However, in doing so, they have adopted a specific design language which has, arguably, constricted the thinking of would-be designers [7, 9].

Design languages emerge to allow teams of developers, who often come from different communities of practice, to communicate goals and intents effectively. Often, these design languages give rise to tools that embody them. For example, the notion of a base frame in TICCIT allowed software engineers, instructional designers, and media producers to communicate effectively.

However, this process also works in the other direction. As tools come to embody a certain design language, the tools shape how users come to think about the problem. Those functions that a given tool does not accommodate easily become easier for users to ignore. Those functions that are easiest for tool producers to program and for tool users to employ become preeminent [7]. These effects lead to the narrowing of the range of instructional design.

This becomes increasingly insidious when narrowed design conceptions are coupled with commercial price pressures such as the Government’s use of lowest-price technically acceptable (LPTA) contracts. LPTA contracts mandate that as long as the product meets certain minimal standards, the purchaser must accept the lowest-priced offering, even if it does not represent the solution that provides the best value. In these environments, it is easy, perhaps necessary, to adopt the “quick-and-dirty” solutions that authoring systems make very easy to develop. Experience has shown that this often leads to “death by PowerPoint” experiences masquerading as courseware.

3 Imagining a Different Approach

Gibbons and Fairweather [9] noted that when considering authoring tools, users should pick a tool “(1) that allows you to express your powerful ideas in a computer program without compromising them, and (2) allows you to create, read, and maintain the program as easily and efficiently as possible.” Experience has shown that these competing goals have been reconciled in ways that are not entirely pleasing.

As we noted in Sect. 2.5, the design language of most tools limits the creativity and power of the designs that they promote. At the same time, customers have repeatedly expressed that the current generation of tools is not easy enough to use. Many customers do not want to pay external developers to produce courses. To save money and to make course maintenance easier, they would prefer that their “in-house” instructors and subject-matter experts build the courses. However, these local personnel generally lack the background and time to use the existing tools.

Gibbons and Fairweather [9] suggested that authoring tools could be described and compared by considering their productivity, ease of use, and power. Productivity refers to the amount of work that an individual can accomplish in a unit of time. For our discussion, this might equate to the length of time required to produce an hour of courseware. The notion of ease of use is fairly easy to understand. Often, researchers expand this to a more general notion of usability as described in Table 3. Power refers to the flexibility of the tool and the number of things that it can be used to accomplish. “A tool has more power than another if it allows the author to accomplish more or if it allows the author to approach and solve a particular problem in more ways” [9].

Table 3. Conventional dimensions of usability

Following the general trend in computer engineering, and in keeping with their roots in high-order programming languages, authoring tools have commonly been “general purpose” tools. That is, the designers of these tools have tried to maximize their power. Over the years, ease of use and productivity have become increasingly important, and various “shortcuts” have been introduced that gradually reduced or hid the power of these tools.

Our research team is currently exploring the hypothesis that a better approach is possible. Specifically, we want to consider the value of building a collection of highly specialized authoring tools. Each tool would be optimized for a very specific learning task or instructional outcome. Our thought is that by significantly reducing the power of the tool, we can tremendously increase its usability and productivity. The goal is to radically minimize the level of effort that novice developers (e.g., in-house instructors or subject-matter experts) must expend to achieve a specific learning outcome. Resorting to hyperbole, we refer to this as our “zero authoring goal.” In the long term, these very specific zero authoring systems could be federated to create a collection of tools, building up power to the extent needed in a given setting.

We are currently developing the first such system. In the next section, we discuss its foundations.

4 Case Study: Minimal Authoring for Adaptive Rote Learning

The goal of the current effort was to develop the first of a family of hyper-focused “zero authoring” adaptive training systems. To begin this line of research, we wanted to focus on a very basic learning outcome that nonetheless would be amenable to meaningful individual adaptations. Our choice was the rote memorization of facts.

One advantage that our team has is that research on the best way to develop mastery of facts dates back to the very beginning of the study of psychology. For example, one typical application is to learn geography facts [18, 23]. In this application, the instructional system might present learners with an outline of a continent or region and ask the students to identify the shaded country, state, etc. Alternatively, the system might present the learners with the name of a country, state, etc. and ask them to identify it on a map. Other commonly researched topics are foreign words/phrases [5, 16, 27], math facts [31], history facts [2], and flower types [16].

For learning outcomes like these, the primary instructional manipulation is the intelligent application of the well-documented spacing effect. Dating back to the work of Ebbinghaus in the late nineteenth century, this finding shows that material that is studied across two or more sessions distributed in time is remembered more easily than when an equal amount of time is devoted to studying during a single session (i.e., distributed practice is superior to massed practice). These effects exist for recall over both relatively short periods and relatively long periods. Researchers have postulated several mechanisms for this effect [30], for example:

  • The Deficient Processing Hypothesis,

  • The Context Variability Hypothesis, and

  • The Study-phase Retrieval Hypothesis.

The deficient processing hypothesis holds that when exposure is massed, less processing effort is devoted to a given knowledge item. Because the item is associated with less elaborative processing, its memory trace is weaker and more likely to fade over time. The context variability hypothesis holds that as the time between exposures increases, so does the variability in environmental cues. Associating an increasing number of contextual cues with the knowledge item provides an increasing number of retrieval pathways, thus increasing the likelihood of recall. The study-phase retrieval hypothesis argues that, like exercising a muscle, each time an item is recalled, the original memory trace is strengthened. The more difficult the recall effort, the greater the increase in strength.

At the same time that they have worked to understand the mechanism behind the spacing effect, researchers investigated questions such as:

  • What type of exposure is appropriate, and

  • By how much should those exposures be separated?

The nature of the exposure that is required is fairly clear – the spacing effect relies on intentional/effortful processing. Manipulations that minimize the effort required to process the information detract from long-term memory [28].

In examining the optimal intervals within the spacing effect, researchers have looked at questions such as whether the spacing should be in fixed or expanding intervals, and whether longer spaces provide a bigger impact on learning.

Experiments comparing fixed and expanding exposure schedules have produced equivocal results [3, 19]. Within the expanding interval model, intervals are short early in the learning process and progressively expand as learning progresses. This contrasts with strategies in which the spacing remains constant. For example, Storm, Bjork, & Storm [29] reported an advantage for an expanding practice interval, but Karpicke & Roediger [12] reported that equal intervals produced better outcomes. Karpicke & Bauernschmidt [13] reported no difference between the models.

A slightly more consistent pattern of results is seen when examining the effect of the spacing interval. Several studies in the mid-twentieth century seemed to indicate that longer spacing intervals (also known as the interstimulus interval) promoted retention better than shorter intervals (see [3] for a review). However, as Carpenter, et al. [3] note, care must be taken to avoid making the spacing interval so long that the content has been forgotten and each exposure approximates a “first exposure.”

There is also evidence that the most desirable interstimulus interval (ISI) depends on the required retention interval (RI), that is, the gap between the last exposure and some test of memory (also known as the test delay). In fact, for very short retention intervals, the spacing effect may be reversed, favoring massed practice [28]. Carpenter, et al. [3] cited work indicating that the optimal spacing gap was 10%–20% of the retention interval. This effect has been termed the proportionality rule [11]. Mozer and Lindsey [21] described this effect in terms of a power function relationship (see Eq. 1).

$$ {\text{Optimal}}\;{\text{Spacing}} = 0.097{\text{RI}}^{0.812} $$
(1)
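To make the relationship concrete, the short sketch below simply evaluates Eq. 1. It is a minimal illustration of the published fit, not part of our system; we assume the retention interval and the returned gap share the same time unit (the fit is usually quoted in days), and the function name is our own.

```python
def optimal_spacing(retention_interval: float) -> float:
    """Optimal interstudy gap implied by the power-function fit of Eq. 1.

    The gap is returned in the same (assumed) time unit as the retention
    interval; the coefficients 0.097 and 0.812 are taken from the text.
    """
    return 0.097 * retention_interval ** 0.812


# Example: suggested gaps for a few retention intervals (same unit throughout).
for ri in (7, 30, 180, 365):
    print(f"RI = {ri:3d} -> optimal gap ≈ {optimal_spacing(ri):.2f}")
```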

Many of these “timing factors” can be reconciled through the notion of “desirable difficulty.” This principle holds that the more difficult it is to retrieve an item during exposure X, the easier it will be to retrieve the item during exposure X+Y [28]. In this way, memory can be likened to weightlifting. Practicing with heavy/difficult weights maximizes gains, as long as those weights are within the capability limits of the performer. Research has indicated that desirable difficulty might lead to better encoding of the memory trace at each recall opportunity (see [28] for a review).

This is generally consistent with the “retrieval effort hypothesis” [18]. This hypothesis holds that the difficulty of a successful retrieval is directly associated with long-term retention of the material. That is, the benefit of practice is maximized when retrievals are difficult, but ultimately successful [12, 14, 24].

Another way of expressing this notion of desirable difficulty is to say that the optimal time to re-expose a learner to an item is just before the learner would otherwise forget that item. This approach would maximize the interstimulus interval while allowing the learner to recall the item successfully. Continuing with the weightlifting metaphor, we want the learner to lift as much as possible without failing. Within this conception, the confusing results regarding fixed or expanding intervals and the optimal interstimulus interval can be seen as consequences of uncontrolled variability – that is, the level of mastery present in a given learner. This level of mastery can be predicted to vary as a function of the individual learner (e.g., differences in aptitude), an individual item (e.g., differences in difficulty), and the interaction between the two. Superior performance might be expected if the exposure schedule (i.e., the interstimulus interval) were adapted to these factors. This is the challenge that we take up in the following section. In doing so, we will move from descriptive to computational models of the spacing effect.

4.1 Describing the Learning Task

The “recipe” for learning facts is well-established [20]:

  1. Present the fact to be learned

  2. Provide opportunities to recall the fact

Flashcards provide a useful vehicle for this type of instruction. Each flashcard instance represents a combination of a knowledge component/label, a face (picture or text), and a form. A given label might be associated with multiple faces (e.g., multiple pictures of the same thing). Further, the flashcards will assume various forms.
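The combination just described (a knowledge component/label, one or more faces, and a form) can be captured in a small data model. The sketch below is purely illustrative; the class and field names are our own and are not taken from the system described here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class CardForm(Enum):
    """The three practice forms described in Sect. 4.1."""
    LABEL_CHOICE = "label"    # show a face, choose among four labels (Fig. 2)
    FACE_CHOICE = "face"      # show a label, choose among four faces (Fig. 3)
    MATCHING = "matching"     # drag four labels onto four faces (Fig. 4)


@dataclass
class Face:
    """A picture or text passage that can appear on a flashcard."""
    content: str              # image path or text passage
    is_image: bool = True


@dataclass
class KnowledgeComponent:
    """A label and the faces associated with it (1:1 or 1:many)."""
    label: str
    faces: List[Face] = field(default_factory=list)


@dataclass
class FlashcardItem:
    """A concrete practice item: one label, one face, one form.

    Alternative forms of the same label/face pair are treated as separate
    items mapped to the same knowledge component (see Sect. 4.2).
    """
    component: KnowledgeComponent
    face: Face
    form: CardForm
```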

To accomplish the first part of the recipe, students will begin by paging through virtual flashcards delivered via a web browser. The interface, shown in Fig. 1, will show the “face” of the flashcard and present a button with the “label” associated with the card. In this way, a given knowledge component/label might be presented several times, if it is associated with several faces.

Fig. 1. Initial passive presentation of flashcard facts

Thereafter, each student will receive an individualized practice experience comprising various flashcard forms. The various forms will have differing levels of difficulty, allowing the adaptive scheduler (see Sect. 4.2) to make finer-grained decisions. The basic form will present the face of the card and four labeled buttons (see Fig. 2).

Fig. 2. Label-based flashcard

The second flashcard form will reverse the process. In this form, the system will use a label as the prompt and various faces will serve as clickable options (see Fig. 3).

Fig. 3. Face-based flashcard form

The third option is a “matching” form in which the student must drag four labels to the corresponding faces. This is shown in Fig. 4. For each flashcard form, the student will be given feedback indicating whether his/her answer was correct and, if not, which of the options should have been selected. The student will then click a “next” button and the adaptive scheduler will determine which knowledge component, flashcard, and form to present.

Fig. 4. Matching flashcard form

4.2 Calculating the Optimal Spacing Interval

The adaptive scheduler we envision will embody the following design principles.

  1. Establish a clear mapping between items and knowledge components.

  2. Include a passive learning session prior to active recall attempts.

  3. Promote recall from long-term memory.

  4. Use recall likelihood to guide item selection.

  5. If multiple items are eligible for selection, use cross-item knowledge component estimates to bias selection.

  6. Establish a configurable recall likelihood threshold.

  7. Consider both individual and item factors when determining the likelihood of recall.

  8. Use item difficulty as the primary item factor when determining likelihood of recall.

  9. Use knowledge component mastery as the primary individual factor when determining likelihood of recall.

  10. Avoid dominating a short session with “failed” attempts.

  11. If no items are ready for recall, add a new item.

  12. If no new items are available, select the item with the lowest likelihood of recall.

  13. Do not retire items.

In keeping with the work of Mozer & Lindsey [21], Choffin, et al. [4] and others, practice items should have a clear association with an identified knowledge component. This mapping could be 1:1, 1:Many, Many:1, or Many:Many.

Mettler, et al. [17] and others have demonstrated that while recall accuracy does not improve with the delivery of passive learning opportunities, recall efficiency does. That is, when the size of the learning gain is divided by the number of trials used, including passive trials actually improves the “learning rate.” To take advantage of that effect, our system (RATE) will deliver a block of passive trials prior to engaging in active quizzing. Within these passive trials, the object to be learned is presented, together with only the correct answer. After a few seconds (Mettler, et al. [17] used four seconds), navigation controls allow the student to advance. After the student has reviewed all knowledge components, active practice begins.

While there is a certain value to massed practice for newly introduced and/or difficult items to build a stable memory trace that the system can then strengthen, this value disappears when students can retrieve the information from working memory. To ensure that students cannot simply answer from working memory, items pertaining to the same knowledge component should not appear on consecutive trials.

Another stable empirical finding is that systems maximize short- and long-term retention when recall occurs just prior to the point of forgetting. We will incorporate this finding in our approach. However, various research teams have used and/or recommended a wide variety of targeted recall levels, ranging from 33% to 95%. The optimal value will likely be the subject of subsequent research and may vary by instructional goals, domains, etc. Therefore, we will make this threshold configurable. If the likelihood of recall estimate for several items is below the threshold, the system should select the easiest item associated with the knowledge component with the lowest average likelihood. In this way, the system will attempt to “rescue” the knowledge component most susceptible to forgetting.
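A minimal sketch of this selection rule follows, assuming a recall-likelihood estimator (such as the DASH model discussed below) is available as a callable. The item representation, function names, and the default threshold value are our own illustrative choices; the constraint from the preceding paragraph (no consecutive trials on the same knowledge component) is also folded in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class PracticeItem:
    """Minimal stand-in for a flashcard item tracked by the scheduler."""
    item_id: str
    component: str      # knowledge component / label
    difficulty: float   # higher = harder (estimation is discussed below)


def select_next_item(items: List[PracticeItem],
                     predict_recall: Callable[[PracticeItem], float],
                     threshold: float = 0.90,
                     last_component: Optional[str] = None) -> Optional[PracticeItem]:
    """Return the next item to practice, or None if no item is due.

    Among items whose estimated recall likelihood has fallen below the
    configurable threshold, the knowledge component with the lowest average
    likelihood is identified and its easiest eligible item is returned
    ("rescuing" the component most susceptible to forgetting).  Items from
    the component practiced on the previous trial are skipped so answers
    cannot be pulled from working memory.  A None result signals the caller
    to introduce a new item or, failing that, to fall back to the item with
    the lowest likelihood overall.
    """
    scored: List[Tuple[PracticeItem, float]] = [
        (it, predict_recall(it)) for it in items if it.component != last_component
    ]
    due = [(it, p) for it, p in scored if p < threshold]
    if not due:
        return None

    likelihoods: Dict[str, List[float]] = {}
    for it, p in due:
        likelihoods.setdefault(it.component, []).append(p)
    weakest = min(likelihoods, key=lambda c: sum(likelihoods[c]) / len(likelihoods[c]))

    candidates = [it for it, _ in due if it.component == weakest]
    return min(candidates, key=lambda it: it.difficulty)
```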

Mozer & Lindsey [21] and others demonstrated that (1) there is a high level of variability in both individual factors and item factors that can lead to differences in recall rate, and (2) these factors matter when attempting to derive an optimally individualized practice schedule. Therefore, our system will measure both classes of factors and our scheduler will use these data.

The primary item factor that the system will monitor and employ will be item difficulty. The system will derive its item difficulty estimate from data on accuracy, reaction time, and inter-item confusion. The system will create different difficulty estimates for different forms of a given item (i.e., alternative forms will be considered separate items that are mapped to the same knowledge component).

The primary individual factor that the system will monitor and employ is knowledge component mastery. Calculation of mastery should include aptitude across knowledge items (i.e., taken as a whole, across knowledge components and items, does the individual appear to have an aptitude for the given subject-matter domain?). The determination should include all available evidence (e.g., the correctness/incorrectness of item types that differ in reliability). The system will assume that different types of evidence have differing reliabilities. Similarly, and in keeping with the various model-based approaches reviewed, the value of this evidence should diminish over time.

Our estimation of the probability of recall will be based on so-called hybrid approaches that merge individual data, group data, and models of learning and forgetting. For example, cognitive models such as ACT-R and the multiscale context model (MCM) provide very accurate and elegant descriptions of memory strength. These models assume that the brain constructs memory traces each time a learner is exposed to an item and that the traces decay at rates dictated by the temporal distribution of past exposures [15]. Unfortunately, these models are not well suited to real-time adaptation of practice sequences.

To improve the efficiency of the models, Lindsey and his colleagues [15, 21] developed a modeling approach that combined data-driven machine learning approaches that use population data to make inferences about individuals and items and theory-driven approaches that characterize the temporal dynamics of learning and forgetting. They labeled their model DASH to represent the features to which it is sensitive: Difficulty, Ability, and Study History.

The DASH model has two key components:

  1. A representation of study history that can characterize learning and forgetting, and

  2. A collaborative filtering approach that can infer latent difficulty and ability factors from incomplete data.

The DASH model begins by modeling forgetting after a student is exposed to material one time. The model does this with a forgetting curve in the form shown by Eq. 2.

$$ \Pr \left( {{\text{R}}_{si} = 1} \right) = m\left( {1 + ht_{si} } \right)^{ - f} $$
(2)

Where:

  • \( {\text{R}}_{si} \) = the response of student s to item i after retention interval \( t_{si} \).

  • m = a free parameter interpreted as the degree of initial learning (0 ≤ m ≤ 1)

  • h = a free parameter that acts as a scaling factor on time (h > 0)

  • f = a free parameter that acts as a memory decay exponent (f > 0).

DASH individualizes the decay exponent (f) for each student-item using latent traits for student ability and item difficulty. The decay exponent is defined using the formula shown in Eq. 3.

$$ {\text{f}} = e^{{(\theta_{s} - b_{i} )}} $$
(3)

Where:

  • \( \theta_{s} \) = Student Ability

  • \( b_{i} \) = Item Difficulty

DASH further individualizes the degree of learning parameter (m) using the formula shown in Eq. 4.

$$ {\text{m}} = \frac{1}{{1 + e^{ - (\theta_{s} - b_{i} )} }} $$
(4)
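Read together, Eqs. 2–4 define a forgetting curve for a single exposure whose initial learning (m) and decay (f) are both driven by the gap between student ability and item difficulty. The transcription below is our own reading of the equations as printed, offered as a sketch rather than as the authors’ implementation.

```python
import math


def single_exposure_recall(t: float, theta_s: float, b_i: float, h: float) -> float:
    """Probability of recall after a single exposure, per Eqs. 2-4.

    t        retention interval since the exposure (h shares its time unit)
    theta_s  latent student ability
    b_i      latent item difficulty
    h        time-scaling free parameter (h > 0)
    """
    gap = theta_s - b_i
    m = 1.0 / (1.0 + math.exp(-gap))   # degree of initial learning, Eq. 4
    f = math.exp(gap)                  # decay exponent, Eq. 3 as printed
    return m * (1.0 + h * t) ** (-f)   # forgetting curve, Eq. 2
```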

The next step is to include support for an arbitrary study history. That equation takes the form shown in Eq. 5.

$$ \Pr ({\text{R}}_{\text{sik}} = 1\,|\,\theta_{s} ,b_{i} ,\varvec{t}_{1:k} ,\varvec{r}_{1:k - 1} ,\phi ) = {\upsigma}(\theta_{s} - b_{i} + h_{\phi }(\varvec{t}_{s,1:k}, \varvec{r}_{s,i,1:k - 1} )) $$
(5)

Where:

  • \( \Pr ({\text{R}}_{sik} = 1\,|\,\theta_{s} ,b_{i} ,\varvec{t}_{1:k} ,\varvec{r}_{1:k - 1} ,\phi ) \) = the probability of student s responding correctly to the kth trial of item i, conditioned on that student-item tuple’s specific study history.

  • \( \theta_{s} \) = Student Ability

  • \( b_{i} \) = Item Difficulty

  • \( \phi \) = a parameter vector, learned by DASH, that governs the \( h_{\phi } \) function.

  • \( \varvec{t}_{s,1:k} \) = the times at which trials 1 through k took place

  • \( \varvec{r}_{s,i,1:k - 1} \) = the accuracy of each previous trial

As noted above, \( h_{\phi } \) is a function parameterized by vector \( \phi \). \( \phi \) is learned by DASH. \( h_{\phi } \) is the portion of the equation that applies psychological principles of learning and forgetting. Mozer & Lindsey [21] began with a “default” version of this equation (see Eq. 6).

$$ h_{\phi } (\varvec{t}_{s,1:k} ,\varvec{r}_{s,i,1:k - 1} ) = \sum\nolimits_{w = 0}^{W - 1} {\phi_{2w - 1} \log (1 + c_{siw} ) + \phi_{2w} \log (1 + n_{siw} )} $$
(6)

Where:

  • w = an index of expanding time windows

  • \( c_{siw} \) = the number of correct outcomes of student s on item i in time window w

  • \( n_{siw} \) = the total number of trials (correct or not) of student s on item i in time window w
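A compact sketch of Eqs. 5 and 6 is shown below. It assumes the per-window counts have already been accumulated; the zero-based indexing of φ and the container types are our own conventions, not the authors’.

```python
import math
from typing import List, Sequence, Tuple


def h_default(phi: Sequence[float], counts: List[Tuple[float, float]]) -> float:
    """Default study-history term of Eq. 6.

    `counts` holds one (c_w, n_w) pair per expanding time window: the number
    of correct responses and the total number of trials for this student-item
    pair inside window w.  `phi` supplies two learned weights per window
    (indexed from zero here rather than from one as in the text).
    """
    return sum(phi[2 * w] * math.log1p(c_w) + phi[2 * w + 1] * math.log1p(n_w)
               for w, (c_w, n_w) in enumerate(counts))


def dash_recall_probability(theta_s: float, b_i: float,
                            phi: Sequence[float],
                            counts: List[Tuple[float, float]]) -> float:
    """Probability of a correct response on the next trial, per Eq. 5."""
    return 1.0 / (1.0 + math.exp(-(theta_s - b_i + h_default(phi, counts))))
```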

They also created versions of the function that were more closely aligned with the memory dynamics of the more detailed MCM and ACT-R frameworks. The MCM version replaces the time windows with time constants that determine the rate of exponential decay of memory traces. The model assumes that the counts \( n_{siw} \) and \( c_{siw} \) are incremented at each trial and then decay over time at a timescale-specific exponential rate τw. Mechanically, the \( c_{siw} \) and \( n_{siw} \) terms in the default version are redefined as:

$$ c_{siw} = \sum\nolimits_{{\upkappa = 1}}^{k - 1} {r_{si\kappa } e^{{ - (t_{sik} - t_{si\kappa } )/\tau_{w} }} } \quad \quad \quad \quad n_{siw} = \sum\nolimits_{{\upkappa = 1}}^{k - 1} {e^{{ - (t_{sik} - t_{si\kappa } )/\tau_{w} }} } $$

The ACT-R version of the equation replaces \( h_{\phi } \) with a function that allows the influence of past trials to continuously decay according to a power law. The redefined \( h_{\phi } \) is shown in Eq. 7.

$$ h_{\phi } = \phi_{1} \log \left(1 + \sum\nolimits_{\kappa = 1}^{k - 1} {\phi_{{3 + r_{si\kappa } }} (t_{sik} - t_{si\kappa } )^{{ - \phi_{2} }} } \right) $$
(7)
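The two variants can be sketched in the same style. The first helper computes the exponentially decayed counts of the MCM-flavoured variant (which feed the default h function above); the second implements the ACT-R-flavoured h of Eq. 7. Parameter layouts and names again follow our own zero-based conventions.

```python
import math
from typing import List, Sequence, Tuple


def decayed_counts(trial_times: Sequence[float], outcomes: Sequence[int],
                   t_now: float, taus: Sequence[float]) -> List[Tuple[float, float]]:
    """MCM-flavoured replacement for the windowed counts used in Eq. 6.

    Each past trial contributes to (c_w, n_w) with a weight that decays
    exponentially at the timescale-specific rate tau_w, as in the unnumbered
    equations above.  `outcomes` holds 1 for correct and 0 for incorrect.
    """
    pairs = []
    for tau in taus:
        weights = [math.exp(-(t_now - t_k) / tau) for t_k in trial_times]
        c_w = sum(w * r for w, r in zip(weights, outcomes))   # decayed correct count
        n_w = sum(weights)                                    # decayed trial count
        pairs.append((c_w, n_w))
    return pairs


def h_actr(phi: Sequence[float], trial_times: Sequence[float],
           outcomes: Sequence[int], t_now: float) -> float:
    """ACT-R-flavoured study-history term of Eq. 7.

    phi[0] scales the term, phi[1] is the power-law decay exponent, and
    phi[2] / phi[3] weight incorrect / correct past trials (phi_{3+r} in
    the text, shifted to zero-based indexing here).
    """
    activation = sum(phi[2 + r] * (t_now - t_k) ** (-phi[1])
                     for t_k, r in zip(trial_times, outcomes))
    return phi[0] * math.log1p(activation)
```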

Mozer & Lindsey then tested the ability of the personalized model to schedule content review sessions with middle school students. They compared the ability of six models to predict student performance. Three of the models were variants of the DASH framework that adopted different models of memory dynamics. The other three were a model created in ACT-R, a model based on item-response theory, and a model based on each student’s history of accuracy.

All DASH variants outperformed the other models and did not significantly differ among themselves. Mozer & Lindsey [21] also looked at the ability of the adaptive scheduler to improve student performance on exams delivered at the end of a semester and again 28 days later. The researchers compared the performance of the personalized scheduler to a “massed” schedule that concentrated on one unit at a time and a “generic” scheduler that included review of topics from the preceding unit. The personalized scheduler outperformed the alternatives.

To maintain student interest, the system will avoid including several items that have a low likelihood of recall within a given time window. Instead, the system will add new items only when no items in the practice set are below the established threshold. In keeping with previous principles, if there are no new items to add, the system will select the item with the lowest likelihood of recall value. Further, we see no value in retiring items given the temporal fading approach embodied in earlier principles. The system will not remove/retire well-learned items (although students will practice them relatively rarely).

4.3 Authoring the Required Material

As noted earlier, our goal is to minimize the authoring demands associated with the specialized tools that we envision. For this first use case, the authoring task is to create flashcards. All the author needs to do is import a number of “faces” (i.e., images or text passages) and associate them with appropriate labels. The easiest way to accomplish this is to allow the author to pull together all the relevant pictures into a specific folder on the local computer and then import them into the authoring environment all at once. The author would then simply attach a label to each picture. Figure 5 provides an indication of the type of interface that an author might use to complete this task.

Fig. 5. Labeling flashcards

Figure 5 also hints at other authoring approaches. For example, on the right, it illustrates empty picture slots that can be used to associate additional pictures with a given label. Similarly, along the left, it shows how text-based descriptions, definitions, etc. can be associated with each label. The blank row at the bottom indicates that the authoring interface will automatically expand to accept new knowledge components/labels.
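A bulk-import routine of the kind described above can be very small. The sketch below simply scans a local folder, turns each image into a face, and pre-fills an editable label from the file name; the folder layout, field names, and function name are hypothetical and only meant to illustrate the intended level of authoring effort.

```python
from pathlib import Path
from typing import Dict, List

IMAGE_TYPES = {".png", ".jpg", ".jpeg", ".gif"}


def import_faces(folder: str) -> List[Dict[str, str]]:
    """Turn every image in `folder` into a flashcard face with a draft label.

    The suggested label is just the file name with underscores replaced by
    spaces; the author would then confirm or edit it in an interface like
    the one shown in Fig. 5.
    """
    records = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_TYPES:
            records.append({
                "face": str(path),                      # picture shown on the card
                "label": path.stem.replace("_", " "),   # editable suggestion
            })
    return records
```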

5 Conclusion

Over the decades, tremendous progress has been made in techniques to support adaptive training and tools to make the development of training systems more affordable. Nevertheless, although consumers appreciate the gains that adaptive systems bring, they consistently reject the level of effort associated with their development and maintenance.

The current generation of tools seems unlikely to address the challenge of creating powerful learning environments while constraining development timelines. The design language that they employ subtly constrains development, while their general-purpose focus creates significant development overhead.

To address this issue, our research team is exploring an alternative path. Rather than creating general purpose authoring and delivery tools, we are exploring the viability of creating special purpose tools designed to support specific learning outcomes. Our hypothesis is that by reducing flexibility we can increase usability to the level required by consumers.

The development effort is ongoing, and we hope to report results in the near future.