Introduction

During the past two decades or so, work across a number of disciplines has begun to converge on what may be seen as a new, interdisciplinary account of the nature and evolution of teaching in humans. Contributions come from researchers studying differences in methods of cultural transmission in humans and other animals (e.g. Csibra and Gergely 2009, 2011; Franks and Richardson 2006; Hoppitt et al. 2008; Thornton and Raihani 2008; Tomasello 1999); coevolution of language and the human brain (e.g., Deacon 1997); neurobiology (Schaafsma et al. 2015; Stout et al. 2008); archeology (Morgan et al. 2015; Tehrani and Collard 2002; Tehrani and Riede 2008); cultural anthropology (MacDonald 2007; Maynard 2002); evolutionary cultural anthropology (Hewlett 2016; Hewlett and Roulette 2016); peer teaching in children (Strauss et al. 2002; Strauss and Ziv 2012); evolutionary psychology (Barkow et al. 1995; Pinker 1997, 2003); niche construction (DeVore and Tooby 1987; Pinker 2003; Sinha 2015); and evolutionary relationships among traits such as human life history, language, and culture (Bogin 1990; Fogarty et al. 2011; Kaplan et al. 2000; Laland 2016; Locke and Bogin 2006; Schwartz 2012).

A central claim of this emerging account is that human pedagogy is at once a cultural and biological behavior, fundamentally enabled by language, and resulting from millions of years of the coevolution of genes and culture. As such, the capacity and inclination to teach and learn through language are best understood, and can only be fully understood, as deeply embedded components of the full human package. Somehow, in a highly complex, little understood, and by no means inevitable sequence of genetic and environmental events, our hominin ancestors came to rely on a new system of communication, involving conventionalized combinations of vocalization and gesture (including facial expression, hand gestures, and postures), to accomplish a broad range of purposes, including the transmission of hard-won knowledge and skill from experts to novices, and from one generation to the next. Considered in this way, teaching is not just a special kind of social behavior, which some individuals in modern times are paid to engage in and may be more or less good at. Rather, our capacity and instinct for teaching, through our species-unique medium of symbolic, syntactic language, is no less a part of what makes us human than our oversized brains, nimble fingers, and risk of falling in love with strangers.

The question we pose in this paper is what, if anything, this new account of human pedagogy has to offer the AIED community, and, more generally, the educational technology community. Does it matter, we want to ask, that the seeds of human language may have begun to germinate well over three million years ago, in the bodies and brains of some long-extinct ancestor species in the Pliocene? Does it matter that the capacity for language may have evolved, whenever it did, largely under pressure for a way of efficiently transmitting cultural knowledge and skill from experts to novices? Does it matter that for millions of years, throughout all but the tiniest sliver of recent history, teaching and learning occurred, not in classrooms, but in the context of authentic and essential tasks and group activities, such as the manufacture of foraging tools, and the use of these tools to gather and hunt? Does it matter that all children are clearly born to learn and teach: that human infants naturally assume that an adult’s communicative acts (such as pointing and naming) may be interpreted as generic pedagogical acts (Csibra and Gergely 2009, 2011), and that the capacity and inclination to use language to teach others follow a predictable developmental trajectory, becoming fully evident by the age of about four (Strauss and Ziv 2012)? Does it matter that, beginning at some point in the last 100,000 years, cultural evolution began to accelerate rapidly, quickly outstripping the glacial pace of biological evolution, as if Stone Age Rip Van Winkles, living with Stone Age brains and bodies, had nodded off to sleep in front of a campfire, then jerked awake a few seconds later in front of a computer screen?

We suspect that these things do indeed matter. In particular, we suggest that the emerging biocultural account of human teaching and learning through language may have at least the following implications for AIED practitioners and others who seek to explore the use of technology to support human pedagogy:

  1. The ancient coevolutionary relationships among human language, brains, and teaching (and other components of the human package), quite possibly dating back more than three million years, argue for the centrality of language in education, and, in particular, for systems that support and encourage dialogue between novices and experts.

  2. Assuming that human brains evolved, over millions of years, to teach and learn in social contexts, in the performance of authentic tasks and activities, it makes sense that modern learning environments ought to be organized, at least in part, around opportunities for relative novices to work alongside relative experts on tasks that have practical significance, i.e., are not just “school work.”

  3. Given that teaching and learning are, and have always been, essentially social activities, conducted by individuals with social and emotional needs (e.g., needs for trust and respect) as well as purely intellectual ones, it seems clear that educational technologies need to take into account the socioemotional concerns of participants.

  4. Given that the biocultural account of human pedagogy predicts that instances of teaching and learning in particular cultural settings will include practices representing both brain-based cultural universals and culture-specific ones, and given also the affordances of the modern internet for creating environments that bring together teachers and learners from different cultures, it makes sense that AIED practitioners should seek to exploit commonalities, and be sensitive to possible cultural differences among learners.

  5. Given that humans have evolved to engage, by default, in behaviors that are essentially prosocial (e.g., default cooperation and trust; see Chudek and Henrich 2011), and given that computer programs are inherently amoral, it follows that AIED practitioners need to be careful to ensure that new systems for teaching and learning, especially those involving autonomous pedagogical agents, engage in and promote ethical behaviors.

Finally, as an overarching implication, we suggest that the new understanding of human teaching and learning through language as an evolutionary adaptation, rooted in millions of years of gene-culture coevolution, should be at once inspiring and humbling to the AIED community. It is inspiring to know, or be reminded, that in finding ways to apply modern technologies to the problem of helping young people acquire crucial cultural knowledge and skill, we’re dealing with a force of nature. A human brain is arguably the most powerful system for teaching and learning in the known universe. Given the right contexts, young people acquire cultural norms, knowledge, and expertise almost automatically. But that’s the humbling part. Machine intelligence, as AIED researchers know well enough, is not human intelligence, and automated learning systems are not, in themselves, human contexts. Learning environments that combine human intelligence with machine intelligence (the product of human intelligence) are likely to be more effective, efficient, and ethical than systems that rely on machine intelligence alone. At least that’s a reasonable hypothesis (see Baker 2016, discussed below).

Reading the foregoing, some readers may be thinking, “Of course, we know all this.” And to a large extent we agree. The conclusions and implications that follow from the new biocultural account of human teaching and learning for the most part support and are largely consistent with the twentieth-century sociocultural-cognitive theories of learning that have helped shape AIED research from the beginning—including Vygotsky’s social development theories (Vygotsky 1978), social learning theory (Bandura 1971), cognitive apprenticeship (Collins et al. 1991), situated learning (Lave and Wenger 1991), and social constructivism (Palincsar 1998). Moreover, the practical implications, including the importance of dialogue, the need to embed learning opportunities in authentic, interesting contexts, the importance of the learner’s socioemotional reactions and concerns, the need to ensure that autonomous pedagogical agents follow ethical guidelines, and the advantage of keeping human intelligence “in the loop,” are all fairly obvious, and not inconsistent with common and emerging practices in educational technology.

That said, like many research communities, AIED has been a bit insular, enriched but arguably limited by contributions from a fairly small set of disciplines, including cognitive science (and psychology generally), computer science, and computational linguistics. The emerging interdisciplinary understanding of human pedagogy, grounded as it is in an accumulating body of evidence from a broad range of other scientific disciplines—including anthropology, primatology, archeology, and evolutionary biology—may have much to offer AIED. Perhaps more importantly, as we suggest in the concluding section of the paper, the nature of our work puts AIED researchers in a unique position to test important hypotheses generated by the biocultural account and in other ways contribute to a new, more rigorous, interdisciplinary science of human teaching.

The rest of the paper is organized in the following way. First, we summarize the main claims of the biocultural account of teaching and learning. We then walk the reader through a single aspect of this account (the evolutionary relationship between teaching, language, and technology), focusing specifically on the likely antiquity of this relationship. As we hope to show, while the issue of language origins is interesting on its own (and notoriously open to armchair speculation), it also provides a useful and important way of thinking about how the natural human capacity for teaching and learning through language is best understood as part and parcel of the complete human package. We end the paper with brief discussions of each of the implications listed above, and how they may relate to the AIED research agenda.

Because this new way of thinking about human teaching and learning is still at an early stage, and because this is, to our knowledge, a previously unexplored area for AIED, we seek nothing more here than to offer a brief primer based on our own understanding of the central claims of the new account, suggest some of the implications, and in this way make an opening move in what we hope will become a useful and important discussion. Quite likely readers will disagree with at least some of the arguments we make, and will find errors to correct. Needless to say, any such contributions will be welcome.

The Biocultural Account of Human Pedagogy

As we claimed at the outset, recent contributions from a broad range of scientific disciplines amount to an emerging new account of human pedagogy as an evolved trait, fully embedded in the human “adaptive suite” (Bartholomew 1970; Lovejoy 2009), a unique way of being in the world that traces back to the origins of our lineage in deep time, at least as far back as the split from the chimpanzee lineage some 5–7 million years ago (Kumar et al. 2005). Key claims include the following:

  1. Although other animals (e.g., meerkats, cheetahs, chickens, hawks, and many nonhuman primates) are known to modify their behaviors, at some cost to themselves, in ways that help their young acquire important knowledge and skill, only humans intentionally instruct offspring through the medium of language (for a discussion see Hoppitt et al. 2008; Thornton and Raihani 2008).

  2. This uniquely human capacity for teaching through language is an integral part of a suite of phylogenetic adaptations that must have coevolved, in a sequence of tiny steps, with occasional leaps forward (owing to confluences of genetic and environmental events), over millions of years.

  3. These adaptations include, but are not limited to:

    a. a body plan built on but distinct from that shared by other great apes, including committed bipedalism, hands with “precision” and “power grips” associated with tool use (Skinner et al. 2015), hidden ovulation in females, a relatively “gracile” (slender) skeletal structure with reduced dimorphism (males and females more similar in size than in other primates), and an oversized brain, with relatively more brain tissue (and therefore more energy) devoted to cognition than to other systems, such as eyesight or sensorimotor control (see, e.g., Aiello and Wheeler 1995; Isler and Van Schaik 2014);

    b. an expanded (ultimately, global) foraging territory, associated with a marked dietary shift toward high-quality, nutrient-dense, difficult-to-acquire food resources, enabled in part by male provisioning of females and their offspring (Kaplan et al. 2000);

    c. a radically altered life history, marked by a substantially increased lifespan, extended period of juvenile dependence, and support of reproduction by post-reproductive females (Bogin 1990; Kaplan et al. 2000; Schwartz 2012);

    d. an enhanced cognitive predisposition to engage in cooperative, prosocial behaviors, including acquisition and enforcement of group norms (Fehr and Rockenbach 2004; Chudek and Henrich 2011; MacLean 2016);

    e. enhanced neural (Theory of Mind, ToM) circuitry that allows humans to “read” the minds of conspecifics (Call and Tomasello 2008; Premack and Premack 2002; Premack and Woodruff 1978; Schaafsma et al. 2015);

    f. a unique system of communication, allowing symbolic reference through systematic combinations of gestures and vocalizations (e.g., Deacon 1997; MacWhinney 2005; Pinker 1997; Pinker 2003; Pinker and Bloom 1990);

    g. a natural capacity and disposition, emerging in infancy, to treat utterances of others as having a communicative intent (Csibra and Gergely 2009), and, following a predictable developmental trajectory that begins in infancy, emerges full blown in childhood, and continues throughout adulthood, a natural capacity and inclination to help others acquire critical cultural norms, knowledge, and skill, i.e., to teach (Strauss and Ziv 2012).

  4. Although this cluster of interrelated adaptations sets our species very much apart from other animals, the biological systems that make human pedagogy possible must have been built from systems shared, to varying degrees, by other primates (e.g., see De Waal 2016; Levinson 2016; MacWhinney 2005; Seyfarth and Cheney 2013; Whiten and Erdal 2012), and, looking more deeply into the past, by common ancestors; in other words, there is no abrupt non-Darwinian “discontinuity” between humans and other animals (cf. Penn et al. 2008).

In sum, these adaptations created the possibility of cumulative cultural transmission, the “ratcheting” effect (Tomasello et al. 1993; Tomasello 1999, 2014) whereby innovations created by experts in one generation have been passed along to novices in upcoming generations, some of whom in turn become innovators and master teachers. In this way, over millions of years, and for better or worse, assault rifles have replaced spears in some societies, and humans have come to dominate the planet.

This view of human pedagogy as a species-unique biocultural phenomenon with deep roots in human evolution is surprisingly recent. While largely consistent, as suggested above, with many twentieth-century learning theories, it draws on a gathering body of evidence from a range of scientific disciplines, and, as it turns out, lends strong support to several prevailing educational philosophies. Most importantly, as discussed below, knowing that teaching and learning through language is a deeply human behavior, a product of millions of years of nature’s tinkering, may cause us to think at least just a bit differently about our own role as designers of intelligent educational systems.

On the Antiquity of Language and Its Relationship to Teaching

A central claim for the biocultural account of human pedagogy is that teaching depends on language, which may have arisen for that very purpose (e.g., see Laland 2016). However, estimates for the timing of the emergence of language, and therefore the possibility of teaching through language, range by orders of magnitude—from as recently as the last 100,000 years (some 200,000 years after the first appearance of anatomically modern humans and “modern behaviors” some 250,000 to 300,000 years ago; McBrearty and Brooks 2000; Richter et al. 2017) to as far back as 3.3 million years ago, coincident with the earliest evidence (to date) of stone tool manufacture (Harmand et al. 2015), and, as discussed below, perhaps even earlier, at some point soon after the divergence of the human and chimpanzee lineages, likely some 5–7 million years ago.

That is a pretty significant difference, with potential implications, it seems, for how we think about the design of modern (human-engineered) learning environments. For if language emerged within just the last 100,000 years, then you could think we are just apes with brains that happened to get upgraded for language as a kind of happy accident. In this way of thinking, language is just a set of rules for combining meaningful symbols, and, since computers are especially good at manipulating symbols according to sets of rules, even more rapidly and consistently than human brains, then you might suppose that the problem of programming a machine to interact in some meaningful way with human brains, while clearly challenging, is at least tractable. But if the story of language, and teaching through language, begins at some point millions of years ago, then you might have to think there’s a more fundamental, organic relationship between brains, language, teaching, and other aspects of the human package. If human pedagogy is really that ancient, then it seems likely that the capacity for language, and the inclination to use language to help offspring acquire life-sustaining knowledge and skill, is not just a “module” somehow grafted onto the brain of an ape, but a deeply embedded, highly complex biological system, intricately connected with everything else that makes us human. And if language, and teaching through language, is enabled by something more than just a mental communication device, then we may have to think somewhat differently about the design of devices aimed at helping humans learn.

Since the timing of the emergence of language in humans is a matter of some debate, with wildly differing estimates, and since the evidence and reasoning behind different scenarios has a bearing on the broader relationship between teaching, language, material culture, and social relationships—all issues of importance to the AIED community—we think it may be useful to walk through the different scenarios and the reasoning behind them. (Readers who are already familiar with these issues may safely skip these sections.)

What Do We Mean by “Language?” What about “Teaching?”

Before going further, we need to acknowledge that debates about the origins of language, and teaching through language, are complicated by a lack of agreement concerning the meaning of these terms. Quite a lot of typing has been devoted to the problem of which other animals, if any, engage in teaching (e.g., see Hoppitt et al. 2008; Thornton and Raihani 2008), and somewhat less to the question of whether teaching is found in all human cultures, or only some (Atran and Sperber 1991; Lancy 2010). To make things simple, let’s just agree that by “teaching” we refer only to teaching in humans, a special kind of joint attentional activity (Bruner 1972; Tomasello 1999; Tomasello and Todd 1983), focused on some object of joint attention (Bruner 1981), in which a relative expert intends to help one or more novices acquire some new belief or skill concerning the object of joint attention, through one or more intentional communicative acts (e.g., some combination of gesture and vocalization). By this definition, for example, an expert pointing to a novice’s method of shaping a wooden spear and saying “That’s right...” qualifies as teaching. However, if the expert is simply sharpening the spear herself, without concern for whether the novice is watching, this would not constitute teaching, even though the novice is indeed watching and learning.

Let’s agree also that by “language,” we mean human language: a brain-based system of communicating beliefs, desires, and emotions that employs conventionalized combinations of vocalizations and gestures, including facial expressions, hand gestures, and body posture. Importantly, we want to include both language in its highly complex modern form (in which a finite set of sounds, combined in accordance with a finite set of rules (syntax), can be used to represent an infinite number of meanings) and earlier, less complex versions, sometimes referred to as “protolanguages” (Bickerton 1992), which may have had less complicated syntax, or none at all. Whether the features of full modern language evolved only gradually, or in a small set of sudden leaps is, for our purposes, less important than that human language has a history, and that this history is thoroughly entangled with the history of human evolution.

At What Point Did Human Language and Teaching First Emerge?

For reasons explained above, we’d now like to walk through some recent thinking regarding the point at which language, and teaching through language, first evolved as a significant factor in the course of human evolution. A review of the relevant literature reveals four broad schools of thought.

Modern Onset (<250,000 years)

The most recent date, probably best described as a staunchly-held minority position, rests precariously on a narrow linguistic definition which considers the only truly unique aspect of human language to be the faculty of recursion (“unbounded Merge”), which is imagined by some to have arisen fewer than 100,000 years ago (see Chomsky 2010; Hauser et al. 2002; Hauser et al. 2014; and, for critiques, Behme 2014; Jackendoff and Pinker 2005). At around this time, as Chomsky puts it, “a rewiring of the brain took place in some individual, call him Prometheus, yielding the operation of unbounded Merge...” (Chomsky 2010:59). A slightly earlier date is proposed by Bickerton (2007), who associates the emergence of “full language” with the first appearance of H. sapiens, recently dated to some 300,000 years ago (Richter et al. 2017).

Support for a relatively modern onset, though not typically cited by its linguist advocates, may be found in a combination of geological, archeological, and DNA evidence. Taken together, this evidence suggests that something important did indeed occur within a particular population of anatomically modern humans somewhere in East Africa roughly 80,000 to 60,000 years ago (see Lahr and Foley 2016; Mellars 2006). For one thing, this was a period of sharp oscillations in annual rainfall, which would have created selection pressure for groups that were capable of devising solutions to this challenge. Indeed, the DNA evidence suggests a population bottleneck in this group (down to just a few thousand individuals), followed by a rapid expansion, and subsequent migration out of Africa and into Europe and Asia, with humans reaching Australia, apparently by boat or raft, some 40,000 years ago. The archeological record also points to a dramatic increase in the complexity of the “technological, economic, social, and cognitive behaviors” in these populations at around this time (Mellars 2006:9381). In this view, then, language emerged fairly suddenly, perhaps in a single individual, as a kind of genetic accident, and quickly spread through the existing population of modern humans in Africa, who, aided by the affordances of the new system of communication, traveled out into the rest of the world, eventually replacing existing hominin populations (Neanderthals, Denisovans, and Homo erectus), and causing the extinction of numerous other species, including woolly mammoths, mastodons, giant ground sloths, etc. (for a recent account of these extinctions, see Harari 2014:63–74).

Recent Onset (~500,000 years)

A more widely held view regarding the timing of the origin of language, quite possibly the consensus view at this time (2017), puts it just a bit earlier (in evolutionary time) at around 500,000 years ago, in Homo heidelbergensis, generally considered to be the last common ancestor of modern humans and Neanderthals. Support for this earlier date includes both biological and archeological evidence. For one thing, reconstructions of vocal tract and ear anatomy, together with the presence of human versions of the FOXP2 gene (known to be involved in the fine motor control of speech organs), suggest that H. heidelbergensis (and therefore their descendants, the Neanderthals) would have been at least capable of articulated speech (Dediu and Levinson 2013). Also, the archeological record shows that these ancestors, whose energy-hungry brains were already twice the size of a chimpanzee’s (1200 cm³; Rightmire 2004), were using fire to cook, and manufacturing carefully shaped wooden spears, some with hafted stone points.

The Neanderthals, who emerged in their “standard form” in the Middle East some 200,000 to 250,000 years ago, then spread north and east, apparently inherited and refined these practices (though not substantially), and thereby managed to eke out a precarious existence in the harsh, subarctic climate of Western Europe and Central Asia until some 40,000 years ago. Neanderthals almost certainly had sewn skin clothing and footwear. They hunted large animals cooperatively, hafted stone tools with pitch heated with fire, and also used fire for cooking meat and various kinds of starchy foods. They buried their dead, possibly with ritual offerings, and took care of elderly and infirm members of their groups (for a full and entertaining account of what is known about Neanderthal life, see Wynn and Coolidge 2011). Indeed, although Hauser et al. (2014) refer dismissively to the notion of “talking Neanderthals,” it is hard to imagine how these human relatives could have lived in this way, and how they could have passed down their material culture from one generation to the next, without some form of language (see Dediu and Levinson 2013, and further discussion below).

Ancient Onset (~1.8–3.3 million years)

But could language have emerged even earlier? The argument for a much earlier date leans partly on the idea that certain kinds of technical expertise, notably the manufacture of stone tools, cannot be easily acquired by simple observation alone, but are learned more efficiently when taught by experts, creating selection pressure for the emergence of some form of language (“Watch. Do it like this...”). To use a commonly cited example, the key skill in stone tool manufacture is called “knapping,” a process which exploits the fracture properties of flint and other such rocks with a crystalline structure susceptible to easy splitting. The process involves selecting a suitable rock (a “core”), then striking it, with a certain amount of force and at a certain angle (about 60 degrees off-center), to detach a sharp flake, itself useful as a small knife. The detached flake leaves a scar, also sharp at the edge, and if the core is rotated slightly and struck again at just the right place with suitable force, another flake is detached and the scar is extended. If the process is repeated on both sides, you get a bifacial (two-sided) cutting tool: a chopper or hand axe.

As of July 2017, the earliest stone tools that show evidence of intentional knapping date back 3.3 million years, and come from a site in West Turkana, Kenya (Harmand et al. 2015). Before this discovery, the earliest evidence of stone tool manufacture was dated to some 2.5 million years ago, considered to mark the start of what has been termed the Oldowan Tool Industry. Importantly, stone tools from the Oldowan onward remain relatively unchanged for some 700,000 years, until finally giving way to the Acheulean Industry (at 1.7 million years ago), when human ancestors (Homo habilis, “handy man”) began producing somewhat more sophisticated artifacts, requiring intentional reshaping of the core to achieve a predetermined form, and possibly satisfying aesthetic concerns (Stout et al. 2008). The Acheulean Industry itself remained stable for at least another million years before evolving into the substantially more sophisticated methods associated with the Mousterian Industry and the Neanderthals (for overviews, see Ambrose 2001; de la Torre 2011).

The specifics of these different “industries” raise at least three important questions relevant to the antiquity of teaching through language. First, could the knowledge and skill evidenced in the products of these different methods of manufacture have been acquired through simple social learning, i.e., novices carefully observing and imitating the work of experts, or, rather, does the evidence not imply intentional instruction, and therefore at least a rudimentary form of language? Second, given that these industries, over long periods of time, seem to show step increases in sophistication, and assuming that at some point the complexity of knowledge and skill reached a point at which learning through observation and imitation was no longer possible, creating selection pressure for a new way of helping novices acquire this skill, when did that occur? Third, what explains the apparent stasis in these industries between the apparent step increases in complexity? Why did methods of tool manufacture remain essentially unchanged for millions of years?

Experiments with modern human subjects, while in no way definitive, suggest that language does indeed make it easier to teach knapping. In one such study, conducted by the anthropologist Thomas Morgan and his colleagues (Morgan et al. 2015), graduate students were taught Oldowan knapping techniques, then asked to teach other students using one of five different methods: (1) “reverse engineering” (subjects had to figure out how to knap just by inspecting an example); (2) “imitation/emulation” (subjects observed a demonstration, then tried themselves, without any instruction); (3) “basic teaching” (the expert carefully demonstrated the skill and in some cases physically guided the learner’s hands); (4) “gestural teaching” (teachers and students were free to interact using any gesture they chose but could not talk); and (5) “verbal instruction” (students and teachers could both gesture and talk). As it turned out, and as one might suspect, teaching, and especially verbal instruction, gave the best results. Learners who received explicit instruction, especially verbal directions, produced a greater percentage of high-quality flakes, and were able to work more rapidly, than learners who were left to learn from a physical example (“reverse engineering”) or by observing the expert, but without help.

Having produced evidence suggesting that access to language for intentional instruction does indeed make it a bit easier to help novices (or at least novices with modern brains and language) acquire skill in knapping, the authors suggest that the lack of progress between the first appearance of crudely sharpened tools at around 2.6 million years ago (recently pushed back to 3.3 million years ago, as noted) and the relatively sudden emergence of the more sophisticated Acheulean Industry at around 1.7 million years ago could be explained by the emergence of a protolanguage at around the latter date (Morgan et al. 2015). Supporting evidence comes from brain scans of contemporary subjects engaged in manufacture of Acheulean hand axes, which show activation of a brain region associated with discourse-level prosodic and contextual language processing (Bookheimer 2002; Stout et al. 2008).

Ancient Roots (>3.3 million Years)

But suppose that some form of language was indeed available for teaching some three million years ago. What came before? As it turns out, there are a number of reasons to at least entertain the possibility that the roots of language, and teaching through language, extend even deeper in time, to some point before, and possibly well before, the first appearance of intentionally sharpened tools. First, note that to this point we’ve been making an assumption about the relationship between teaching, language, and the “sophistication” of material culture. The idea is that material culture, as preserved in the archeological record of the various stone tool industries, serves as a kind of marker for the presence of language. Thus, for example, the more than 1.5 million years of apparent stasis that separate the first appearance of intentionally sharpened tools at 3.3 million years ago from the emergence of the more sophisticated Acheulean tools at 1.7 million years ago is explained by the lack (or insufficiency) of language; a step increase in the communicative power of existing forms of language might explain the emergence of the Acheulean; and another step, say recursion (which substantially increases the efficiency and referential power of language by allowing multiple embedding, as in “the animal that killed the animal that we found yesterday”), may explain the dramatic acceleration in the evolution of material culture (specifically, technology) which seems to have begun some 60,000–80,000 years ago (Mellars 2006), and continues to this day.

While there’s certainly some justification for this way of thinking—clearly there’s a relationship between technical expertise and the sophistication of language, itself a kind of tool (see Sinha 2015)—the relationship between language and material culture, as preserved in the archeological record, must be only part of the story. For one thing, while stone tools indeed provide markers of technical expertise, they constitute a biased sample. The use of sharpened sticks, nets, deadfalls, and other foraging tools made of perishable organic materials might have provided an even earlier record of technical expertise, if only these things had been preserved. We know that human ancestors were systematically exploiting the fracture properties of certain kinds of rock 3.3 million years ago because we have the evidence. But what other tools might they have had at their disposal and when did they start using them? The absence of stone tools prior to this point need not imply the absence of other technical expertise, nor the absence of language. Language, after all, has many useful purposes for a group of social foragers. It can be used to give directions to food sources, to recruit assistance, to give names to individual people, places, and things, and to make useful distinctions, such as between edible and non-edible plants, or between rocks that fracture nicely, and those that don’t.

Another shaky assumption is that once a group has access to language, technology will inevitably become more sophisticated. This is clearly incorrect. If it were true, and if language were indeed the primary catalyst for the step increase in sophistication of stone tool manufacture at around 1.7 million years ago (the Acheulean), then what explains the subsequent plateau for the next million years or so? One explanation might be that the version of language that made the Acheulean industry possible was only just good enough, and good enough was plenty. The same is true of tools generally, which tend to persist in their established form and structure until something better comes along. The pencils we use today (but not the pens) are very nearly identical to the ones we used in elementary school in the 1950s. Modern skyscrapers in Hong Kong and other Asian cities are, to this day, constructed using a system of bamboo scaffolding developed in China some 5000 years ago (Chung and Yu 2002).

But an even more important problem has to do with the relationship between the use of tools and the social context of their use. Here it is useful to compare tool use in humans and other primates, especially chimpanzees, whose use of tools has been studied extensively (for a review, see Wynn et al. 2011). Chimpanzees, it turns out, employ a fairly extensive set of foraging tools, including stout, sharpened sticks for spearing small mammals in their burrows, and also for poking holes in termite mounds; slender sticks, sometimes with intentionally roughened tips, for fishing termites out of the holes; leaf sponges for soaking up water from holes in trees; and stones for cracking open nuts using a hammer and anvil technique. Importantly, these technologies are transmitted culturally (as evidenced by differences in use within and across groups), through social learning, but not through intentional instruction. Young chimpanzees, for example, accompany their mothers on termite-fishing expeditions, and those that do so most frequently achieve a more advanced level of skill, apparently just by observation and their own trial-and-error learning (Matsuzawa et al. 2008). Mother chimps have been observed providing an infant with a suitable twig for termite fishing, but it seems this is nearly always in response to the infant’s own begging. And as infants grow into juveniles, mothers become less tolerant, and less helpful, to the point of chasing their offspring away. Chimpanzee mothers rarely, if ever, do what a human mother would almost certainly do: provide direct assistance, correct a youngster’s technique, or call attention to a particular aspect of their own technique (for an apparently rare exception in chimpanzees, see Boesch 1991). Why not?

It seems that the best answer we have so far is that non-human primates never evolved the full cognitive circuitry, sometimes called the Theory of Mind (ToM) network (Schaafsma et al. 2015), that underlies our human ability to share attention and engage in joint attentional activities (Bruner 1972; Tomasello 1999; Tomasello and Carpenter 2007; Tomasello and Todd 1983). This is not to say that other primates, or indeed other animals, do not have the ability to read minds, in the sense that they can predict, with reasonable accuracy, another animal’s future behavior based on its past and current behavior. In fact, as social animals, chimpanzees and other primates have an advanced capacity for social cognition, which includes an especially clear understanding of others as autonomous, intentional beings. Chimpanzees also have the ability and inclination to influence the behavior of others communicatively, through combinations of vocalizations and physical gestures, even including something very much like turn-taking in humans (Fröhlich et al. 2016; Levinson 2016). But chimpanzees don’t have language, are not capable of sustained joint attentional activity, and don’t teach in the way that humans do (see Tomasello and Carpenter 2007). So, the question is why humans alone evolved the more complex ToM network, and the more complex system of communication that allows us to give voice to our individual beliefs and desires, and to better understand, and influence, the beliefs and desires of others— the system that has become modern language, and which, together with enhanced ToM, makes teaching possible.

The Socio-Cognitive Niche: The Selection Context for Human Teaching through Language

Traits as complex as a capacity for linguistic communication, and the biological systems that make it possible, including advanced ToM circuitry, do not emerge out of nowhere, but must instead be built from existing systems (see Jacob 1977). Modern language, as Chomsky suggests, may indeed have come about as a “rewiring” of the brain, but the brain must already have been wired in a certain way if the rewiring was to work, and this new wiring must have remained functionally connected with other systems only indirectly connected with language, e.g., chest muscles for breath control. Also, new traits do not emerge inevitably, even in animals that could potentially benefit from having such traits. Rather, traits evolve in response to unlikely combinations of genetic accidents, selection pressures, and environmental opportunities. So, what was the selection context that favored individuals with enhanced mind reading, language, and teaching through language?

The full story is lost in time. Sound waves quickly dissipated. Most of the physical evidence is still buried or rotted away. But we do know that something important must have happened between Time A, some 5–7 million years ago, when hominins and chimpanzees embarked on our separate evolutionary paths, and Time B, some 3.3 million years ago, the earliest known date (so far) of stone knapping. We know that at around this time in Africa, and throughout the world, forests were shrinking and grasslands expanding, tended by herds of large grazing mammals who, by trampling trees and fertilizing the soil with their dung, practiced a kind of unintentional agriculture, and who were in turn tended (herded) by their predators (Bobe and Behrensmeyer 2004; Strömberg 2011). It was, it now seems, within this environment that groups of distant human ancestors gradually abandoned the safety of trees, and the security of small foraging areas with their regular supplies of fruit, green leaves, and insects, and began to pursue a more dangerous, itinerant life, first on the forest edge (where our cousins the baboons still linger), then out into the expanding grasslands, full of new opportunities and dangers. The new niche selected for a new set of traits, built from systems inherited from our last common ancestor with chimpanzees, but ultimately resulting in a markedly different biological package. In addition to committed bipedalism, these traits eventually came to include hands shaped for tool use, concealed ovulation, pair bonding, cooperative breeding, decreased dimorphism, a diet based on nutrient-rich but difficult-to-acquire food sources (such as the meat of other animals), and several related life history changes, including a lengthened period of juvenile dependency, shorter birth intervals, an increased lifespan, and, in females, a post-reproductive period (menopause); and, not least, an increasingly oversized, energy-hungry brain.

Although it is not known which of these new traits was the initial catalyst (bipedalism is a reasonable bet), it seems not to matter. Once the rudiments of this new package began falling into place, a multi-causal, coevolutionary process would have kicked in. As it does with any animal that creates a new niche for itself, the new niche would have begun to select for individuals with genetic profiles that increased their fitness (and, in the case of sexual selection, perceived fitness) within the new niche. This would not have happened all at once, nor, once the process began, would it have led to unbridled “progress.” Rather, changes in any given trait (say, increased brain size) would have been limited by prevailing constraints, such as an ability to extract sufficient nutrients from the environment (e.g., see Broadhurst et al. 1998).

Among other things, the new niche would have begun to select for individuals with traits that enhanced their own fitness, and ultimately the group’s fitness, for engaging in the new social arrangements, notably pair bonding, cooperative foraging, and expertise associated with cooperative foraging in the food-rich but dangerous new habitat. This context must have favored, in a way that niches inhabited by other apes did not, mind reading and social communication. The trend toward pair bonding, and the benefits it brought, would have favored individuals with even slightly enhanced abilities to understand the beliefs and desires of mates, and to influence these desires and beliefs through communicative acts. The trend toward cooperative foraging and exploitation of nutrient-rich, heavily defended food sources would have favored groups with individuals who were even slightly better at engaging in joint attentional activity, and in coordinating these activities, through communicative acts. And the emergence of a new system of communication involving conventionalized combinations of vocalizations and gestures would have selected for individuals with brains that were even slightly better at acquiring and manipulating the evolving system.

In summary, while language in its full modern form, with recursive syntax, may well have emerged only within the last 250,000 years, a highly functional form of language (sufficient to support the rich social life and sophisticated foraging techniques of the Neanderthals) was quite likely available at least half a million years ago, in H. heidelbergensis, the common ancestor of Neanderthals and modern humans. Further, the considerable knowledge and skill evidenced in the intentional manufacture of stone tools from more than three million years ago, and the nature of the foraging lifestyle it implies, suggests that language, and teaching through language, may have an even earlier origin. In any case, we should probably not be surprised to learn that it may well have taken nature millions of years to build, from precursor primate parts, the complex system of muscle, cartilage, bone, teeth, and neural circuitry that underlies our astonishing capacity to affect each other’s hidden desires, beliefs, and knowledge simply by producing certain combinations of sounds and gestures in the context of joint attention. It is increasingly hard not to believe that language— the basis for teaching— and teaching itself, are both truly ancient.

Implications for AIED

We now turn to the implications that this new account may have for the AIED research community. Let’s review the basic claims:

  1. Human teaching, a joint attentional activity enacted largely through the medium of language, is a species-unique adaptation, quite likely rooted in millions of years of hominin evolution.

  2. Language, and teaching through language, did not evolve independently of other traits; rather, the capacity for natural human pedagogy must have coevolved with other interrelated and unique components of the human package, including our bipedality, Swiss-Army-knife hands, pair bonding, prosociality, unique life history (including extended juvenile period), dependence on high-energy diet, and oversized, energy-hungry brains.

  3. Teaching and learning through language are inextricably and naturally bound up in each other; human infants begin acquiring whatever language their group speaks from birth, attend carefully to their caregiver’s communicative acts (such as pointing and naming), and assume that these convey accurate generic information. Children themselves begin to teach other children at around the age of four, and continue to teach and learn throughout the juvenile period, into adulthood.

  4. While the natural capacity and inclination to teach and learn through language evolved in evolutionary time, and is therefore a cultural universal, the particular forms that teaching takes have evolved in historical time, and therefore may be expected to differ across cultures.

  5. The capacity for human pedagogy as a biological trait likely evolved over millions of years, and has changed relatively little, if at all, in the past 200,000 years. However, beginning about 60,000 years ago, the pace of cultural change, including technological change, began to accelerate rapidly, and has continued to accelerate ever since. Women and men with Stone Age brains built the systems that put men on the moon.

Individual AIED researchers and research groups will clearly need to decide for themselves what implications, if any, this new account of human pedagogy may have for their work. What we offer here are no more than some initial thoughts on a few selected topics we ourselves find especially interesting.

The Centrality of Dialogue in Human Pedagogy

From the beginning, AIED researchers have recognized the centrality of language in teaching (in particular, the importance of expert-novice dialogue) and have long sought to create and refine systems that engage learners in some form of natural language conversation (e.g., Carbonell 1970; Graesser 2016; Heffernan and Koedinger 2000; Nye et al. 2014; Rosé et al. 2001; Rus et al. 2013). In calling attention to the long coevolutionary relationship between teaching and language, the emerging biocultural account of human teaching summarized here justifies this emphasis on dialogue-based systems, and at the same time reminds us (not that we necessarily need to be reminded) what a daunting challenge it is to build a computer-based system capable of engaging in any meaningful way with a human brain.

Although some researchers have wondered in print why teaching through one-on-one dialogue is so effective (see Graesser et al. 2011:412), the emerging biocultural account of human pedagogy provides a solid answer to a good question. Human brains are designed not just for dialogue (Garrod and Pickering 2004), but for teaching and learning through dialogue. It is wrong to think that teachers have commandeered language for the purpose of teaching; rather, over millions of years, partly under pressure for an efficient means of cultural transmission, language and human brains have apparently shaped each other into the unique, multipurpose system that has made teaching possible. In this way, by tinkering with off-the-shelf primate systems for mind reading and communication, nature has found a way to help humans overcome the cognitive limitations of social learning through observation and imitation alone. With our enhanced theory of mind, we can think, as chimpanzees apparently cannot, “She doesn’t know how to hold the rock.” With language, we can say, “Not like that. Like this.”

Given this, the question we want to raise is not so much whether it is possible to write computer programs capable of interacting with human learners in something like the way human teachers do, but whether it is worth the effort, and given that it is, how much effort it is worth, and what sorts of efforts make the most sense. The challenges are obvious and well-known. For one thing, however simple they may have been at the outset, over evolutionary time, and more recently, over historical time, human languages have become not only astoundingly complex, but astoundingly complex and ill-formed— full of irregularities, redundancies, and arbitrary distinctions, as if designed by a crazy person. In spite of this, all normal human children readily acquire the fundamentals of whatever crazy language or languages are spoken by the other humans in their lives, without much difficulty, for the simple reason that it could not be otherwise. Languages that were unlearnable by children would not persist (Deacon 1997). Needless to say, the same does not apply to computer programs and the machines they inhabit and run, which have to deal with the craziness without the benefit of millions of years of language-brain coevolution.

Children, of course, have additional advantages. Not only were their brains designed for language acquisition, and language for their brains, but children have the advantage of acquiring language in the context of real-world interactions with other human beings, who naturally view the exchange as a joint activity, and do their best to make the communication work, for example, by correcting their interlocutor’s ill-formed utterances (if only as a check on understanding), asking for clarification, agreeing or disagreeing (and thereby giving indirect feedback), and so forth (see Snow 1977, on mother-child language interaction). A child’s interlocutors do not need to teach the child how to speak, and could not if they tried, but they can help her learn. As discussed below, computer programs designed to engage learners in natural language dialogue may benefit from similar help.

Another issue is whether a pedagogical agent needs to have a theory of mind, and if so what that might look like. In humans, an understanding that interlocutors have their own hidden desires and beliefs, and that these can be guessed at, confirmed, and manipulated through language, is fundamental to both language and teaching. In an ITS, it seems that the closest thing to a theory of mind is the learner model, the data structure that stores the program’s current estimate of what the learner knows and can do, along with information about the learner’s current affective state, preferences, and so forth (for recent discussions, see Desmarais and Baker 2012; Pynadath and Marsella 2005). While learner models are of obvious educational benefit (e.g., in estimating and tracking a learner’s progress toward mastery of a domain), it is not clear to what extent a learner model can help steer the move-by-move course of a dialogue in the way that a human interlocutor’s ToM circuitry does, or how necessary it is that it do so (but see Chounta et al. 2017).
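To make the notion of a learner model concrete, the following is a minimal sketch in Python, not a description of any particular ITS. The class name `LearnerModel`, the fields, the fixed update weight, and the three dialogue moves are all hypothetical choices made for illustration; real systems typically use more principled estimation methods. The sketch shows only the general idea: a data structure that holds a running estimate of what the learner knows, together with a coarse affective state, and that can be consulted when choosing the next move.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LearnerModel:
    """Hypothetical learner model: per-skill mastery estimates plus affect."""

    # skill name -> estimated mastery in [0, 1] (illustrative representation)
    mastery: Dict[str, float] = field(default_factory=dict)
    # coarse affective state, e.g. "neutral" or "frustrated" (assumed labels)
    affect: str = "neutral"
    # assumed weight given to each new observation
    update_weight: float = 0.3

    def observe(self, skill: str, correct: bool) -> float:
        """Move the estimate for `skill` toward 1.0 or 0.0 after one attempt."""
        prior = self.mastery.get(skill, 0.5)  # assume 0.5 when nothing is known
        target = 1.0 if correct else 0.0
        updated = prior + self.update_weight * (target - prior)
        self.mastery[skill] = updated
        return updated

    def next_move(self, skill: str, threshold: float = 0.8) -> str:
        """Choose a coarse dialogue move from the current estimate and affect."""
        if self.affect == "frustrated":
            return "encourage"
        if self.mastery.get(skill, 0.5) >= threshold:
            return "advance"
        return "hint"
```

Even in this toy form, the contrast with a human interlocutor is visible: the model tracks a single number per skill and a single affect label, and it has no representation of what the learner believes or intends at a given turn of the dialogue.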

It seems that these fundamental problems—which sum to the simple problem that nature builds brains, not computer programs—point to at least one obvious solution, which is to let language itself, and language users, do the hard work. While a full review of recent advances in natural language processing is neither possible nor appropriate in this already longish paper, we trust it will be sufficient to note that after decades of modest progress, researchers are beginning to get traction from at least two new leverage points: supervised and unsupervised machine learning based on data mining of large natural-language text corpora (Boyer et al. 2011; Morrison et al. 2014, 2015; Vail and Boyer 2014); and systems that learn in the context of dialogue with human interlocutors (e.g., Weston 2016). In both cases, computer algorithms acquire elements of linguistic competence from humans, as humans do, but of course in different ways.

Teaching and Learning in the Context of Authentic Work

For almost all of human prehistory, i.e., throughout the Pleistocene, our ancestors were cooperative foragers, living in small, mobile bands of no more than 50 closely related individuals, loosely connected with other such bands (Pearce and Moutsiou 2014; Zhou et al. 2005). As we’ve seen, technologies were initially relatively simple, but did not remain so simple that children could easily acquire necessary survival skills simply by imitating the actions of adults. Rather, at some point, continuing group survival, and continuing technological development, would have been limited by a young person’s ability to acquire, and an adult’s ability to teach, relatively complex skills such as stone tool manufacture (Högberg and Gärdenfors 2015; Morgan et al. 2015; Stout et al. 2008), animal tracking (Hewlett 2016; Hewlett and Roulette 2016; MacDonald 2007), cooking, and identification of edible plant species (Atran and Medin 2008; Hewlett 2016; Hewlett and Roulette 2016). In short, children learned, from each other, and from adults, in the context of participation in authentic work, successful performance of which had obvious importance to the larger group.

But while our human brains, along with our biologically-inherited inclination and capacity for both teaching and learning, haven’t changed much at all in historical time, cultural contexts, social structures, learning ecologies, and language itself have all changed dramatically. Beginning at about the age of seven, children around the world now start spending large amounts of time each day in classrooms, away from their parents and the practical world of work, taught basic academic skills, cultural knowledge, and, eventually, technical skills— all in the relative safety of schools. Their teachers are typically not part of their immediate social group, may not be members of the local community, and are usually not themselves working practitioners of the skills they’ve been hired to teach.

Going at least as far back as John Dewey (e.g., see The Child and the Curriculum, Dewey 1902), educational reformers have directed attention to the problem of reconciling the natural capabilities, concerns and interests of children with the established purposes and methods of schooling. At their best, which is very good, modern schools open windows onto the world beyond the classroom, revealing (largely through books, and more recently through various other media) mind-altering vistas of space and time that take young people well beyond the narrow limits of their own neighborhoods and personal experiences— opportunities that were, needless to say, unavailable to children in the Pleistocene. At their most helpful, teachers in modern schools engage young people in intelligent and interesting conversations that help them extend and expand their growing knowledge and understanding of the larger world, while at the same time helping them connect new ideas and ways of thinking to their own experiences and understandings. Good schools and good teachers also help their students acquire important technical skills, including the skills of reading, writing, mathematics, and scientific inquiry. And in some schools, “full-service community schools” (Dryfoos 2005), school buildings are not places apart from the community, but round-the-clock community centers, providing, among other services, “...primary-care health clinics, dentistry, mental health counseling and treatment, family social work, parent education, enhanced learning opportunities, community development, and whatever else is needed in that school community” (Dryfoos 2005:8). In short, there is nothing inherently wrong with schools as safe places to learn about the world beyond, and to begin developing the skills required for full and productive participation in that world.

But, at their worst, schools can be, to put it in a child’s way, BORING! places, largely disconnected from a young person’s own interests and experiences— problems which may well have been exacerbated in the United States by the recent emphasis on high-stakes assessments (e.g., see Kohn 2000; Mora 2011). Although this may be changing, it is probably still the case that in the majority of classrooms the focus of discussion is on producing correct answers to the teacher’s questions following the persistent “initiate-respond-evaluate” discourse pattern (Cazden and Beck 2003; Mehan 1979; Sinclair and Coulthard 1975). Deep conceptual understanding and application of complex skills to the solution of ill-formed problems often takes second place to correct answering. And with something on the order of 15–20 students in a classroom, or more, and a tendency for a subset of more active participants to dominate discussions, many students get left on the sidelines.

For these reasons— the great potential that modern schools offer as safe, organized places for learning and their well-known shortcomings— schools are clearly appropriate contexts for AIED research and development. Broadly speaking, and viewed from this perspective, AIED systems fall into two categories. In the first, we may include systems that seek to work within the prevalent school discourse paradigm, focused on helping students produce correct (expected) solutions to well-structured problems. Examples arguably include the majority of intelligent tutoring systems, including those whose purpose is to help students through a series of problem-solving steps leading to a correct answer (e.g. ANDES; Vanlehn et al. 2005; the Cognitive Tutor; Anderson et al. 1995; Ritter et al. 2007); simpler systems, also focused on correct answering, which combine traditional assessments with hints and explanations (e.g., ASSISTments; Koedinger et al. 2010); and, as in the case of AutoTutor, systems designed to elicit and evaluate an expected answer to an open-ended question (Nye et al. 2014; Graesser 2016). Systems that depart from the correct answering paradigm in some way (and there are many of these) include those that help students work through solutions to ill-structured problems (Lynch et al. 2012); those that support collaborative knowledge building and inquiry (Bielaczyc and Ow 2014; Rummel et al. 2016; Soller et al. 2005); systems that simulate real-world work environments (e.g., Shaffer 2006; Shaffer and Gee 2005); those that feature “serious games” (Boyle et al. 2016; Girard et al. 2013; Shute et al. 2013), and those that involve various kinds of peer interaction (see below).

Certainly there is room for both types. It is hard to deny the potential usefulness of systems that help students learn how to produce correct answers, especially in domains such as physics and mathematics, where the ability to apply procedural knowledge accurately to the solution of well-defined problems is important. However, there is also reason to be concerned about systems that place exclusive or undue emphasis on correct answering, and reason to wonder, for example, whether “WTF” responses (Wixon et al. 2012), and other “disengaged” behaviors of students in certain computer-based learning environments (Baker and Rossi 2013) may reflect deeper problems stemming from fundamental incompatibilities between the child’s natural learning strategies (based on dialogic interaction with relative experts in authentic contexts), and systems that are in some ways at odds with how a young learner’s brain works.

Social Dimensions of Teaching and Learning

Of course, there are other good reasons to be concerned about the socio-emotional concerns of students and their changing “affective states.” Multiple studies point to a positive association between teacher-student rapport, and student affect, motivation, and learning outcomes (e.g., Immordino-Yang and Faeth 2010; Pintrich 2003; Van Geert and Steenbeek 2008; Wubbels et al. 2016). The relationship makes good sense from an evolutionary perspective. Teaching, as we have noted, is fundamentally and necessarily a joint attentional activity (Bruner 1972; Tomasello 1999; Tomasello and Todd 1983), a matter of doing something together, a meeting of minds, something that humans alone among primates seem capable of.

But humans are still primates, and retain a primate’s deep-seated concerns with social status, including a desire to build and maintain strong, trusting, and useful relationships with others (Seyfarth and Cheney 2013), and attendant fears of ostracism and rejection. For this reason, before a learner can attend cognitively to whatever it is the teacher wants her to attend to, she must feel sufficiently safe in the social relationship to drop her natural guard. Presumably, this would have been less problematic in the ancestral environment. The hunter-gatherer lifestyle that shaped our primate brains into human ones occurred not just in a “cognitive niche” (DeVore and Tooby 1987; Pinker 2003), but more specifically in a “socio-cognitive niche” (Sinha 2015; Sterelny 2007; Whiten and Erdal 2012). As they do in remaining hunter-gatherer cultures, children would have acquired technical and cultural expertise from parents, grandparents, aunts and uncles, siblings, cousins, and other children (see Hewlett and Roulette 2016). The small size of hunter-gatherer bands, and the relatively simple learning ecologies, would have meant that the average social distance between individuals, and between experts and novices, was relatively small.

In a modern classroom setting, the baseline social distance between the teacher and her students is considerably larger, and takes effort to narrow. Further, in a group of some twenty other students, individual children must also be concerned about their own social status in respect to their peers. Giving wrong answers, or in some cases giving correct answers, may have negative social consequences. In a one-on-one ITS, where the student is interacting with a computer, the interaction is presumably less risky. In recent years, AIED researchers have attempted to deal with the socio-emotional concerns of students in at least two ways: by building in “affect detectors” (Busso et al. 2004; D’Mello et al. 2014; Picard and Picard 1997), and by building lifelike pedagogical agents that respond appropriately to human emotion (Elliott et al. 1999; Johnson and Lester 2016; Kim and Baylor 2016). These efforts make good sense given the emerging biocultural account, which views the natural capacity for teaching and learning as deeply embedded in, and inseparable from, the brain’s other business, including its socio-emotional concerns. However, for the same reason that we have to wonder about the capacity of computer-based systems to take on, in any sophisticated way, the work of a human teacher, so too we need to be cautious about the ability of an animated software agent to sense and respond appropriately to the feelings of a human learner. We offer no solution to this problem beyond the suggestion that this may be yet another reason why optimal systems will be those that find a way to keep humans in the loop.

Peer Tutoring Explained

A central claim of the emerging biocultural account of human pedagogy is that the capacity and inclination to teach others in one’s group follows a universal (cross-cultural) “normative developmental trajectory” that begins in infancy, emerges full-blown by the age of 7, when children are detecting and correcting false beliefs in others (Strauss and Ziv 2012), and continues to develop into adulthood. For example, beginning at some point between the ages of 9 and 11, children start to give “strategic advice” such as how to win at a game (Strauss et al. 2002).

That children would begin to develop the capacity and inclination for pedagogy from an early age is predictable from an evolutionary perspective. For one thing, given the close relationship between language and teaching, to the point that language may even have developed at least partly for the purpose of teaching (Laland 2016), it is unsurprising that as language develops, so too would teaching. And for good reason. In ancestral hunter-gatherer groups, children who were still too young to accompany adults on long-distance and potentially dangerous hunting and foraging expeditions would have remained behind, helping with chores around the base camp, and playing with other children, both older and younger. Quite likely, if the practices of remaining hunter-gatherer groups are any guide, children would have engaged in serious play, practicing important skills, such as throwing rocks or shooting arrows at targets representing different animals, sometimes pulled along the ground by other children (MacDonald 2007). It is therefore not surprising that peer tutoring in modern settings has been shown to be effective across age groups and subject areas (Bowman-Perrott et al. 2013), even in the absence of formal training (Cohen et al. 1982). It is in keeping with the claims of natural pedagogy that children do not need to be taught how to teach.

AIED researchers have long shown an interest in the peer tutoring literature, if only as evidence that tutoring, including peer tutoring, is more effective than “traditional classroom instruction,” the standard argument for the development of intelligent tutoring systems. However, the idea that students can help each other learn is also central to several systems, including those designed to help students learn by tutoring other students (Biswas et al. 2016; Chou and Chan 2016), and collaborative inquiry and knowledge-building systems (Rosé and Ferschke 2016; Rummel et al. 2016; Stahl et al. 2006). Clearly these directions make good sense given an understanding of children as natural teachers (Strauss et al. 2002; Strauss and Ziv 2012).

Hybrid Systems in a Global Learning Ecology

In a recent essay in this journal provocatively titled “Stupid tutoring systems, intelligent humans,” Ryan Baker offers a vision of future AIED systems whose function is not so much to simulate human intelligence, as to amplify it (Baker 2016). In such systems, as Baker puts it,

“AI technology is used to derive important information about an online learning system, but the action taken is not by the system itself; instead action is taken by a human. The learning system is not itself intelligent; the human intelligence that surrounds the system is supported and leveraged.”

The notion that AIED research and development efforts might be more effectively focused on supporting and leveraging human intelligence than on attempting to replicate it is clearly consistent with the biocultural account we have summarized here. Given that it has taken nature millions of years to shape the human brain into what is arguably, to repeat, the most sophisticated system for teaching and learning in the known universe, the notion that a machine might be programmed to replicate this system in any substantial way in the foreseeable future seems, on the face of it, far-fetched (for a similar critique regarding the limits of “brain augmentation” see Saniotis et al. 2014). It is true that much has been learned from efforts to simulate the work of human tutors (e.g., Boyer et al. 2008; Litman and Forbes-Riley 2013). However, it seems the more interesting and useful question is not whether computers or humans make better teachers, but how computer programs can help human teachers become better (and vice versa)— and more generally, what roles members of the AIED research community might most usefully play in improving the increasingly complex global ecology of teaching and learning we find emerging around us.

Before we address that question, we would like to return briefly to our story about the evolution of human teaching. We left off at the point some 60,000 years ago, when a population of our recent ancestors (H. sapiens), with fully modern brains, bodies, and language, began for some reason to grow rapidly in numbers, migrated out of Africa, and soon established habitats throughout the world (Mellars 2006). It was from this point onward that Tomasello’s “ratchet” of cumulative cultural transmission (Tomasello et al. 1993) began to crank at a rapidly accelerating pace. It seems that some combination of factors—possibly including environmental pressures, some genetically-based increment in the expressive power of language (e.g., a capacity for recursive syntax), the collective brainpower of new concentrations of human innovators, and the ability to transmit new innovations through intentional instruction—ignited a gathering explosion of technical innovations that led, in just a few seconds of evolutionary time, to our present opportunities and predicaments.

Importantly, the opportunities and predicaments that confront us now were arguably there from the beginning of this new phase, and have simply become magnified, though greatly so. As humans spread out around the world, new habitats and ways of life created selection pressures for new technologies (e.g., the technologies of ocean navigation, agriculture, and warfare), but many of these technologies led to environmental degradation, mass extinctions, and genocide. As populations became geographically separated, cultures and languages diversified, leading to the possibility of synergistic sharing across cultures, but leading also to intensified tribalism and internecine conflict. These problems, as we know, continue to the present day. Readers will have their own lists, but certainly the catastrophic consequences of runaway global warming will be near the top of most. Others include famine, ancient tribal hatreds, civil war, distrust of science, “fake news,” income inequality, religious intolerance and extremism, racism, political polarization, and an attendant inability or unwillingness to engage in intelligent civil discourse. Against this background, arguably the most urgent question of all for the AIED community is not just how artificial intelligence can support and leverage the work of human teachers, but how this work can help to alleviate, if only in some small way, one or more of these present threats.

The most promising answers, it seems to us, may lie somewhere in the area of global education. Here in the early part of the twenty-first century, our systems for transmitting critical cultural knowledge and skill from one generation to the next have become dramatically more various and complex than they were just decades ago. This is largely due to the emergence of the global internet as an arena for teaching and learning. Although parenting, textbooks, and conventional classroom instruction remain dominant, twenty-first century learners around the world now have access to a rich and complex learning ecology (Barron 2006) including massive open online courses (MOOCs; McAuley et al. 2010), online human tutoring services (Morrison et al. 2014, 2015), intelligent tutoring systems (Ma et al. 2014; VanLehn 2011), and countless resources for informal learning, notably computer games, question answering systems (e.g., Fader and Etzioni 2014), YouTube, and Wikipedia. These new settings for teaching and learning offer tremendous potential advantages—flexibility, timeliness, open access, adaptation to individual needs, and economies of scale, to name just a few. But these opportunities do not automatically translate into better teaching and learning. Issues include the difficulty of extracting useful information from the massive datasets generated by online instruction, issues of privacy and data security, the trustworthiness of online information resources, the varying quality of online instruction, and the problem of how to help learners make the best use of the potentially confusing array of possibilities that confront them. Opportunities for self-directed learning abound, but it is not always clear how to find them, or, once found, how best to use them.

In keeping with Baker’s notion of “stupid tutoring systems, intelligent humans” (Baker 2016), it seems that the most powerful solutions to these problems will likely involve combinations of human and machine-based intelligence, making full use of our ancient, highly-evolved natural capacity for teaching and learning through language, while at the same time leveraging and supporting this capacity with sophisticated modern technologies.

As an example, consider the case of a multipurpose online (chat-based) human tutoring platform that employs an expandable set of tools including data-mining algorithms, report generators, real-time dashboards, and hooks into one or more intelligent tutoring systems. A matching system, similar to those used for online dating services, brings together human tutors and students anywhere in the world. A rating system lets tutors and students rate each other. Human tutors can send students off to engage with selected ITSs for particular purposes, and receive reports on their progress, with suggestions for areas of focus. Also, as our own research group has recently demonstrated, automatic classifiers, trained on human-annotated training sets, can be used to quickly transform transcripts documenting hundreds of thousands of human tutoring sessions into sets of machine-readable codes. The resulting datasets can then be mined for useful information about the relationship between patterns of language use and learning outcomes (Morrison et al. 2014, 2015). Employed within the context of a hybrid tutoring platform, data-mining technologies have several potential uses, including basic research, quality control, and the development and refinement of ITSs (e.g., in the form of enhanced dialogue act classifiers).
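To make the idea of automatic coding concrete, the following is a minimal sketch of how a classifier trained on a human-annotated set might turn transcript turns into machine-readable codes. It is not the classifier used in Morrison et al. (2014, 2015): the dialogue-act labels, the training utterances, and the names `NaiveBayesTagger` and `code_transcript` are all hypothetical, and a simple multinomial naive Bayes model stands in for whatever method a real platform would use.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical annotated training set: utterances and labels are invented
# for illustration and are not drawn from any real tutoring corpus.
TRAINING_SET = [
    ("what is the slope of this line", "Question"),
    ("how do you solve for x", "Question"),
    ("can you explain that step", "Question"),
    ("the slope is rise over run", "Explanation"),
    ("you divide both sides by two", "Explanation"),
    ("first you subtract three from both sides", "Explanation"),
    ("great job that is correct", "PositiveFeedback"),
    ("yes exactly well done", "PositiveFeedback"),
    ("nice work that is right", "PositiveFeedback"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTagger:
    """Multinomial naive Bayes with add-one smoothing (an assumed stand-in)."""

    def fit(self, examples):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)  # log prior for this label
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def code_transcript(tagger, transcript):
    """Attach a predicted dialogue-act code to each (speaker, text) turn."""
    return [
        {"speaker": speaker, "text": text, "act": tagger.predict(text)}
        for speaker, text in transcript
    ]


tagger = NaiveBayesTagger().fit(TRAINING_SET)
coded = code_transcript(tagger, [
    ("student", "how do you solve this"),
    ("tutor", "you divide both sides by three"),
])
```

Each element of `coded` pairs a turn with a predicted code, which is the kind of record that could then be aggregated across many sessions and related to learning outcomes. A real system would need a far larger annotated set and a validated taxonomy.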

Indeed, it is not hard to imagine that at some point, possibly in the near future, substantial areas of the new online learning ecology will come to be inhabited, and to some degree regulated and improved, by a population of more or less autonomous software agents, which, like workers in a social insect colony, will automatically perform a number of useful tasks, including knowledge extraction, provision of personalized recommendations to individual learners, direct tutoring of human learners, assistance to human tutors, and generation of reports for managers of these and other systems (see Nye 2016). However, while machine assistance may substantially improve the quality of online learning resources, autonomous pedagogical agents will be trustworthy and effective only to the extent that their developers make them so. Unlike humans, intelligent agents are not natural teachers nor are they necessarily benevolent. Artificially intelligent tutors and other such agents may indeed be helpful, but, without appropriate controls provided by their developers, they could cause harm to what Lisa Delpit has reminded us are “other people’s children” (Delpit 2006). Pedagogical agents will be at risk of hijack and reprogramming for malicious purposes, of intentionally or inadvertently misleading their users, providing false information, pretending to be something they are not (e.g., human), and just being generally inept. In short, computer programs are only as benevolent, ethical, and effective as their developers design them to be.

Toward a New Science of Human Pedagogy

Arguably the most important and exciting aspect of the emerging biocultural account is that it offers a solid foundation for a new, interdisciplinary science of human pedagogy (see concluding remarks in Strauss and Ziv 2012). The twentieth-century theories of teaching and learning that have served as inspiration for the AIED community from the outset have obviously been useful and productive. These include Vygotsky’s notion of a zone of proximal development (Vygotsky 1978), social learning theory (Bandura 1986), cognitive apprenticeship (Collins et al. 1991), situated learning (Lave and Wenger 1991), social constructivism (Palincsar 1998), and more rigorous cognitive models (notably ACT-R; Anderson 1983). Further, as argued above, these socio-cognitive theories are not inconsistent with the basic claims of the biocultural account. For example, in line with the cognitive apprenticeship and situated learning models (Collins et al. 1991; Lave and Wenger 1991), the biocultural account claims that humans learn naturally in the context of interactions with experts, and that human language likely evolved at least partly for this purpose.

However, learning theories that fail to take into account the evolutionary origins of human teaching and its nature as a fundamentally biocultural phenomenon are necessarily incomplete, with consequently limited explanatory power. To cite just one example, decades of research tell us that even “untrained” peer tutors can be effective (for a recent meta-analysis see Bowman-Perrott et al. 2013), but not why. The biocultural account explains that a human child’s capacity and inclination to use her developing linguistic competence to teach others critical cultural norms, knowledge, and skills is not just a cultural behavior, but a natural (biologically-based) cognitive capacity which follows a predictable trajectory from childhood into adulthood (Strauss and Ziv 2012). As such, teaching in children is best understood as an adaptive trait which co-evolved with, among other traits, language, our prolonged period of juvenile dependence (e.g., see Dean 2007, on the phenomenon of “growing up slowly”), our dependence on technology, and the consequent need for efficient cultural transmission, which includes efficiencies gained from a child’s natural ability and inclination to teach other children. Peer tutoring “works” for the simple reason that, over millions of years, nature designed children’s brains for this purpose.

But while the universality of peer teaching and other hypotheses generated by the biocultural account are obviously testable in theory, important practical barriers make this difficult. The problem is that the study of teaching and learning in humans remains, in Kuhnian terms (Kuhn 1962), an immature science, lacking a standard set of definitions about what teaching is, what it consists of, and how to describe similarities and differences in teaching across different cultures and contexts. For example, in spite of a likely consensus that the heart of teaching is dialogue, consisting of a set of back and forth moves, a review of the literature on human tutoring, much of it conducted by AIED researchers (e.g. Boyer et al. 2011; Collins et al. 1975; Graesser et al. 1995; Merrill et al. 1995; Chi et al. 2001), reveals very little overlap in taxonomies. Because different researchers use different categories and units of analysis, meta-analysis becomes impossible, or at least its results become difficult to interpret. Comparisons across black-box categories such as “human tutoring,” “intelligent tutoring,” and “traditional instruction” (e.g., Ma et al. 2014; VanLehn 2011) are of limited practical value if we are not told what’s going on inside those black boxes.

In short, the learning sciences (better understood as the interdisciplinary science of teaching and learning) are badly in need of, and will benefit greatly from, a multilevel analytic framework that will allow researchers to record, in a standard way, what goes on both inside and outside the black boxes, i.e., from the level of individual moves up through the embedded contexts of these moves. Given the fundamental and (as argued here) ancient relationship between teaching and language, i.e., that human teaching and learning is fundamentally a way of “doing things with words” (Austin 1965), a descriptive framework needs a way of mapping linguistic categories to pedagogical ones. For example, a teaching episode (Waimon and Hermanowicz 1965) may be understood as a sequence of one or more distinct moves (undertaken by the participants, both experts and novices), most of which are accomplished linguistically, in the form of speech acts (Searle 1969), or, in the case of dialogue, dialogue acts (Bunt 1978). In either case, there is a shared attentional frame, and within this, an object of joint attention (Bruner 1972). And because pedagogical moves (i.e., speech or dialogue acts) are by definition intentional, we can assume that each represents a hidden tactic, selected in accordance with a particular strategy, which is in turn consistent with a metastrategy (on the relationship between dialogue acts, tactics, strategies, and metastrategies, see Morrison and Rus 2014). Finally, teaching episodes occur within particular activity settings (Farver 1999), which differ across cultures and subcultures; the activity settings within which teaching and learning occur within a given culture or subculture constitute a learning ecology (Barron 2006).
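The levels just described can be sketched as a nested data structure, with moves inside episodes, episodes inside activity settings, and settings inside a learning ecology. This is only one possible encoding, offered under stated assumptions: the class and field names are hypothetical, the example labels ("demonstrate," "scaffolding," and so on) are invented placeholders rather than an established taxonomy, and the tactic, strategy, and metastrategy fields would in practice be filled in by annotators or classifiers, since they are not directly observable.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Move:
    """One pedagogical move, realized as a speech or dialogue act."""
    speaker_role: str   # "expert" or "novice"
    utterance: str
    dialogue_act: str   # observable linguistic category
    tactic: str         # inferred, since tactics are hidden
    strategy: str
    metastrategy: str


@dataclass
class TeachingEpisode:
    """A sequence of moves sharing one object of joint attention."""
    joint_attention_object: str
    moves: List[Move] = field(default_factory=list)


@dataclass
class ActivitySetting:
    """A recurring context in which teaching episodes occur."""
    name: str
    episodes: List[TeachingEpisode] = field(default_factory=list)


@dataclass
class LearningEcology:
    """The activity settings of one culture or subculture."""
    culture: str
    settings: List[ActivitySetting] = field(default_factory=list)

    def tactic_counts(self) -> Counter:
        """Count tactics across every move in every setting."""
        return Counter(
            move.tactic
            for setting in self.settings
            for episode in setting.episodes
            for move in episode.moves
        )


def example_ecology() -> LearningEcology:
    """Build a tiny, invented example for illustration only."""
    moves = [
        Move("expert", "Watch how I hold the bow.", "Directive",
             "demonstrate", "modeling", "apprenticeship"),
        Move("novice", "Like this?", "Question",
             "seek_feedback", "modeling", "apprenticeship"),
        Move("expert", "Lower your elbow a little.", "Directive",
             "correct", "scaffolding", "apprenticeship"),
    ]
    episode = TeachingEpisode("bow grip", moves)
    setting = ActivitySetting("target practice", [episode])
    return LearningEcology("hypothetical culture A", [setting])


ecology = example_ecology()
counts = ecology.tactic_counts()
```

Recording episodes in a shared structure of this kind is what would let researchers compare, for instance, tactic frequencies across cultures or settings, which is not possible when each study uses its own categories and units of analysis.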

With such a framework in hand, researchers will be in a position to test hypotheses generated by the biocultural account of human pedagogy. For example, this account predicts that teaching practices (characterized by dialogue acts associated with particular tactics) will consist of a mix of both cultural universals and practices specific to certain cultures and subcultures. And because teaching has a cultural component, we can expect to find variation across individual teachers and learners, all of whom will be more or less influenced by the sum of their cultural experiences. And because these practices vary, we can expect that some will be more efficacious than others.

The AIED community, it seems to us, is in a unique position to develop, refine, and test a standardized analytic framework such as the one suggested here. After all, many AIED researchers are in the business of modeling, at a granular level, pedagogical moves, and linking these with learning outcomes. Further, by applying data-mining techniques to large transcript corpora (for both human-human and human-machine interactions), especially within global, web-based environments, AIED researchers are in a position to begin testing hypotheses generated by the biocultural account, and in this way make an important contribution to a new science of human teaching and learning.