
1 Introduction

In recent years audiobooks have become increasingly accessible, both commercially and through free, often publicly supported, streaming sites and apps (footnote 1). Denmark follows this trend, and many local primary and secondary schools include free audiobook repositories among their resources. However, our data suggest that integrating audiobooks into the classroom poses many challenges. In our observations of primary and secondary school classes, we noticed that audio resources are often used in a ‘passive’ way. Listening to an audiobook takes more time than reading the same text, audiobooks often offer an abridged version of the original text, and audiobooks are perceived as more passive than regular books because listeners cannot take notes or mark specific passages as they can with printed books and e-readers. We also noticed a lack of active uses of audio content, for example generating and editing audio, and virtually no audio-based interactivity.

In the past two years we have conducted three case studies to explore the untapped potential of audio for learning English as a foreign language in primary and secondary schools. Our studies suggest that textual and video modalities predominate in classroom practice, even when most of the orchestration (Prieto et al. 2015) of learning activities is oral. We consider the lack of interactive audio a missed opportunity to promote richer use of multimodal resources in class, especially since widely available technologies (e.g. smartphones) easily support the recording, playback, and efficient storage of audio content. Moreover, audio content might support learners with special needs, such as visually impaired people or pupils with dyslexia, as well as foreign-language learners who listen during commutes for occasional, non-formal learning.

In the following sections we present related work (Sect. 1.1), our three case studies (Sects. 2 and 3), a discussion (Sect. 4), and conclusions (Sect. 5).

1.1 Related Work

Several studies have already explored the use of audiobooks as tools for creative engagement with literary stories and for learning foreign languages. For instance, Furini (2007) and Huber et al. (2007) challenged the typical use of audiobooks, which they found to be passive compared with books. Both studies argue that listeners to audiobooks have a more passive experience, being constrained to follow the story as narrated; they therefore explored ways to let users interact with non-linear narratives and create their own stories. Furini’s study aims at turning the passive reader of audiobooks into the “director of the story” (Furini 2007, p. 1). In developing his system, he approaches audiobooks through a cinematic metaphor, imagining the editing of new stories as if the reader were editing video sequences, and refers to movies such as Sliding Doors and Pulp Fiction as examples of non-linear stories. The system targets three main use cases: entertainment, education, and game applications. The design follows two main principles: transparency, as the book file should be standard and easily handled by the system, and security, as only the owner of the audiobook file should be able to play it, and she should not be able to alter the original media files from which the audiobook was created. The article describing Furini’s study, however, focuses on the technical aspects of the system design and does not discuss in detail the expected user experience or results from testing.

Huber et al. (2007) take a similar approach and discuss the evolution of audiobooks into interactive media suitable for editing non-linear stories. The authors propose combining elements from computer games with the experience of listening to oral presentations, which Huber et al. describe as immersive and entertaining. Sonification was also used as a resource for interaction, enabling users to interact with the system mainly through sound, but users found this interaction style difficult.

A relevant study was conducted by Alcantud-Díaz and Gregori (2014), who offer an extensive review of the use of audiobooks in foreign language learning and describe two projects named Tales of the World and The Power of Tales: Building a Fairer World. The authors state that although audiobooks are not commonly used in their country, Spain, they see great potential for audiobooks in supporting English learning in relation to the five skills listed in the Common European Framework of Reference for Languages: listening, reading, spoken interaction, spoken production, and writing. The two projects aim at spreading awareness of languages as scaffolding for intercultural values and respect for human rights in the educational community. Both projects produced collections of tales: for Tales of the World, 40 tales were gathered from underprivileged countries; for The Power of Tales, 15 tales against violence were collected. All the tales were edited into freely downloadable audiobooks. The audiobook format was chosen to give access to pupils with learning and visual difficulties; moreover, audiobooks were seen as a means to improve learners’ English pronunciation. The other studies discussed in the review (Alcantud-Díaz and Gregori 2014) focus on the use of audiobooks by primary school pupils learning languages, such as Wilde and Larson (2007), who argue that audiobooks enabled children aged 8 to 12 to find more time to read, and hence to read more books. Moreover, Baskin and Harris (1995) found that audiobooks helped students with learning difficulties, who find it challenging to interpret written text, to make sense of written texts and improve their reading fluency in English as a first language.

Moving away from the learning context, we find another form of interactive audio: audio walks or audio tours (van Zeijl 2013). Similar to the audio material offered by museums, audio walks are usually implemented as mobile apps in which users follow predefined audio commentary while moving around a city or a building. An interesting commercial product of this kind is yapQ’s “Worldwide city guides” (footnote 2), a mobile app that offers audio walks in multiple languages and for many cities; the application uses geolocation and text-to-speech to generate interactive audio guides. The content in this case is not user-generated. SoundCloud (footnote 3), instead, is an example of user-generated and socially shared resources: “SoundCloud is […] social sound platform where anyone can create sounds and share them everywhere”. Among other sound collections, SoundCloud offers a selection of audio walks.

In the following sections we will show more possible ways in which audio can be made interactive and we will explore the possibilities offered by social creation and sharing of audio data.

2 Two Supporting Case Studies

The main case study described in this paper is supported by unpublished data from two other case studies conducted in the past two years, which provided insights into the advantages of interactive audiobooks. All three case studies adopted the User Centred Design (UCD) method supported by qualitative methods. Our students had to conduct a full design iteration consisting of: a field study investigating the practice in which users participate; an analysis phase in which design requirements are formulated; a conceptualization phase using brainstorming and prototyping techniques; and testing, in which a semi-functioning prototype is evaluated with the users. Testing was conducted as play-test sessions with focus groups, in which users took part in demonstrations of the prototypes. Qualitative methods were chosen for several reasons. First, our students engaged with a limited number of users: high-school or primary school classes, or focus groups. Second, the students’ goal was to closely explore the current user experience and opportunities for improvement, also enabling the users to propose ideas. Specifically, for the main case study, our students adopted in situ visual ethnography and semi-structured interviews, for which they were required to prepare a minimum set of pre-defined questions for the users (Yliriksu and Buur 2007). The students then analyzed the gathered video recordings, scrutinizing how users interacted during class activities and how they talked about their practice (verbal and non-verbal language) during the interviews, with the goal of identifying aspects that needed improvement or support. Semi-structured interviews and observations were also adopted in the two supporting studies. Given that our students were still learning about UCD and qualitative methods, we took part in many phases of the three studies, complementing their field work with our notes and reflections.
The findings discussed in the following sections are the result of this process.

2.1 Audio Deliverables

The audio deliverables application originated from our supervision of 4 groups of students in the Software Engineering and IT bachelor program at the University of Southern Denmark (SDU); the semester-long project, run in fall 2014, was about developing user-centered software solutions to better support English teachers in 2 Danish primary schools. The field study started with observations of 2 classes of 4th graders learning English, one in each school. After a preliminary visit and a meeting with the 2 teachers who agreed to participate in this study, the groups of SDU students visited the schools repeatedly, defined requirements and produced a few prototypes, from low-fidelity ones to partially working horizontal prototypes (created using MIT’s AppInventor, footnote 4).

The 2 teachers, here called Anders and Britta for anonymity, were also interviewed; they showed very different approaches to using technology in their teaching. Anders can be considered a designer of content. He openly states that he has limited IT skills, but he is very creative in designing and generating new content. On our first visit he showed us a short dialogue with 4 roles that he had written for his students to read aloud; indeed, spoken interaction and comprehension are the main goals of the 4th grade English curriculum. The dialogue was about 3 friends who interact with a waiter (the fourth role) in a British restaurant: they have to order, confirm their orders, eat and pay, while the waiter asks typical questions about their choice of food and beverages and how they want to settle the bill. It was clear that Anders compensates for the lack of interactivity in his material (which was not given to the pupils in digital format, but written on his computer and then printed) with role play and social interaction. Britta is much more familiar with IT and particularly likes to use what is available online, but she re-contextualizes it according to her pupils’ needs. She has a toolbox approach and often uses tools that were not originally designed for teaching, such as video editing and comics authoring tools and online audiobooks in English. On our first visit Britta brought her class to the IT lab for the English lesson; the pupils kept switching from audiobooks to cartoon editing to chats with the teacher and with each other.

We found these two approaches very intriguing and believe they should be studied further. However, in this paper we are mostly interested in user-generated audio content; we therefore focus on the group of SDU students working with Anders’ class. They noted the various problems he had orchestrating the class with his printed material: the pupils were divided into groups of 3 to 4 and had to read the text a few times while waiting for Anders to drop by, listen to them and provide feedback. The result was the audio deliverable application, a mock-up mobile app that allowed pupils to read an English text aloud as a group and deliver it to the teacher as an audio recording. These audio deliverables afford good peer interaction and make communication with the teacher more asynchronous. Moreover, they represent a form of audio content generation that is natural and very easy to master for 4th grade pupils, who are typically proficient in the use of smartphones; the focus was mainly on reading skills.

The development and testing of the audio deliverable application, together with the feedback we received from Anders and his class, convinced us that audio content can be easier for Danish 4th graders to generate than written English. Recordings enable more asynchronous teacher/pupil interaction, open up opportunities for peer reflection, and can be preserved as a learning diary that makes students more aware of their progress.

2.2 Carbooks

This study demonstrates the versatility of audio as a communication modality by mapping gamebooks into mobile-friendly, interactive audiobooks. The goal of this project was to offer an entertaining and relaxing experience to kids who often get car-sick on long car trips and have problems reading or watching videos while traveling. In this situation, playing video games on mobile devices is not an option; audiobooks instead can offer relief and help pass the time in a fun or perhaps educational way. However, audiobooks provide a passive experience and can become boring on long trips, so we wanted to investigate how non-linear narrative can be used in audiobooks to create an interactive and enjoyable experience for kids and young adults. A focus group composed of 10 young adults (aged 19 to 25) and 2 kids (aged 10 and 12) was created to play-test the interactive audiobooks; the family of the 2 kids was among the other stakeholders involved in the project. The Carbooks bachelor project ran in the fall 2015 semester and tested various ideas through 3 iterations, with the central focus of developing an audio-only interactive application for the Android platform. The main tools were Unity (footnote 5) and Google Text-To-Speech.

Removing the graphical user interface while retaining the interactivity typical of digital games proved to be one of the major challenges; the project also explored possible mappings between input modalities and choices in the non-linear narrative. A mobile phone offers gestures, microphone input and orientation/motion detection. Typical gestures we considered are touch, hold and swipe. As for microphone input, voice recognition was too complex to work in practice and would have been mostly limited to English, so volume level was used instead; microphone input was used in the second iteration of the interactive audiobook prototype, but turned out to be unreliable and difficult to use, and players got frustrated by the experience. In the third (and final) prototype, the microphone was replaced by orientation (essentially reading the state of the phone’s gyroscopes). These input modalities had to steer the narrative of the interactive audiobook mostly without the player looking at the screen, which also required some analysis; background audio cues were also used (in version 2 of the prototype) to help players orient themselves while exploring the locations in the story. In printed gamebooks the player is often faced with 3 to 6 options to select from, but in Carbooks we had to break down the player’s options into sequences of binary choices. This restructuring made it easy to map input events to binary alternatives, but it could be argued that it limited the non-linearity of the narrative (de facto reducing the branching factor of the multi-linear plot).
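The restructuring of 3 to 6 gamebook options into binary choices, and the mapping of those choices onto orientation input, can be illustrated with a short sketch. The code below is not the Carbooks implementation (which was built with Unity); it is a hypothetical Python version, assuming that a left or right tilt of the phone selects one of two alternatives, that a negative roll angle means a left tilt, and that a 30-degree threshold separates deliberate tilts from ordinary hand movement. The names binarize, resolve, depth and event_from_roll are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical input events produced by tilting the phone.
LEFT, RIGHT = "tilt_left", "tilt_right"


def event_from_roll(roll_deg: float, threshold: float = 30.0) -> Optional[str]:
    """Map a roll angle in degrees to an input event (assumed sign: negative = left).

    Tilts smaller than the (illustrative) threshold return None, so ordinary
    hand movement is not read as a choice.
    """
    if roll_deg <= -threshold:
        return LEFT
    if roll_deg >= threshold:
        return RIGHT
    return None


@dataclass
class Choice:
    """One binary decision presented to the listener."""
    left: "Node"
    right: "Node"


# A leaf is simply the identifier of the next story passage.
Node = Union[str, Choice]


def binarize(options: list[str]) -> Node:
    """Split an n-way gamebook choice into nested binary choices.

    Each split halves the remaining options, so any option is reached
    in at most ceil(log2(n)) binary decisions.
    """
    if not options:
        raise ValueError("a choice needs at least one option")
    if len(options) == 1:
        return options[0]
    mid = (len(options) + 1) // 2
    return Choice(binarize(options[:mid]), binarize(options[mid:]))


def resolve(tree: Node, events: list[str]) -> str:
    """Follow a sequence of input events down the tree to a story passage."""
    node = tree
    for event in events:
        if isinstance(node, str):
            break  # already at a passage; extra input is ignored
        if event == LEFT:
            node = node.left
        elif event == RIGHT:
            node = node.right
        else:
            raise ValueError(f"unmapped input event: {event}")
    if not isinstance(node, str):
        raise ValueError("input ended before a passage was reached")
    return node


def depth(tree: Node) -> int:
    """Maximum number of binary decisions needed to reach a passage."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(tree.left), depth(tree.right))


if __name__ == "__main__":
    tree = binarize(["door", "stairs", "window", "cellar", "garden"])
    print(resolve(tree, [LEFT, RIGHT]))  # window
    print(depth(tree))  # 3
```

Halving the option list keeps every original option reachable in at most ceil(log2 n) decisions (3 decisions for 5 or 6 options). A simpler alternative is to present options one at a time as yes/no questions, which is easier to narrate in audio but can require up to n - 1 decisions.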

The Carbooks project shows that interactivity can work in audio-only (or audio-first) applications, and that the user experience is similar to that of slow-paced exploration/adventure video games, such as the classic text-based games of the 1980s. Smartphones, with their current computing power, audio support and wide range of input modalities, proved a reasonable choice of platform for audio-only interactive applications. The main limitation of the project, however, was that it did not focus on content creation: while we have evidence that interactivity and audio work for simple, fun non-linear stories, we have to progress further with our studies before we can directly link interactive audiobooks to language learning.

3 The Main Case Study: Social Audiobooks

The last and main case study was conducted in relation to an elective course in Media Sociology. The course lasted five weeks in the fall semester of 2015 and focused on e-learning, with students from the Multimedia Design program (MMD for short) at the Lillebaelt Academy in Odense, Denmark. The 21 students had to work on a mini-project in groups of three or four, in cooperation with Nyborg gymnasium, a high school located in Nyborg, a small town on the island of Funen, Denmark. From the point of view of the Lillebaelt Academy, the learning goals of the mini-projects were to create conditions for the MMD students to conduct a rigorous user-centered design process that actively involved users, to adopt a contextual perspective on the design of learning technologies, and to critically reflect on how their new solution contributes to teaching and learning practices in the gymnasium. For its part, the gymnasium in Nyborg was eager to explore and test, together with the MMD students, new interactive solutions that could enrich current learning and teaching practices.

In their Media Sociology course, the MMD students were introduced to five research articles, each applying a specific learning theory to learning contexts and to the design of a digital solution. One group of three MMD students explored the design of an application to support interactivity with audiobooks; these students chose to work with the studies by Hattie and Gan (2011) on visible learning and by Marchetti and Petersson Brooks (2012) on sociocultural theory. Hattie and Gan (2011) explain how visible learning can affect learning practice, discussing the role of teachers in helping students formulate learning goals and success criteria, in providing descriptive feedback that enables students to improve their skills, and in formative assessment aimed at collecting evidence of students’ achievement. Marchetti and Petersson Brooks (2012) instead adopt sociocultural theory in the design of a digital exhibit aimed at enriching the social interaction between guides and visitors during guided tours, which they view as a sociocultural activity influenced by the traditions and practices of museum contexts. Our students’ project aimed at designing an interactive solution to enrich learning practice and social interaction in the English language classes of Nyborg gymnasium.

3.1 Audook: Social Experience of Audiobooks

The Audook mini-project, carried out by one group of three MMD students, explored how interactive listening to audiobooks could enrich learning practice in classes of English literature and language, with the cooperation of a gymnasium teacher (here called Sanne) and her class of 15 students aged approximately 15–16. The outcome of the Audook mini-project represents an attempt at transduction of reading assignments from the visual to the audio mode. Transduction is defined in social semiotics as a translation in which meaning-material is moved from one mode to another, for instance “from speech to image, from writing to film” (Kress 2010, p. 125). Each mode has specific material qualities and entities to be manipulated (speech has words, images have colors), and each mode also has a different history of social use. This in turn has implications for how the same meaning-material is formulated and transmitted by the sender, and for how the message is received and interpreted by the audience, so that the same message might be slightly altered in its meaning through the transduction process. Audiobooks, for instance, represent a case of transduction from the visual book format into an auditory one. As shown by related studies (Alcantud-Díaz and Gregori 2014), encountering the same story both in print and in audio form significantly affects how learners experience reading, in some cases even enabling them to improve their skills.

Through their field study, the three MMD students found that English classes in Nyborg mostly involved reading and analyzing texts. The English teacher, Sanne, was concerned with choosing samples of English literature that the students could find interesting, to “motivate her pupils to read and analyze the texts”. For this reason she said: “I am trying to look for novels that can be interesting, handling topics about social relations and adventures”. Her strategy involves “books that have become popular in recent years, often because they were adapted into movies, so that they have heard about them”. During our study, for instance, the class was reading “The Beach” by Alex Garland, which was adapted into a popular Hollywood movie starring Leonardo DiCaprio. In this way the teacher was already encouraging a multimodal experience and analysis of the assigned novel. We found that the Nyborg students are typically assigned a set of pages or entire chapters to read by a certain date. In class they are asked to discuss the chapters they have read in groups and to fill in a form with questions or aspects to reflect upon, such as the maturation of a character, social conflicts, or the narrative techniques adopted by the writer; afterwards, a class discussion follows. The students also watch the movie based on the novel they are reading, together with the teacher. This is meant to keep them motivated to read and to reflect on how the novel could be interpreted, and Sanne added with satisfaction, “they often prefer the novel to the movie!”, as the students notice that many elements were omitted in the movie or that the actors playing specific characters do not match their imagination.

The gymnasium students complained, however, that reading requires “total” involvement; several of them said that they can read mostly while on the bus or at home, but unfortunately cannot read while running or walking in town. Reading is also perceived as isolating: to share impressions of specific passages, they have to either meet or write to each other through social media.

The design process led to the creation of Audook, an application aimed at providing an alternative way of experiencing literary texts. The central idea was to transduce novels into audio and create a gesture-based app for mobile phones. The requirements included being able to add a bookmark to a specific passage with a hand gesture while listening to an audiobook; users should also be able to add spoken and written comments by opening a visual interface, and to share their comments and bookmarks through social media.

The resulting prototype mobile application (shown in Fig. 1) offers a richer, multimodal experience than reading alone and showcases the extension of annotating and sharing comments from books to audiobooks. A summative evaluation yielded both criticisms and positive feedback. For example, it was noted that audiobooks take longer to “read” than books. Both we and the group developing Audook agree that it is not a good idea to replace visual reading, because of the importance of seeing the text, especially in language learning. There were also concerns about the quality of the voices obtained via text-to-speech, compared with those of actors and native speakers reading the texts. The teacher raised concerns about how she could obtain audiobooks for her students; the fully developed application should be able to connect to the audiobook collection of the school or of the local library, which is already available online, enabling the teacher and her students to easily get the novels they need.

Fig. 1. Overview of the interface of Audook. The top row shows the log-in screen and the initial access to the audiobook library. The second row shows how the text can be displayed to the reader, together with the sharing and annotation features.

On the positive side, audiobooks can be “read” while doing sports, walking or running; they can be easier to access than books (and e-books) while travelling on public transportation, with less risk of motion sickness. Using Audook, book critiques and commentaries could be shared electronically in preparation for group discussion in class. A few students also asked whether it would be possible to listen to an audiobook while viewing the e-book version (a scenario similar to existing karaoke applications), so that they could learn more effectively how to pronounce new words. Finally, the Audook app was positively evaluated as an interactive alternative to normal reading, expanding opportunities for multimodal experience of novels and for sharing personal reflections on texts. In general, the social aspect of the application and the possibility to listen to the story while engaging in outdoor activities were particularly appreciated, as they seemed to make the experience of reading less isolating.

4 Discussion

The main case study and the 2 supporting studies show the wide spectrum of opportunities offered by audiobooks in language learning, from content generation to social and game-like interactivity. The main contributions of this paper are design insights for making audiobooks interactive and better integrated into the social interaction that emerges in learning contexts, between learners and teachers but also among peer learners. At the same time, we aim to explore how the transduction of literary texts could foster different experiences when moving from the visual and tangible modes associated with physical books and e-readers to the auditory modality enhanced by interactivity.

Comparing the three case studies, we can see that potential users prefer interactive audiobooks to non-interactive ones; in our testing, they consistently described typical audiobooks as eliciting passive experiences. Interactivity with the text was evaluated positively both in relation to exploring non-linear stories and in contexts of language learning (Alcantud-Díaz and Gregori 2014). As pointed out by Kress (2010), the transduction of literary text into an auditory format can significantly alter how readers relate to the text. The auditory modality can make reading more flexible and accessible for learners; for instance, the possibility to create audio deliverables can support pedagogical approaches such as visible learning (Hattie and Gan 2011), in which learners and their teachers can monitor spoken language competences longitudinally. The recordings created during language learning also open the possibility of applying analysis techniques and data mining to audio content. The same flexibility benefits learners with busy days, who see listening to novels as better support for multitasking, enabling them to “read” while traveling, when reading might make them sick, and while engaging in outdoor activities. Moreover, the audio modality can better support children who are still developing writing skills in their own or in a foreign language, as well as learners with linguistic difficulties. Finally, the study in Nyborg provides new insights into how interactive audiobooks could help turn reading into a social experience, in line with sociocultural theories of learning. From a sociocultural perspective (Rogoff 1990), learning is seen as a social practice in which learners are supported by an expert adult, the teacher, but can also support each other, in our case in a persistent and asynchronous way.
By enabling learners to share their thoughts and bookmarks with each other, Audook can contribute to the emergence of a shared understanding of the text at hand, enriching the process of textual analysis and reflection.

Building on these case studies, we propose insights on how audiobooks could be turned into an interactive medium:

  1. Support the generation of audio, not only its consumption. Generating audio requires only modest technical support; for example, Google Docs can be extended to allow voice comments on texts by using add-ons such as Kaizena (footnote 6).

  2. Leverage social and asynchronous communication between teachers and students, and provide support for peer learning.

  3. Consider multiple storylines in audiobooks. Multiple storylines can allow for experiential learning (Furini 2007; Alcantud-Díaz and Gregori 2014) and support case-based reasoning. A major drawback of authoring non-linear narrative is the need to create multiple, potentially modular storylines; non-linear audiobooks in particular have always been labor-intensive to produce. Our Carbooks project, however, shows that text-to-speech technology is now widely available (on laptops and even mobile devices) and good enough, at least for English. All teachers in the schools we visited have at least basic IT skills, so they can readily write English texts and could potentially create written non-linear narratives; our experience with Carbooks convinced us that, by leveraging text-to-speech and gesture-based non-visual interfaces, non-linear audiobooks in English could potentially be created by the teachers themselves, thereby supporting language learning.

  4. Treat socially generated audio content as a kind of social media data. We suggest considering the audio content generated by a group of students learning English as similar to the content produced on social media. Since voice data mining is still very complex, dependent on pronunciation and often imprecise, and typically works only for English and very few other languages, we consider social media approaches such as user-created tags the best option for classifying and searching audio content.

  5. Use audio as a complement to the visual modality. Based on our studies, we do not aim to replace the visual modality of reading, but to provide complementary auditory alternatives that could enrich how people experience literary texts.
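The fourth insight proposes user-created tags, rather than voice data mining, as the way to classify and search socially generated audio. A minimal sketch of that idea, assuming free-form tags typed by pupils and teachers and a simple in-memory store, is an inverted index from normalized tags to recordings. The names Recording, TagIndex, search and popular_tags, and the file paths, are hypothetical and not part of any system described in this paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Recording:
    """Hypothetical metadata for one pupil-made audio clip."""
    rec_id: str
    author: str
    path: str  # where the audio file is stored; the audio itself is never analysed
    tags: set[str] = field(default_factory=set)


def normalize(tag: str) -> str:
    """Fold case and surrounding whitespace so 'Dialogue ' matches 'dialogue'."""
    return tag.strip().lower()


class TagIndex:
    """Inverted index from user-created tags to recordings."""

    def __init__(self) -> None:
        self._recordings: dict[str, Recording] = {}
        self._by_tag: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, rec: Recording, *tags: str) -> None:
        """Store a recording and attach any number of tags to it."""
        self._recordings[rec.rec_id] = rec
        self.tag(rec.rec_id, *tags)

    def tag(self, rec_id: str, *tags: str) -> None:
        """Let any user (pupil or teacher) add tags to an existing recording."""
        rec = self._recordings[rec_id]
        for raw in tags:
            t = normalize(raw)
            if t:
                rec.tags.add(t)
                self._by_tag[t].add(rec_id)

    def search(self, *query: str) -> list[Recording]:
        """Return recordings carrying all query tags, sorted by id."""
        wanted = {normalize(q) for q in query if normalize(q)}
        if not wanted:
            return []
        ids = set.intersection(*(self._by_tag.get(t, set()) for t in wanted))
        return [self._recordings[i] for i in sorted(ids)]

    def popular_tags(self, n: int = 5) -> list[tuple[str, int]]:
        """Most frequently used tags, similar to a social media tag cloud."""
        counts = [(t, len(ids)) for t, ids in self._by_tag.items()]
        return sorted(counts, key=lambda tc: (-tc[1], tc[0]))[:n]


if __name__ == "__main__":
    index = TagIndex()
    index.add(Recording("r1", "group A", "r1.m4a"), "Restaurant", "dialogue")
    index.add(Recording("r2", "group B", "r2.m4a"), "restaurant ", "pronunciation")
    index.tag("r1", "Pronunciation")  # a peer adds a tag later
    print([r.rec_id for r in index.search("dialogue", "restaurant")])  # ['r1']
    print(index.popular_tags(2))  # [('pronunciation', 2), ('restaurant', 2)]
```

Intersecting tag sets gives AND-style search. Because only the metadata that users type in is indexed, the approach works regardless of the language or accent in the recording, in line with the fourth insight.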

The exploration of interactive audiobooks is not new, as current research shows; however, we may argue that these studies have taken a limited perspective, mainly supporting the authoring of non-linear stories. On the other hand, when it comes to learning, these studies seem eager to argue that audiobooks can offer better support to learners in acquiring linguistic as well as intercultural competences (Alcantud-Díaz and Gregori 2014). In our studies we instead take a more cautious position, as results from our testing suggest that visual reading is perceived as more personal and active: readers can decide for themselves how quickly they want to read and can imagine for themselves the features of a character or a setting. Audiobooks do not allow for that freedom, as they impose a specific pace and the voice of the narrator, which listeners could find unpleasant or could perceive as expressing feelings in a way that does not suit their sensibility.

Audiobooks have many faces (or voices) and seem to us to possess untapped potential. The students from Nyborg gymnasium appeared eager to identify the new possibilities offered by the Audook application, but were also aware of some intrinsic limitations of audiobooks.

5 Conclusion

The main contributions of this paper are insights on how to make audiobooks interactive and better integrated in learning contexts, in particular when learning English as a foreign language. The three case studies discussed show the large spectrum of opportunities offered by audiobooks in language learning, from content generation to social and game-like interactivity. Our prototypes provide evidence that audiobooks can help in documenting learning (thanks to audio deliverables), in supporting different learning experiences and styles, and in complementing visual information when exploring non-linear narrative.

We believe that the experience obtained in the three studies and the insights we gained can be used as design guidelines to develop more interactive audiobooks and audio-enabled applications. A fully functional mobile application based on the outcome of the main case study is currently under development.