Research article
DOI: 10.1145/3536221.3556587

The Effects of an Embodied Pedagogical Agent’s Synthetic Speech Accent on Learning Outcomes

Published: 07 November 2022

Abstract

Modern text-to-speech (TTS) engines can be an effective speech choice for embodied virtual pedagogical agents. However, it is not known how synthesized accents influence learning outcomes and perceptions of the agent. In this paper, we conducted a between-subjects experiment (n=60) to determine the effect of a pedagogical agent’s machine-synthesized TTS accent (United States English or Indian English) on learning outcomes and perceptions of the agent for students in the United States. Our results indicate that learner gender interacts with synthesized speech accent to significantly affect learning outcomes and perceptions of the agent. Specifically, a foreign synthetic speech accent may affect the learning outcomes of female university students (n=30), but not those of male university students (n=30). Learner gender also interacts with synthesized speech accent to affect perceptions of the pedagogical agent’s human-likeness. We provide novel insights into the differences between male and female learners in interactions with pedagogical agents that use synthetic TTS accents.

Supplementary Material

API-R, Pretest, Recall Test, Retention Test, Transfer Test (p198-do-supplements.zip)

Cited By

  • (2024) "We're not all construction workers": Algorithmic Compression of Latinidad on TikTok. Proceedings of the ACM on Human-Computer Interaction 8, CSCW2 (1-31). DOI: 10.1145/3687019. Online publication date: 8-Nov-2024

Published In

ICMI '22: Proceedings of the 2022 International Conference on Multimodal Interaction
November 2022
830 pages
ISBN:9781450393904
DOI:10.1145/3536221
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. accent
  2. pedagogical agents
  3. synthetic speech

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMI '22

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

Article Metrics

  • Downloads (Last 12 months): 40
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 16 Feb 2025
