DOI: 10.1145/3173574.3173930

Designing Pronunciation Learning Tools: The Case for Interactivity against Over-Engineering

Published: 21 April 2018


Paired role-play is a common collaborative activity in language learning classrooms, adding meaning and cultural context to the learning process. This is complemented by teachers' immediate and explicit feedback. Interactive tools that provide explicit feedback during collaborative learning are scarce, however. More commonly, supporting dialogue practice takes the form of computer-aided single-student read-and-record activities. This limitation is partly due to the complexity of processing language learners' speech in unconstrained tasks. In this paper, we assess the value of pronunciation error detection algorithms within a realistic, software-aided, paired role-playing task with beginning learners of French. We found that students' pronunciations improve regardless of the type of error detector employed -- even for those using simple heuristics. We suggest that speech technologies for language learning have been too focused on engineering goals. Instead, new interactive designs supporting collaboration may be used to overcome engineering limitations and properly support students' engagement.

Supplementary Material

ZIP File (pn3189-file3.mp4)
Supplemental video
MP4 File (pn3189.mp4)









    Published In

    CHI '18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
    April 2018
    8489 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.



    Association for Computing Machinery

    New York, NY, United States




    Author Tags

    1. collaborative education
    2. computer assisted language learning (CALL)
    3. computer assisted pronunciation training (CAPT)


    • Research-article

    Funding Sources

    • Ontario Centres of Excellence


    CHI '18

    Acceptance Rates

    CHI '18 Paper Acceptance Rate 666 of 2,590 submissions, 26%;
    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%





    Cited By

    • (2023) Evaluating a Conversational Agent for Second Language Learning Aligned with the School Curriculum. Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky, 142-147. DOI: 10.1007/978-3-031-36336-8_22. Online publication date: 30-Jun-2023
    • (2022) DDSupport: Language Learning Support System that Displays Differences and Distances from Model Speech. 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), 313-320. DOI: 10.1109/ICMLA55696.2022.00051. Online publication date: Dec-2022
    • (2022) Leveraging audible and inaudible signals for pronunciation training by sensing articulation through a smartphone. Speech Communication 144:C, 42-56. DOI: 10.1016/j.specom.2022.08.002. Online publication date: 1-Oct-2022
    • (2021) Verbose: Designing a Context-based Educational System for Improving Communicative Expressions. Proceedings of the 23rd International Conference on Mobile Human-Computer Interaction, 1-13. DOI: 10.1145/3447526.3472057. Online publication date: 27-Sep-2021
    • (2021) PTeacher: a Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1-14. DOI: 10.1145/3411764.3445490. Online publication date: 6-May-2021
    • (2021) Delivery Ghost: Effects of Language Immersion and Interactivity in a Language Learning Game. Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 1-7. DOI: 10.1145/3411763.3451767. Online publication date: 8-May-2021
    • (2021) Smart Campus Implementation Effects towards Student Interest in Higher Education: A Systematic Literature Review. 2021 8th International Conference on Information Technology, Computer and Electrical Engineering (ICITACEE), 101-106. DOI: 10.1109/ICITACEE53184.2021.9617467. Online publication date: 23-Sep-2021
    • (2021) Exploration of Voice User Interfaces for Older Adults: A Pilot Study to Address Progressive Vision Loss. Digital Interaction and Machine Intelligence, 159-168. DOI: 10.1007/978-3-030-74728-2_15. Online publication date: 26-Jun-2021
    • (2020) Multimodal, visuo-haptic games for abstract theory instruction: grabbing charged particles. Journal on Multimodal User Interfaces 15:1, 1-10. DOI: 10.1007/s12193-020-00327-x. Online publication date: 6-Jun-2020
    • (2018) Accessible Voice Interfaces. Companion of the 2018 ACM Conference on Computer Supported Cooperative Work and Social Computing, 441-446. DOI: 10.1145/3272973.3273006. Online publication date: 30-Oct-2018
