DOI: 10.1145/2909824.3020229
research-article

Child Speech Recognition in Human-Robot Interaction: Evaluations and Recommendations

Published: 06 March 2017

Abstract

An increasing number of human-robot interaction (HRI) studies are now taking place in applied settings with children. These interactions often hinge on verbal interaction to achieve their goals. Great advances have been made in adult speech recognition, and it is often assumed that these advances will carry over to the HRI domain and to interactions with children. In this paper, we evaluate a number of automatic speech recognition (ASR) engines under a variety of conditions inspired by real-world social HRI settings. Using the data collected, we demonstrate that there is still much work to be done in ASR for child speech, and that interactions relying solely on this modality remain out of reach. However, we also make recommendations for child-robot interaction design in order to maximise the capability that does currently exist.

Published In

HRI '17: Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction
March 2017
510 pages
ISBN:9781450343367
DOI:10.1145/2909824
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automatic speech recognition
  2. child-robot interaction
  3. interaction design recommendations
  4. verbal interaction

Conference

HRI '17
Acceptance Rates

HRI '17 Paper Acceptance Rate 51 of 211 submissions, 24%;
Overall Acceptance Rate 268 of 1,124 submissions, 24%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 170
  • Downloads (Last 6 weeks): 32
Reflects downloads up to 12 Feb 2025

Cited By

  • (2025) Exploring discrete speech units for privacy-preserving and efficient speech recognition for school-aged and preschool children. International Journal of Human-Computer Studies (103460). DOI: 10.1016/j.ijhcs.2025.103460. Online publication date: Feb-2025
  • (2024) Enhanced AI Model to Improve Child Speech Recognition. Journal of Digital Contents Society, 25:2 (547-555). DOI: 10.9728/dcs.2024.25.2.547. Online publication date: 28-Feb-2024
  • (2024) Navigating Education 5.0 Robotic Technique for Teaching Foreign Languages in Todays Classroom. Preconceptions of Policies, Strategies, and Challenges in Education 5.0 (118-138). DOI: 10.4018/979-8-3693-3041-8.ch008. Online publication date: 28-Jun-2024
  • (2024) Assessment of Pepper Robot’s Speech Recognition System through the Lens of Machine Learning. Biomimetics, 9:7 (391). DOI: 10.3390/biomimetics9070391. Online publication date: 27-Jun-2024
  • (2024) Preschoolers' Interactions with Social Robots: Investigating the Potential for Eliciting Metatalk and Critical Technological Thinking. Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (1053-1057). DOI: 10.1145/3610978.3640654. Online publication date: 11-Mar-2024
  • (2024) A multi-site language study on child-robot dialogues. Advanced Robotics, 38:19-20 (1486-1500). DOI: 10.1080/01691864.2024.2388145. Online publication date: 22-Aug-2024
  • (2024) Humanoid robot as an educational assistant – insights of speech recognition for online and offline mode of teaching. Behaviour & Information Technology (1-18). DOI: 10.1080/0144929X.2024.2344726. Online publication date: 26-Apr-2024
  • (2024) Children and adults produce distinct technology- and human-directed speech. Scientific Reports, 14:1. DOI: 10.1038/s41598-024-66313-5. Online publication date: 6-Jul-2024
  • (2024) Comparison of Outcomes Between Robot-Assisted Language Learning System and Human Tutors: Focusing on Speaking Ability. International Journal of Social Robotics, 16:4 (743-761). DOI: 10.1007/s12369-024-01134-0. Online publication date: 11-Apr-2024
  • (2024) New Comer in the Bakery Store: A Long-Term Exploratory Study Toward Design of Useful Service Robot Applications. International Journal of Social Robotics, 16:9-10 (1901-1918). DOI: 10.1007/s12369-024-01119-z. Online publication date: 8-Oct-2024
