DOI:10.1145/2157689.2157835
research-article

Multi-party human-robot interaction with distant-talking speech recognition

Published: 05 March 2012

Abstract

Speech is one of the most natural media for human communication, which makes it vital to human-robot interaction. In the real environments where robots are deployed, distant-talking speech recognition is difficult to realize because of reverberation, which degrades speech recognition and understanding and hinders seamless human-robot interaction. To mitigate this problem, traditional speech enhancement techniques optimized for human perception are adopted to achieve robustness in human-robot interaction. However, humans and machines perceive speech differently: an improvement in speech recognition performance may not automatically translate to an improvement in the human-robot interaction experience (as perceived by the users). In this paper, we propose a method for optimizing speech enhancement techniques specifically to improve automatic speech recognition (ASR), with emphasis on the human-robot interaction experience. Experimental results using real reverberant data from a multi-party conversation show that the proposed method improves the human-robot interaction experience in severely reverberant conditions compared with the traditional techniques.

Published In

HRI '12: Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction
March 2012
518 pages
ISBN:9781450310635
DOI:10.1145/2157689

Sponsors

In-Cooperation

  • IEEE-RAS: Robotics and Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dereverberation
  2. multi-party interaction
  3. robot audition
  4. robustness in speech recognition

Qualifiers

  • Research-article

Conference

HRI'12
HRI'12: International Conference on Human-Robot Interaction
March 5 - 8, 2012
Boston, Massachusetts, USA

Acceptance Rates

Overall Acceptance Rate 268 of 1,124 submissions, 24%
