Ego noise cancellation of a robot using missing feature masks

Ince, Gökhan; Nakadai, Kazuhiro; Rodemann, Tobias; Tsujino, Hiroshi; Imura, Jun-ichi

doi:10.1007/s10489-011-0285-0

Published: 29 March 2011

Applied Intelligence, Volume 34, pages 360–371 (2011)

Abstract

We describe an architecture that enables a robot to recognize speech by cancelling ego noise, even while the robot is moving. The system consists of three blocks: (1) a multi-channel noise reduction block, comprising consecutive stages of microphone-array-based sound localization, geometric source separation and post-filtering; (2) a single-channel noise reduction block utilizing template subtraction; and (3) an automatic speech recognition block. In this work, we specifically investigate a missing feature theory-based automatic speech recognition (MFT-ASR) approach in block (3). This approach uses spectro-temporal elements derived from blocks (1) and (2) to measure the reliability of the acoustic features, and generates masks to filter out unreliable acoustic features. We evaluate the system on a robot using word correct rates. Furthermore, we present a detailed analysis of recognition accuracy to determine optimal parameters. The proposed MFT-ASR approach resulted in significantly higher recognition performance than single- or multi-channel noise reduction methods.
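The abstract does not give the formulas behind template subtraction or mask generation, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes per-bin magnitude spectra for one frame, a noise template that is already known, a reliability measure defined as the ratio of enhanced to noisy magnitude, and a hard threshold of 0.5. The function names, the spectral floor and the threshold are all hypothetical and are not taken from the article.

```python
from typing import List


def template_subtract(noisy: List[float], template: List[float],
                      floor: float = 0.01) -> List[float]:
    """Subtract a noise template from a magnitude spectrum, bin by bin.

    A small fraction of the noisy magnitude is kept as a floor so that
    no bin becomes negative (assumed flooring rule, not from the article).
    """
    return [max(x - n, floor * x) for x, n in zip(noisy, template)]


def reliability(enhanced: List[float], noisy: List[float],
                eps: float = 1e-12) -> List[float]:
    """Assumed reliability measure: enhanced / noisy magnitude, clipped to [0, 1]."""
    return [min(max(e / (x + eps), 0.0), 1.0) for e, x in zip(enhanced, noisy)]


def hard_mask(rel: List[float], threshold: float = 0.5) -> List[int]:
    """Mark a bin as reliable (1) when its reliability reaches the threshold."""
    return [1 if r >= threshold else 0 for r in rel]


# Toy frame with four frequency bins (illustrative values only).
noisy = [1.0, 0.8, 0.5, 0.9]
template = [0.1, 0.7, 0.45, 0.2]

enhanced = template_subtract(noisy, template)
mask = hard_mask(reliability(enhanced, noisy))
print(mask)  # [1, 0, 0, 1]
```

In this toy frame the second and third bins are dominated by the noise template, so they are masked out, and a recognizer using missing feature theory would ignore or down-weight them. A soft mask would keep the reliability values themselves instead of thresholding them.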

Abbreviations

ANN:: Artificial Neural Network
ASR:: Automatic Speech Recognition
BSS:: Blind Source Separation
DoA:: Direction of Arrival
GSS:: Geometric Source Separation
HMM:: Hidden Markov Model
MCRA:: Minima Controlled Recursive Averaging
MFCC:: Mel-Frequency Cepstral Coefficients
MFM:: Missing Feature Mask
MFT:: Missing Feature Theory
MMSE:: Minimum Mean Square Estimation
MSLS:: Mel-Scale Log Spectrum
MUSIC:: MUltiple SIgnal Classification
NN:: Nearest Neighbour
PF:: Post-Filtering
SE:: Speech Enhancement
SS:: Spectral Subtraction
SSL:: Sound Source Localization
SSS:: Sound Source Separation
TS:: Template Subtraction
WCR:: Word Correct Rate
WF:: Wiener Filtering

Author information

Authors and Affiliations

Honda Research Institute Japan Co., Ltd., 8-1 Honcho, Wako-shi, Saitama, 351-0188, Japan
Gökhan Ince, Kazuhiro Nakadai & Hiroshi Tsujino
Honda Research Institute Europe GmbH, Carl-Legien Strasse 30, 63073, Offenbach, Germany
Tobias Rodemann
Dept. of Mechanical and Environmental Informatics, Tokyo Institute of Technology, 2-12-1-W8-1, O-okayama, Meguro-ku, Tokyo, 152-8552, Japan
Gökhan Ince, Kazuhiro Nakadai & Jun-ichi Imura

Corresponding author

Correspondence to Gökhan Ince.

About this article

Cite this article

Ince, G., Nakadai, K., Rodemann, T. et al. Ego noise cancellation of a robot using missing feature masks. Appl Intell 34, 360–371 (2011). https://doi.org/10.1007/s10489-011-0285-0

Published: 29 March 2011
Issue Date: June 2011
DOI: https://doi.org/10.1007/s10489-011-0285-0
