Relative Transfer Function Estimation from Speech Keywords

Corey, Ryan M.; Singer, Andrew C.

doi:10.1007/978-3-319-93764-9_23

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10891)

Included in the following conference series:

International Conference on Latent Variable Analysis and Signal Separation

Abstract

Far-field speech capture systems rely on microphone arrays to spatially filter sound, attenuating unwanted interference and noise and enhancing a speech signal of interest. To design effective spatial filters, we must first estimate the acoustic transfer functions between the source and the microphones. It is difficult to estimate these transfer functions if the source signals are unknown. However, in systems that are activated by a particular speech phrase, we can use that phrase as a pilot signal to estimate the relative transfer functions. Here, we propose a method to estimate relative transfer functions from known speech phrases in the presence of background noise and interference using template matching and time-frequency masking. We find that the proposed method can outperform conventional estimation techniques, but its performance depends on the characteristics of the speech phrase.
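The role of time-frequency masking in relative transfer function (RTF) estimation can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the keyword template has already been aligned to the recording (the template-matching step is skipped), works directly on synthetic STFT-domain data, and uses a standard least-squares cross-spectral estimator. The function name `estimate_rtf`, the mask threshold, and all signal parameters are hypothetical choices made for illustration.

```python
import numpy as np


def estimate_rtf(x_ref, x_other, mask=None, eps=1e-12):
    """Least-squares RTF estimate per frequency bin.

    x_ref, x_other: complex STFTs of shape (n_freq, n_frames).
    mask: optional (n_freq, n_frames) weights selecting the bins to trust.
    Returns h[f] = sum_t m X_other conj(X_ref) / sum_t m |X_ref|^2.
    """
    if mask is None:
        mask = np.ones(x_ref.shape)
    cross = np.sum(mask * x_other * np.conj(x_ref), axis=1)
    power = np.sum(mask * np.abs(x_ref) ** 2, axis=1)
    return cross / (power + eps)


# Synthetic example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_freq, n_frames = 64, 200
shape = (n_freq, n_frames)

# Sparse "keyword" spectrogram: about 20% of the bins carry speech energy.
active = rng.random(shape) < 0.2
template_mag = active * rng.uniform(1.0, 2.0, shape)
speech = template_mag * np.exp(2j * np.pi * rng.random(shape))

# Ground-truth RTF between the reference and the second microphone.
h_true = rng.uniform(0.5, 1.5, n_freq) * np.exp(2j * np.pi * rng.random(n_freq))


def noise():
    # Complex Gaussian sensor noise, independent for each microphone.
    return 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


x_ref = speech + noise()
x_other = h_true[:, None] * speech + noise()

# Binary mask from the known template: keep bins where the keyword is strong.
mask = template_mag > 0.5

h_masked = estimate_rtf(x_ref, x_other, mask)
h_plain = estimate_rtf(x_ref, x_other)

err_masked = np.mean(np.abs(h_masked - h_true) / np.abs(h_true))
err_plain = np.mean(np.abs(h_plain - h_true) / np.abs(h_true))
print(f"mean relative error, masked:   {err_masked:.3f}")
print(f"mean relative error, unmasked: {err_plain:.3f}")
```

In this toy setting the mask discards bins dominated by noise, which reduces the bias that noise power in the reference channel introduces into the unmasked estimate. How much a mask helps in practice would depend on how sparse the keyword is in the time-frequency plane, consistent with the abstract's remark that performance depends on the characteristics of the phrase.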

Parts of this research were completed at Arm Research. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant Number DGE-1144245.

Author information

Authors and Affiliations

University of Illinois at Urbana-Champaign, Urbana, IL, USA
Ryan M. Corey & Andrew C. Singer

Corresponding author

Correspondence to Ryan M. Corey.

Editor information

Editors and Affiliations

Paul Sabatier University, Toulouse, France
Yannick Deville
Bar-Ilan University, Ramat Gan, Israel
Sharon Gannot
University of Surrey, Guildford, United Kingdom
Russell Mason
University of Surrey, Guildford, United Kingdom
Mark D. Plumbley
University of Surrey, Guildford, United Kingdom
Dominic Ward

About this paper

Cite this paper

Corey, R.M., Singer, A.C. (2018). Relative Transfer Function Estimation from Speech Keywords. In: Deville, Y., Gannot, S., Mason, R., Plumbley, M., Ward, D. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2018. Lecture Notes in Computer Science, vol 10891. Springer, Cham. https://doi.org/10.1007/978-3-319-93764-9_23

DOI: https://doi.org/10.1007/978-3-319-93764-9_23
Published: 06 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93763-2
Online ISBN: 978-3-319-93764-9
eBook Packages: Computer Science, Computer Science (R0)
