Relative Transfer Function Estimation from Speech Keywords

  • Conference paper
Latent Variable Analysis and Signal Separation (LVA/ICA 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10891)

Abstract

Far-field speech capture systems rely on microphone arrays to spatially filter sound, attenuating unwanted interference and noise and enhancing a speech signal of interest. To design effective spatial filters, we must first estimate the acoustic transfer functions between the source and the microphones. It is difficult to estimate these transfer functions if the source signals are unknown. However, in systems that are activated by a particular speech phrase, we can use that phrase as a pilot signal to estimate the relative transfer functions. Here, we propose a method to estimate relative transfer functions from known speech phrases in the presence of background noise and interference using template matching and time-frequency masking. We find that the proposed method can outperform conventional estimation techniques, but its performance depends on the characteristics of the speech phrase.
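The role of time-frequency masking in relative transfer function (RTF) estimation can be illustrated with a minimal sketch. It is not the estimator from the paper, whose details are not given in this abstract. It is a generic masked least-squares estimator, and it assumes the mask is already known; in a keyword-driven system the mask would instead be derived from the matched speech template. The function name `masked_rtf_estimate`, the array shapes, and the synthetic signals are all assumptions made for illustration.

```python
import numpy as np

def masked_rtf_estimate(X, mask, ref=0, eps=1e-12):
    """Masked least-squares RTF estimate (illustrative, not the paper's method).

    X    : complex STFT, shape (mics, freqs, frames)
    mask : weights in [0, 1], shape (freqs, frames); near 1 where the
           target speech is assumed to dominate
    Returns an array of shape (mics, freqs), relative to microphone `ref`.
    """
    X_ref = X[ref]
    # Weighted cross-spectrum of every microphone with the reference.
    num = np.sum(mask * X * np.conj(X_ref), axis=-1)
    # Weighted power of the reference microphone.
    den = np.sum(mask * np.abs(X_ref) ** 2, axis=-1) + eps
    return num / den

# Synthetic check: one target and one interferer on disjoint time-frequency bins.
rng = np.random.default_rng(0)
M, F, T = 3, 8, 200

def crandn(*shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

h_true = crandn(M, F)
h_true[0] = 1.0                    # RTF is 1 at the reference microphone
g = crandn(M, F)
g[0] = 1.0                         # interferer's RTF, same normalization

active = rng.random((F, T)) < 0.5  # oracle mask: bins where the target is active
S = crandn(F, T) * active          # target at the reference microphone
V = crandn(F, T) * (~active)       # interferer at the reference microphone
X = h_true[:, :, None] * S + g[:, :, None] * V + 0.01 * crandn(M, F, T)

h_masked = masked_rtf_estimate(X, active.astype(float))
h_plain = masked_rtf_estimate(X, np.ones((F, T)))

err_masked = np.max(np.abs(h_masked - h_true))
err_plain = np.max(np.abs(h_plain - h_true))
print(f"max error with mask: {err_masked:.4f}, without mask: {err_plain:.4f}")
```

In this toy setup the unmasked estimate is pulled toward the interferer's transfer function, while the masked estimate uses only bins dominated by the target. Real speech and interference overlap in time and frequency, so an estimated mask would be less clean than the oracle mask used here.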

Parts of this research were completed at Arm Research. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant Number DGE-1144245.

Author information

Corresponding author

Correspondence to Ryan M. Corey.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Corey, R.M., Singer, A.C. (2018). Relative Transfer Function Estimation from Speech Keywords. In: Deville, Y., Gannot, S., Mason, R., Plumbley, M., Ward, D. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2018. Lecture Notes in Computer Science, vol 10891. Springer, Cham. https://doi.org/10.1007/978-3-319-93764-9_23

  • DOI: https://doi.org/10.1007/978-3-319-93764-9_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93763-2

  • Online ISBN: 978-3-319-93764-9

  • eBook Packages: Computer Science (R0)
