Signal Processing for Audio HCI

Abstract

This chapter reviews recent advances in computer audio processing from the viewpoint of improving the human-computer interface. Microphone arrays are described as basic tools for untethered audio acquisition, and principles for the synthesis of realistic virtual audio are outlined. The influence of room acoustics on audio acquisition and production is also considered. The chapter finishes with a review of several relevant signal processing systems, including a fast head-related transfer function (HRTF) measurement system and a complete system for capture, visualization, and reproduction of auditory scenes.
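The abstract treats microphone arrays as the basic tool for untethered audio acquisition. The sketch below is a minimal illustration, not the chapter's own method. It implements a delay-and-sum beamformer for a uniform linear array. It assumes far-field (plane-wave) arrival, a speed of sound of 343 m/s, and steering delays rounded to whole samples. The function names `steering_lags`, `simulate_plane_wave`, and `delay_and_sum` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_lags(num_mics, spacing_m, angle_deg, fs, c=SPEED_OF_SOUND):
    """Integer arrival lag (in samples) of each mic relative to mic 0.

    Assumes a uniform linear array and a far-field plane wave arriving at
    angle_deg from broadside; a positive angle means mic 0 is hit first.
    """
    return [round(m * spacing_m * math.sin(math.radians(angle_deg)) / c * fs)
            for m in range(num_mics)]


def simulate_plane_wave(source, lags):
    """Hypothetical test helper: each mic hears the source delayed by its lag."""
    n = len(source)
    return [[source[i - k] if 0 <= i - k < n else 0.0 for i in range(n)]
            for k in lags]


def delay_and_sum(channels, lags):
    """Advance each channel by its steering lag, then average across mics.

    Samples that fall outside a channel are treated as zero.
    """
    n_samples = len(channels[0])
    out = []
    for n in range(n_samples):
        acc = 0.0
        for x, k in zip(channels, lags):
            idx = n + k
            if 0 <= idx < n_samples:
                acc += x[idx]
        out.append(acc / len(channels))
    return out


if __name__ == "__main__":
    fs = 16000
    spacing = 0.042875  # chosen so a 30-degree arrival gives one sample of lag per mic
    pulse = [0.0] * 32
    pulse[10] = 1.0
    lags30 = steering_lags(4, spacing, 30.0, fs)
    mics = simulate_plane_wave(pulse, lags30)
    on_target = delay_and_sum(mics, lags30)
    off_target = delay_and_sum(mics, steering_lags(4, spacing, 0.0, fs))
    print(max(on_target), max(off_target))  # prints: 1.0 0.25
```

When the array is steered toward the true arrival angle, the four delayed copies line up and add coherently, so the peak equals the source amplitude. When it is steered to broadside, the copies stay misaligned and the peak drops to 1/M. Rounding to whole samples keeps the sketch short. Practical systems usually apply fractional delays, for example as phase shifts in the frequency domain.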

Author information

Corresponding author

Correspondence to Dmitry N. Zotkin.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Zotkin, D.N., Duraiswami, R. (2010). Signal Processing for Audio HCI. In: Bhattacharyya, S., Deprettere, E., Leupers, R., Takala, J. (eds) Handbook of Signal Processing Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6345-1_10

  • DOI: https://doi.org/10.1007/978-1-4419-6345-1_10

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-6344-4

  • Online ISBN: 978-1-4419-6345-1

  • eBook Packages: Engineering, Engineering (R0)