An Audio-Visual Particle Filter for Speaker Tracking on the CLEAR’06 Evaluation Dataset

  • Conference paper
Multimodal Technologies for Perception of Humans (CLEAR 2006)

Abstract

We present an approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones, and process them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view face detection and upper body detection. On the audio side, the time delays of arrival between pairs of microphones are estimated with a generalized cross correlation function. In the CLEAR’06 evaluation, the system yielded a tracking accuracy (MOTA) of 71% for video-only, 55% for audio-only and 90% for combined audio-visual tracking.
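
To illustrate the audio feature mentioned in the abstract, here is a minimal sketch of time-delay estimation for a single microphone pair using a generalized cross correlation. The PHAT weighting, the function name `gcc_phat_delay`, and its parameters are assumptions made for this example; the abstract does not say which weighting or implementation the authors used.

```python
import numpy as np


def gcc_phat_delay(x, y, fs, max_tau=None):
    """Estimate how much y lags x, in seconds (hypothetical helper).

    x, y    : 1-D arrays holding the two microphone signals
    fs      : sampling rate in Hz
    max_tau : optional bound on the absolute delay in seconds
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)

    # Cross-power spectrum; PHAT weighting keeps only the phase.
    R = Y * np.conj(X)
    R /= np.abs(R) + 1e-12  # small constant avoids division by zero

    cc = np.fft.irfft(R, n=n)

    # Restrict the search to physically plausible lags if requested.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    x = rng.standard_normal(4096)
    # Synthetic second channel: the same noise delayed by 7 samples.
    y = np.concatenate((np.zeros(7), x[:-7]))
    print(gcc_phat_delay(x, y, fs) * fs)
```

A positive result means the signal in `y` arrives later than in `x`. In a tracker of the kind the abstract describes, delays measured this way for several microphone pairs could be compared with the delays implied by each 3D location hypothesis.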

Editor information

Rainer Stiefelhagen, John Garofolo

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Nickel, K., Gehrig, T., Ekenel, H.K., McDonough, J., Stiefelhagen, R. (2007). An Audio-Visual Particle Filter for Speaker Tracking on the CLEAR’06 Evaluation Dataset. In: Stiefelhagen, R., Garofolo, J. (eds) Multimodal Technologies for Perception of Humans. CLEAR 2006. Lecture Notes in Computer Science, vol 4122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69568-4_4

  • DOI: https://doi.org/10.1007/978-3-540-69568-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69567-7

  • Online ISBN: 978-3-540-69568-4

  • eBook Packages: Computer Science, Computer Science (R0)
