
Tracking Multiple Speakers with Probabilistic Data Association Filters

  • Conference paper
Multimodal Technologies for Perception of Humans (CLEAR 2006)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4122)

Abstract

In prior work, we developed a speaker tracking system based on an extended Kalman filter using time delays of arrival (TDOAs) as acoustic features. In particular, the TDOAs comprised the observation associated with an iterated extended Kalman filter (IEKF) whose state corresponds to the speaker position. In other work, we followed the same approach to develop a system that could use both audio and video information to track a moving lecturer. While these systems functioned well, their utility was limited to scenarios in which a single speaker was to be tracked. In this work, we seek to remove this restriction by generalizing the IEKF, first to a probabilistic data association filter, which incorporates a clutter model for rejection of spurious acoustic events, and then to a joint probabilistic data association filter (JPDAF), which maintains a separate state vector for each active speaker. In a set of experiments conducted on seminar and meeting data, we demonstrate that the JPDAF provides tracking performance superior to the IEKF.
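The abstract names the probabilistic data association filter (PDAF) without giving its update equations. Below is a minimal Python sketch of a textbook PDAF measurement update. It is not the authors' implementation. It assumes a single microphone pair that yields several scalar TDOA candidates per frame, a constant speed of sound of 343 m/s, Gaussian TDOA noise, a clutter density that is uniform over the delay range, and a ±3σ validation gate. The function names `tdoa`, `tdoa_jacobian` and `pdaf_update` and every parameter value are hypothetical. The observation model is linearized only once, so this is an EKF-style update rather than the iterated (IEKF) form the paper uses.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed constant


def tdoa(x, m1, m2):
    """Predicted TDOA (seconds) between microphones m1 and m2 for a source at x."""
    return (math.dist(x, m1) - math.dist(x, m2)) / SPEED_OF_SOUND


def tdoa_jacobian(x, m1, m2):
    """Gradient of tdoa() with respect to the source position x (a 1x3 row)."""
    d1, d2 = math.dist(x, m1), math.dist(x, m2)
    return [((x[k] - m1[k]) / d1 - (x[k] - m2[k]) / d2) / SPEED_OF_SOUND
            for k in range(3)]


def pdaf_update(x, P, taus, m1, m2, r, p_detect=0.9, gate_sigmas=3.0,
                clutter_density=1000.0):
    """One PDAF update of position x (3-list) and covariance P (3x3 nested list)
    from the candidate TDOAs `taus` observed on a single microphone pair.

    r: TDOA noise variance (s^2); clutter_density: expected number of spurious
    TDOAs per second of delay range (hypothetical). Returns (x_new, P_new, betas),
    where betas[0] is the probability that no candidate came from the speaker.
    """
    H = tdoa_jacobian(x, m1, m2)
    PHt = [sum(P[i][k] * H[k] for k in range(3)) for i in range(3)]
    S = sum(H[i] * PHt[i] for i in range(3)) + r   # innovation variance
    K = [v / S for v in PHt]                       # Kalman gain (3x1)
    z_pred = tdoa(x, m1, m2)

    # Probability that a speaker-originated TDOA falls inside the gate.
    p_gate = math.erf(gate_sigmas / math.sqrt(2.0))
    nus, likelihoods = [], []
    for tau in taus:
        nu = tau - z_pred
        nus.append(nu)
        if nu * nu / S <= gate_sigmas ** 2:        # validated candidate
            gauss = math.exp(-0.5 * nu * nu / S) / math.sqrt(2.0 * math.pi * S)
            likelihoods.append(p_detect * gauss / clutter_density)
        else:                                      # outside gate: treated as clutter
            likelihoods.append(0.0)

    # Association probabilities: beta_0 (none from speaker), beta_i (candidate i).
    denom = 1.0 - p_detect * p_gate + sum(likelihoods)
    betas = [(1.0 - p_detect * p_gate) / denom] + [lik / denom for lik in likelihoods]

    # Combined innovation and its spread over the association hypotheses.
    nu_comb = sum(b * nu for b, nu in zip(betas[1:], nus))
    spread = sum(b * nu * nu for b, nu in zip(betas[1:], nus)) - nu_comb ** 2

    x_new = [x[i] + K[i] * nu_comb for i in range(3)]
    # P_new = beta_0*P + (1 - beta_0)*(P - S*K*K^T) + spread*K*K^T
    P_new = [[P[i][j] - ((1.0 - betas[0]) * S - spread) * K[i] * K[j]
              for j in range(3)] for i in range(3)]
    return x_new, P_new, betas


# Usage: one speaker-originated TDOA plus one spurious TDOA far outside the gate.
mic_a, mic_b = (0.0, 0.0, 0.0), (0.2, 0.0, 0.0)
x0 = [1.0, 1.0, 1.0]
P0 = [[0.1 if i == j else 0.0 for j in range(3)] for i in range(3)]
tau_true = tdoa([1.1, 1.0, 1.0], mic_a, mic_b)
x1, P1, b = pdaf_update(x0, P0, [tau_true, -5e-4], mic_a, mic_b, r=4e-10)
print(x1, b)
```

The weight β₀ is what gives the filter its clutter rejection. When no candidate TDOA matches the prediction well, β₀ approaches 1, and the state and covariance stay close to their predicted values instead of being pulled toward a spurious acoustic event. A JPDAF computes the same kind of weights jointly for all tracked speakers. It does this by enumerating the feasible assignments of candidates to speakers or to clutter. That joint step is not shown here.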

This work was sponsored by the European Union under the integrated project CHIL, Computers in the Human Interaction Loop, contract number 506909.



Editor information

Rainer Stiefelhagen, John Garofolo


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Gehrig, T., McDonough, J. (2007). Tracking Multiple Speakers with Probabilistic Data Association Filters. In: Stiefelhagen, R., Garofolo, J. (eds) Multimodal Technologies for Perception of Humans. CLEAR 2006. Lecture Notes in Computer Science, vol 4122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69568-4_10


  • DOI: https://doi.org/10.1007/978-3-540-69568-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69567-7

  • Online ISBN: 978-3-540-69568-4

  • eBook Packages: Computer Science, Computer Science (R0)
