Multispeaker Localization and Tracking in Intelligent Environments

Conference paper in: Multimodal Technologies for Perception of Humans (RT 2007, CLEAR 2007)

Abstract

Automatic speaker localization is an important task in several applications, such as acoustic scene analysis, hands-free videoconferencing, and speech enhancement. Tracking speakers in multiparty conversations is a fundamental task for automatic meeting analysis. In this work, we present the acoustic person tracking system developed at the UPC for the CLEAR’07 evaluation campaign. The system tracks the estimated positions of multiple speakers in a smart-room environment. Preliminary speaker locations are provided by the SRP-PHAT algorithm, which is known to perform robustly in most scenarios. Data association techniques based on trajectory prediction and spatial clustering are used to match the raw positional estimates with potential speakers. The associated positional measurements are then spatially smoothed by means of Kalman filtering. In addition to the description of the technology, experimental results obtained on the CLEAR’07 CHIL database are reported.
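As an illustration of the final smoothing stage only, the following is a minimal sketch of a constant-velocity Kalman filter applied to raw 2-D position estimates. The state model, the frame period `dt`, the noise variances `q` and `r`, and the function name `kalman_smooth` are assumptions made for this example; the abstract does not specify the filter configuration used in the actual system.

```python
import numpy as np


def kalman_smooth(measurements, dt=1.0, q=1e-2, r=0.05):
    """Filter noisy 2-D positions with a constant-velocity Kalman filter.

    measurements: array-like of shape (T, 2), raw (x, y) position estimates.
    dt, q, r: frame period, process-noise variance, measurement-noise
              variance (illustrative values, not taken from the paper).
    Returns an array of shape (T, 2) with the filtered positions.
    """
    z = np.asarray(measurements, dtype=float)

    # State vector is [x, y, vx, vy]; position advances by velocity * dt.
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    # Only the position components are observed.
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    Q = q * np.eye(4)
    R = r * np.eye(2)

    # Start at the first measurement with zero velocity (assumption).
    x = np.array([z[0, 0], z[0, 1], 0.0, 0.0])
    P = np.eye(4)

    out = np.empty_like(z)
    for t, zt in enumerate(z):
        # Prediction step: propagate state and covariance.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: correct the prediction with the new measurement.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (zt - H @ x)
        P = (np.eye(4) - K @ H) @ P
        out[t] = x[:2]
    return out
```

In a multi-speaker setting, one such filter would typically be run per tracked speaker, fed with the measurements that the data association step assigns to that speaker; that pairing is likewise an assumption of this sketch.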

This work has been partially sponsored by the EC-funded project CHIL (IST-2002-506909) and by the Spanish Government-funded project ACESCA (TIN2005-08852).

Editor information

Rainer Stiefelhagen, Rachel Bowers, Jonathan Fiscus

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Segura, C., Abad, A., Hernando, J., Nadeu, C. (2008). Multispeaker Localization and Tracking in Intelligent Environments. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT 2007, CLEAR 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_6

  • DOI: https://doi.org/10.1007/978-3-540-68585-2_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68584-5

  • Online ISBN: 978-3-540-68585-2

  • eBook Packages: Computer Science, Computer Science (R0)
