Abstract
Automatic speaker localization is an important task in several applications, such as acoustic scene analysis, hands-free videoconferencing and speech enhancement. Tracking speakers in multiparty conversations is a fundamental task for automatic meeting analysis. In this work, we present the acoustic Person Tracking system developed at the UPC for the CLEAR’07 evaluation campaign. The system tracks the estimated positions of multiple speakers in a smart-room environment. Preliminary speaker locations are provided by the SRP-PHAT algorithm, which is known to perform robustly in most scenarios. Data association techniques based on trajectory prediction and spatial clustering are used to match the raw positional estimates with potential speakers. The associated positional measurements are then smoothed by means of Kalman filtering. In addition to the technology description, experimental results obtained on the CLEAR’07 CHIL database are reported.
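The final smoothing stage can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a constant-velocity motion model applied to a single coordinate, and the function name `kalman_smooth_1d` as well as the parameters `dt` (frame period), `q` (process noise intensity) and `r` (measurement noise variance) are hypothetical choices made only for illustration.

```python
def kalman_smooth_1d(measurements, dt=1.0, q=0.01, r=0.25):
    """Filter noisy 1-D positions with a constant-velocity Kalman filter.

    Assumed (hypothetical) parameters: dt is the time between estimates,
    q the process noise intensity, r the measurement noise variance.
    """
    # State: position p and velocity v. The 2x2 covariance is kept as scalars.
    p, v = measurements[0], 0.0
    p00, p01, p10, p11 = r, 0.0, 0.0, 1.0
    estimates = [p]
    for z in measurements[1:]:
        # Prediction: move the state forward and inflate the covariance
        # (P = F P F^T + Q for F = [[1, dt], [0, 1]]).
        p = p + dt * v
        n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q * dt**3 / 3.0
        n01 = p01 + dt * p11 + q * dt**2 / 2.0
        n10 = p10 + dt * p11 + q * dt**2 / 2.0
        n11 = p11 + q * dt
        # Update: only the position is observed (H = [1, 0]).
        s = n00 + r                      # innovation variance
        k0, k1 = n00 / s, n10 / s        # Kalman gain
        y = z - p                        # innovation
        p += k0 * y
        v += k1 * y
        p00, p01 = (1.0 - k0) * n00, (1.0 - k0) * n01
        p10, p11 = n10 - k1 * n00, n11 - k1 * n01
        estimates.append(p)
    return estimates


# Raw estimates of a stationary speaker at 2.0 m with alternating +/-0.3 m error.
raw = [2.0 + (0.3 if i % 2 == 0 else -0.3) for i in range(60)]
smoothed = kalman_smooth_1d(raw)
```

In a multi-speaker setting, one such filter per coordinate and per tracked speaker would be run on the measurements assigned to that speaker by the data association step; the filter's prediction could also serve as the trajectory prediction used for that association.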
This work has been partially sponsored by the EC-funded project CHIL (IST-2002-506909) and by the Spanish Government-funded project ACESCA (TIN2005-08852).
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Segura, C., Abad, A., Hernando, J., Nadeu, C. (2008). Multispeaker Localization and Tracking in Intelligent Environments. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT CLEAR 2007 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_6
DOI: https://doi.org/10.1007/978-3-540-68585-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science, Computer Science (R0)