Abstract
This work addresses automatic speaker localization and tracking in a real lecture scenario. It outlines the evaluation criteria recently adopted under CHIL and NIST benchmarking and describes two speaker localization systems, based on Generalized Cross Correlation Phase Transform (GCC-PHAT) analysis and on the Global Coherence Field (GCF). Benchmarking on a set of 13 lectures showed an average RMS localization error of about 30 cm.
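To make the GCC-PHAT idea concrete, the following is a minimal sketch of time-delay estimation between two microphone signals. It is not the authors' implementation: the function names `dft` and `gcc_phat_delay`, the naive DFT, the synthetic noise signal, and the circular delay are all assumptions chosen to keep the example self-contained.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short demo signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(sig_a, sig_b, max_lag):
    """Estimate by how many samples sig_b lags sig_a (hypothetical helper)."""
    n = len(sig_a)
    spec_a = dft(sig_a)
    spec_b = dft(sig_b)
    # Cross-power spectrum, normalized to unit magnitude so only phase remains.
    cross = [b * a.conjugate() for a, b in zip(spec_a, spec_b)]
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    # Back in the lag domain, a peak at lag d means sig_b[t] ~ sig_a[t - d].
    coherence = [v.real for v in dft(whitened, inverse=True)]
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda d: coherence[d % n])


# Synthetic test signal: white noise and a circularly delayed copy (assumed setup).
rng = random.Random(0)
n = 64
true_delay = 5
sig_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
sig_b = [sig_a[(t - true_delay) % n] for t in range(n)]

print(gcc_phat_delay(sig_a, sig_b, max_lag=10))  # prints 5
```

In a real room the delay would come from a microphone pair with known geometry: at sampling rate `fs`, a delay of `d` samples corresponds to a path difference of roughly `d / fs * 343` metres, taking the speed of sound as about 343 m/s. Delay estimates from several pairs can then be combined into a position estimate.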
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Omologo, M., Svaizer, P., Brutti, A., Cristoforetti, L. (2006). Speaker Localization in CHIL Lectures: Evaluation Criteria and Results. In: Renals, S., Bengio, S. (eds) Machine Learning for Multimodal Interaction. MLMI 2005. Lecture Notes in Computer Science, vol 3869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677482_40
DOI: https://doi.org/10.1007/11677482_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32549-9
Online ISBN: 978-3-540-32550-5
eBook Packages: Computer Science (R0)