Abstract
This paper presents the audio-based tracking system designed at the FBK-irst laboratories for the CLEAR 2007 evaluation campaign. The tracker relies on Global Coherence Field theory, which has proved to deal efficiently with the foreseen scenarios. Particular emphasis is given to the post-processing of localization hypotheses, which guarantees smooth speaker trajectories and is crucial to the overall performance of the system. The system is also equipped with a speech activity detector based on Hidden Markov Models. The proposed tracker delivers a considerable performance gain relative to the previous evaluation. An attempt to devise a multimodal tracker that merges the outputs of a video tracker and an audio tracker is also described.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brutti, A. (2008). A Person Tracking System for CHIL Meetings. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. CLEAR 2007, RT 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_3
DOI: https://doi.org/10.1007/978-3-540-68585-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science (R0)