Abstract
In this paper the authors present the UPC speaker diarization system for the NIST Rich Transcription Evaluation (RT07s) [1] conducted on the conference environment. The presented system is based on the ICSI RT06s system, which employs agglomerative clustering with a modified Bayesian Criterion (BIC) measure to decide which pairs of clusters to merge and to determine when to stop merging clusters [2]. This is the first participation of the UPC in the RT Speaker Diarization Evaluation and the purpose of this work has been the consolidation of a baseline system which can be used in the future for further research in the field of diarization. We have introduced, as prior modules before the diarization system, an Speech/Non-Speech detection module based on a Support Vector Machine from UPC and a Wiener Filtering from an implementation of the QIO front-end. In the speech parameterization a Frequency Filtering (FF) of the filter-bank energies is applied instead the classical Discrete Cosine Transform in the Mel-Cepstrum analysis. In addition, it is introduced a small changes in the complexity selection algorithm and a new post-processing technique which process the shortest clusters at the end of each Viterbi segmentation.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
NIST: Rich transcription meeting recognition evaluation plan. RT-07s (2007)
Anguera, X., Wooters, C., Hernando, J.: Robust speaker diarization for meetings: Icsi rt06s evaluation system. In: ICSLP (2006)
Gauvain, J., Lamel, L., Adda, G.: Partitioning and transcription of broadcast news data. In: ICSLP, pp. 1335–1338 (1998)
Chen, S., Gopalakrishnan, P.: Speaker, environment and channel change detection and clustering via the bayesian information criterion. In: DARPA BNTU Workshop (1998)
Gish, H., Siu, M., Rohlicek, R.: Segregation of speakers for speech recognition and speaker identification. In: ICASSP (1991)
Adami, A., et al.: Qualcomm-icsi-cgi features for asr. In: ICSLP, pp. 21–24 (2002)
Anguera, X.: The acoustic robust beamforming toolkit (2005)
Temko, A., Macho, D., Nadeu, C.: Enhanced SVM Training for Robust Speech Activity Detection. In: Proc. ICCASP (2007)
Nadeu, C., Paches-Leal, P., Juang, B.H.: Filtering the time sequence of spectral parameters for speech recognition. Speech Communication 22, 315–332 (1997)
Flanagan, J., Johnson, J., Kahn, R., Elko, G.: Computer-steered microphone arrays for sound transduction in large rooms. ASAJ 78(5), 1508–1518 (1985)
Knapp, C., Carter, G.: The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustic, Speech and Signal Processing 24(4), 320–327 (1976)
Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2002)
Fung, G., Mangasarian, O.: Proximal Support Vector Machine Classifiers. In: Proc. KDDM, pp. 77–86 (2001)
Lebrun, G., Charrier, C., Cardot, H.: SVM Training Time Reduction using Vector Quantization. In: Proc. ICPR, pp. 160–163 (2004)
Davis, S.B., Mermelstein, P.: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions ASSP (28), 357–366 (1980)
Luque, J., Hernando, J.: Robust Speaker Identification for Meetings: UPC CLEAR-07 Meeting Room Evaluation System. In: The same book (2007)
Nadeu, C., Macho, D., Hernando, J.: Time and Frequency Filtering of Filter-Bank Energies for Robust Speech Recognition. Speech Communication 34, 93–114 (2001)
Macho, D., Nadeu, C.: On the interaction between time and frequency filterinf of speech parameters for robust speech recognition. In: ICSLP, 1137 (1999)
Anguera, X., Hernando, J., Anguita, J.: Xbic: nueva medida para segmentación de locutor hacia el indexado automático de la señal de voz. JTH, 237–242 (2004)
Nadeu, C., Hernando, J., Gorricho, M.: On the Decorrelation of filter-Bank Energies in Speech Recognition. In: EuroSpeech, vol. 20, p. 417 (1995)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Luque, J., Anguera, X., Temko, A., Hernando, J. (2008). Speaker Diarization for Conference Room: The UPC RT07s Evaluation System. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT CLEAR 2007 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_50
Download citation
DOI: https://doi.org/10.1007/978-3-540-68585-2_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer ScienceComputer Science (R0)