Speaker Diarization for Conference Room: The UPC RT07s Evaluation System

Luque, Jordi; Anguera, Xavier; Temko, Andrey; Hernando, Javier

doi:10.1007/978-3-540-68585-2_50

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4625)

Abstract

This paper presents the UPC speaker diarization system for the NIST Rich Transcription Evaluation (RT07s) [1] in the conference room environment. The system is based on the ICSI RT06s system, which employs agglomerative clustering with a modified Bayesian Information Criterion (BIC) measure to decide which pairs of clusters to merge and when to stop merging [2]. This is UPC's first participation in the RT Speaker Diarization Evaluation, and the purpose of this work has been to consolidate a baseline system that can be used for further research in diarization. Ahead of the diarization system we have introduced two modules: a Speech/Non-Speech detection module based on a Support Vector Machine from UPC, and Wiener filtering from an implementation of the QIO front-end. In the speech parameterization, a Frequency Filtering (FF) of the filter-bank energies is applied instead of the classical Discrete Cosine Transform used in Mel-cepstrum analysis. In addition, we introduce small changes to the complexity selection algorithm and a new post-processing technique that processes the shortest clusters at the end of each Viterbi segmentation.
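
The Frequency Filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which filter is used; the code below assumes the second-order filter H(z) = z - z^-1 applied across the bands of one frame, with the bands outside the filter bank treated as zero. The function name `frequency_filter` and the sample values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def frequency_filter(log_fbe: Sequence[float]) -> List[float]:
    """Filter one frame of log filter-bank energies along the band axis.

    Assumed filter: H(z) = z - z^-1, so output band k equals
    log_fbe[k + 1] - log_fbe[k - 1]. Bands beyond either end of the
    filter bank are taken as zero (an assumed edge convention).
    """
    padded = [0.0, *log_fbe, 0.0]
    # padded[k + 2] is the band above k, padded[k] is the band below k.
    return [padded[k + 2] - padded[k] for k in range(len(log_fbe))]


# Hypothetical log energies for a four-band filter bank.
frame = [1.0, 3.0, 6.0, 10.0]
print(frequency_filter(frame))  # [3.0, 5.0, 7.0, -6.0]
```

Under these assumptions the output has as many coefficients as there are bands, and each one is a difference of neighbouring log energies, which is the role the Discrete Cosine Transform would otherwise play as the decorrelating step.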

References

[1] NIST: Rich transcription meeting recognition evaluation plan. RT-07s (2007)
[2] Anguera, X., Wooters, C., Hernando, J.: Robust speaker diarization for meetings: ICSI RT06s evaluation system. In: ICSLP (2006)
[3] Gauvain, J., Lamel, L., Adda, G.: Partitioning and transcription of broadcast news data. In: ICSLP, pp. 1335–1338 (1998)
[4] Chen, S., Gopalakrishnan, P.: Speaker, environment and channel change detection and clustering via the Bayesian information criterion. In: DARPA BNTU Workshop (1998)
[5] Gish, H., Siu, M., Rohlicek, R.: Segregation of speakers for speech recognition and speaker identification. In: ICASSP (1991)
[6] Adami, A., et al.: Qualcomm-ICSI-OGI features for ASR. In: ICSLP, pp. 21–24 (2002)
[7] Anguera, X.: The acoustic robust beamforming toolkit (2005)
[8] Temko, A., Macho, D., Nadeu, C.: Enhanced SVM Training for Robust Speech Activity Detection. In: Proc. ICASSP (2007)
[9] Nadeu, C., Paches-Leal, P., Juang, B.H.: Filtering the time sequence of spectral parameters for speech recognition. Speech Communication 22, 315–332 (1997)
[10] Flanagan, J., Johnson, J., Kahn, R., Elko, G.: Computer-steered microphone arrays for sound transduction in large rooms. ASAJ 78(5), 1508–1518 (1985)
[11] Knapp, C., Carter, G.: The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech and Signal Processing 24(4), 320–327 (1976)
[12] Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2002)
[13] Fung, G., Mangasarian, O.: Proximal Support Vector Machine Classifiers. In: Proc. KDDM, pp. 77–86 (2001)
[14] Lebrun, G., Charrier, C., Cardot, H.: SVM Training Time Reduction using Vector Quantization. In: Proc. ICPR, pp. 160–163 (2004)
[15] Davis, S.B., Mermelstein, P.: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions ASSP 28, 357–366 (1980)
[16] Luque, J., Hernando, J.: Robust Speaker Identification for Meetings: UPC CLEAR-07 Meeting Room Evaluation System. In: The same book (2007)
[17] Nadeu, C., Macho, D., Hernando, J.: Time and Frequency Filtering of Filter-Bank Energies for Robust Speech Recognition. Speech Communication 34, 93–114 (2001)
[18] Macho, D., Nadeu, C.: On the interaction between time and frequency filtering of speech parameters for robust speech recognition. In: ICSLP, 1137 (1999)
[19] Anguera, X., Hernando, J., Anguita, J.: XBIC: nueva medida para segmentación de locutor hacia el indexado automático de la señal de voz. JTH, 237–242 (2004)
[20] Nadeu, C., Hernando, J., Gorricho, M.: On the Decorrelation of Filter-Bank Energies in Speech Recognition. In: EuroSpeech, vol. 20, p. 417 (1995)

Author information

Authors and Affiliations

Technical University of Catalonia (UPC), Jordi Girona, 1-3 D5, 08034, Barcelona, Spain
Jordi Luque, Andrey Temko & Javier Hernando
Multilinguism group, Telefónica I+D, 08021, Barcelona, Spain
Xavier Anguera

Editor information

Rainer Stiefelhagen, Rachel Bowers, Jonathan Fiscus

About this paper

Cite this paper

Luque, J., Anguera, X., Temko, A., Hernando, J. (2008). Speaker Diarization for Conference Room: The UPC RT07s Evaluation System. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT CLEAR 2007 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_50

DOI: https://doi.org/10.1007/978-3-540-68585-2_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science, Computer Science (R0)
