Speaker Diarization for Conference Room: The UPC RT07s Evaluation System

  • Conference paper
Multimodal Technologies for Perception of Humans (RT 2007, CLEAR 2007)

Abstract

This paper presents the UPC speaker diarization system for the NIST Rich Transcription Evaluation (RT07s) [1], conducted in the conference environment. The system is based on the ICSI RT06s system, which employs agglomerative clustering with a modified Bayesian Information Criterion (BIC) measure to decide which pairs of clusters to merge and when to stop merging [2]. This is the first participation of the UPC in the RT Speaker Diarization Evaluation, and the purpose of this work has been to consolidate a baseline system that can be used for further research in diarization. Two modules have been introduced ahead of the diarization system: a Speech/Non-Speech detection module based on a Support Vector Machine developed at UPC, and Wiener filtering taken from an implementation of the QIO front-end. In the speech parameterization, Frequency Filtering (FF) of the filter-bank energies is applied instead of the classical Discrete Cosine Transform used in Mel-cepstrum analysis. In addition, small changes are introduced in the complexity selection algorithm, along with a new post-processing technique that processes the shortest clusters at the end of each Viterbi segmentation.
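As an illustration of the Frequency Filtering step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the second-order filter H(z) = z - z^{-1} commonly described in the FF literature [17], applied along the frequency axis of one frame of log filter-bank energies with zero padding at both band edges. The abstract does not state the filter or the number of bands actually used, and the function name `frequency_filter` is hypothetical.

```python
def frequency_filter(log_fbe):
    """Filter one frame of log filter-bank energies along the frequency axis.

    Assumed filter: H(z) = z - z^{-1}, i.e. y[k] = x[k+1] - x[k-1],
    with x treated as zero outside the band range (an assumption).
    """
    q = len(log_fbe)
    # Zero-pad one band on each side so the edge bands have both neighbours.
    padded = [0.0] + list(log_fbe) + [0.0]
    # padded[k + 1] holds x[k], so x[k+1] - x[k-1] is padded[k + 2] - padded[k].
    return [padded[k + 2] - padded[k] for k in range(q)]


# Toy frame with four bands (values are already log energies).
frame = [1.0, 2.0, 4.0, 7.0]
print(frequency_filter(frame))  # [2.0, 3.0, 5.0, -4.0]
```

The output has the same length as the input, so under these assumptions the filtered vector can take the place of the cepstral coefficients that a DCT would otherwise produce.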


References

  1. NIST: Rich transcription meeting recognition evaluation plan. RT-07s (2007)

  2. Anguera, X., Wooters, C., Hernando, J.: Robust speaker diarization for meetings: ICSI RT06s evaluation system. In: ICSLP (2006)

  3. Gauvain, J., Lamel, L., Adda, G.: Partitioning and transcription of broadcast news data. In: ICSLP, pp. 1335–1338 (1998)

  4. Chen, S., Gopalakrishnan, P.: Speaker, environment and channel change detection and clustering via the Bayesian information criterion. In: DARPA BNTU Workshop (1998)

  5. Gish, H., Siu, M., Rohlicek, R.: Segregation of speakers for speech recognition and speaker identification. In: ICASSP (1991)

  6. Adami, A., et al.: Qualcomm-ICSI-OGI features for ASR. In: ICSLP, pp. 21–24 (2002)

  7. Anguera, X.: The acoustic robust beamforming toolkit (2005)

  8. Temko, A., Macho, D., Nadeu, C.: Enhanced SVM Training for Robust Speech Activity Detection. In: Proc. ICASSP (2007)

  9. Nadeu, C., Paches-Leal, P., Juang, B.H.: Filtering the time sequence of spectral parameters for speech recognition. Speech Communication 22, 315–332 (1997)

  10. Flanagan, J., Johnson, J., Kahn, R., Elko, G.: Computer-steered microphone arrays for sound transduction in large rooms. ASAJ 78(5), 1508–1518 (1985)

  11. Knapp, C., Carter, G.: The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech and Signal Processing 24(4), 320–327 (1976)

  12. Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2002)

  13. Fung, G., Mangasarian, O.: Proximal Support Vector Machine Classifiers. In: Proc. KDDM, pp. 77–86 (2001)

  14. Lebrun, G., Charrier, C., Cardot, H.: SVM Training Time Reduction using Vector Quantization. In: Proc. ICPR, pp. 160–163 (2004)

  15. Davis, S.B., Mermelstein, P.: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions ASSP (28), 357–366 (1980)

  16. Luque, J., Hernando, J.: Robust Speaker Identification for Meetings: UPC CLEAR-07 Meeting Room Evaluation System. In: The same book (2007)

  17. Nadeu, C., Macho, D., Hernando, J.: Time and Frequency Filtering of Filter-Bank Energies for Robust Speech Recognition. Speech Communication 34, 93–114 (2001)

  18. Macho, D., Nadeu, C.: On the interaction between time and frequency filtering of speech parameters for robust speech recognition. In: ICSLP, 1137 (1999)

  19. Anguera, X., Hernando, J., Anguita, J.: Xbic: nueva medida para segmentación de locutor hacia el indexado automático de la señal de voz. JTH, 237–242 (2004)

  20. Nadeu, C., Hernando, J., Gorricho, M.: On the Decorrelation of filter-Bank Energies in Speech Recognition. In: EuroSpeech, vol. 20, p. 417 (1995)

Editor information

Rainer Stiefelhagen Rachel Bowers Jonathan Fiscus

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Luque, J., Anguera, X., Temko, A., Hernando, J. (2008). Speaker Diarization for Conference Room: The UPC RT07s Evaluation System. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT 2007, CLEAR 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_50

  • DOI: https://doi.org/10.1007/978-3-540-68585-2_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68584-5

  • Online ISBN: 978-3-540-68585-2

  • eBook Packages: Computer Science, Computer Science (R0)
