Abstract
This paper describes the I2R/NTU system submitted to the Multiple Distant Microphone (MDM) task of the NIST Rich Transcription 2007 (RT-07) Meeting Recognition evaluation. In our system, speaker turn detection and clustering are performed using Direction of Arrival (DOA) information. The resulting speaker clusters are then purified by GMM modeling of acoustic features. As a final step, non-speech and silence segments are removed. Our system achieved a competitive overall diarization error rate (DER) of 15.32% on the NIST RT-07 evaluation task.
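To make the first stage concrete, the following is a minimal sketch of how per-frame DOA estimates could be grouped into speaker turns and then clustered by direction. It is an illustration only and not the method of the submitted system: the function name `doa_turns`, the frame length, the 15-degree tolerance, and the running-mean rule are all assumptions chosen for the example, and the abstract does not specify how the DOA estimates are obtained or clustered.

```python
from typing import List, Tuple


def doa_turns(angles: List[float], frame_s: float = 0.1,
              tol_deg: float = 15.0) -> List[Tuple[float, float, int]]:
    """Group per-frame DOA angles (degrees) into turns, then cluster turns.

    Hypothetical toy rule: a new turn starts when a frame's angle differs
    from the running mean of the current turn by more than tol_deg.
    Angle wrap-around (e.g. 359 vs 1 degree) is deliberately ignored.
    Returns a list of (start_seconds, end_seconds, cluster_id).
    """
    # Pass 1: turn detection on the angle track.
    turns = []  # (start_frame, end_frame_exclusive, mean_angle)
    start, total = 0, 0.0
    for i, angle in enumerate(angles):
        n = i - start  # frames already accumulated in the current turn
        if n > 0 and abs(angle - total / n) > tol_deg:
            turns.append((start, i, total / n))
            start, total = i, 0.0
        total += angle
    if angles:
        turns.append((start, len(angles), total / (len(angles) - start)))

    # Pass 2: assign each turn to the nearest known direction, or open a
    # new cluster if no centre lies within tol_deg.
    centres: List[float] = []
    result = []
    for s, e, mean in turns:
        best = min(range(len(centres)),
                   key=lambda k: abs(centres[k] - mean), default=None)
        if best is None or abs(centres[best] - mean) > tol_deg:
            centres.append(mean)
            best = len(centres) - 1
        result.append((s * frame_s, e * frame_s, best))
    return result


if __name__ == "__main__":
    track = [30, 31, 29, 120, 121, 119, 30, 32]
    for start_s, end_s, speaker in doa_turns(track, frame_s=0.5):
        print(f"{start_s:.1f}-{end_s:.1f} s: speaker {speaker}")
```

In a pipeline like the one described, clusters of this kind would only be a starting point: the acoustic GMM stage and the non-speech removal stage would then refine the segment labels.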
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Koh, E.C.W. et al. (2008). Speaker Diarization Using Direction of Arrival Estimate and Acoustic Feature Information: The I2R-NTU Submission for the NIST RT 2007 Evaluation. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. CLEAR 2007, RT 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_45
DOI: https://doi.org/10.1007/978-3-540-68585-2_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science, Computer Science (R0)