The 2006 Athens Information Technology Speech Activity Detection and Speaker Diarization Systems

Rentzeperis, Elias; Stergiou, Andreas; Boukis, Christos; Pnevmatikakis, Aristodemos; Polymenakos, Lazaros C.

doi:10.1007/11965152_34

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4299)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction

Abstract

This paper describes the Speech Activity Detection (SAD) and Speaker Diarization (SPKR) systems developed at Athens Information Technology for the NIST RT-06S evaluations. The SAD system classifies recorded frames as speech or non-speech using Linear Discriminant Analysis (LDA). The SPKR system first segments recordings into speech intervals based on the Bayesian Information Criterion (BIC), and then applies a two-step clustering strategy to group segments from the same speaker. After discussing the internals of the two systems, we report and comment on our results on the RT-06S corpus [20].
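As a rough illustration of the BIC-based segmentation step mentioned in the abstract, the sketch below scores a candidate speaker change point with the commonly used ΔBIC test: a window is modeled either by one Gaussian or by two Gaussians split at the candidate point, and a positive score favors a change. This is not the authors' implementation. The one-dimensional feature, the penalty weight `lam`, the `margin` parameter, the function names, and the synthetic data are all assumptions made for the example; the abstract does not specify the features, window sizes, or penalty used in the actual system.

```python
import math
import random


def log_var(xs):
    """Log of the maximum-likelihood variance of a 1-D sample."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, 1e-12))  # floor avoids log(0)


def delta_bic(xs, t, lam=1.0):
    """Delta-BIC for splitting xs at index t (1-D Gaussian models).

    Positive values favor two separate Gaussians, i.e. a change at t.
    """
    left, right = xs[:t], xs[t:]
    n, n1, n2 = len(xs), len(left), len(right)
    d = 1  # feature dimension (assumed; real systems use multi-dimensional features)
    n_params = d + d * (d + 1) / 2  # mean plus covariance parameters
    penalty = 0.5 * lam * n_params * math.log(n)
    gain = 0.5 * (n * log_var(xs) - n1 * log_var(left) - n2 * log_var(right))
    return gain - penalty


def best_change_point(xs, margin=10, lam=1.0):
    """Return (index, score) of the split with the highest delta-BIC."""
    candidates = range(margin, len(xs) - margin)
    t = max(candidates, key=lambda i: delta_bic(xs, i, lam))
    return t, delta_bic(xs, t, lam)


# Synthetic data: one homogeneous window, one with a mean shift at index 100.
rng = random.Random(0)
same = [rng.gauss(0.0, 1.0) for _ in range(200)]
changed = ([rng.gauss(0.0, 1.0) for _ in range(100)]
           + [rng.gauss(5.0, 1.0) for _ in range(100)])

t_same, score_same = best_change_point(same)
t_changed, score_changed = best_change_point(changed)
print("homogeneous window:", t_same, round(score_same, 2))
print("window with change:", t_changed, round(score_changed, 2))
```

In a full segmenter this test would typically be applied over a sliding or growing window, and a change would be declared only where the score is positive. The `margin` parameter keeps both sides of the split long enough for a stable variance estimate.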

References

[1] Weiser, M.: The Computer for the 21st Century. Scientific American 265(3), 66–75 (1991)
[2] Waibel, A., Steusloff, H., Stiefelhagen, R., et al.: CHIL: Computers in the Human Interaction Loop. In: 5th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Lisbon, Portugal (April 2004)
[3] Pnevmatikakis, A., Talantzis, F., Soldatos, J., Polymenakos, L.: Robust Multimodal Audio-Visual Processing for Advanced Context Awareness in Smart Spaces. In: Maglogiannis, I., Karpouzis, K., Bramer, M. (eds.) Artificial Intelligence Applications and Innovations (AIAI 2006), pp. 290–301. Springer, Heidelberg (2006)
[4] http://www.clear-evaluation.org/
[5] Katsarakis, N., Souretis, G., Talantzis, F., Pnevmatikakis, A., Polymenakos, L.: 3D Audiovisual Person Tracking Using Kalman Filtering and Information Theory. In: Stiefelhagen, R., Garofolo, J.S. (eds.) CLEAR 2006. LNCS, vol. 4122, pp. 45–54. Springer, Heidelberg (2007)
[6] Stergiou, A., Pnevmatikakis, A., Polymenakos, L.: A Decision Fusion System across Time and Classifiers for Audio-visual Person Identification. In: Stiefelhagen, R., Garofolo, J.S. (eds.) CLEAR 2006. LNCS, vol. 4122. Springer, Heidelberg (2007)
[7] Stergiou, A., Pnevmatikakis, A., Polymenakos, L.: Enhancing the Performance of a GMM-based Speaker Identification System in a Multi-Microphone Setup. In: INTERSPEECH 2006, Pittsburgh (accepted, September 2006)
[8] Rabiner, L.R., Sambur, M.R.: An algorithm for determining the endpoints of isolated utterances. The Bell System Technical Journal 54, 297 (1975)
[9] Li, K., Swamy, N.S., Ahmad, M.O.: An Improved Voice Activity Detection Using Higher Order Statistics. IEEE Transactions on Speech and Audio Processing 13(5) (September 2005)
[10] Stegmann, J., Schroeder, G.: Robust Voice Activity Detection Based on the Wavelet Transform. In: Proc. IEEE Workshop on Speech Coding For Telecommunications, Pocono Manor, Pennsylvania, USA, pp. 99–100 (September 1997)
[11] Reynolds, D.A., Rose, R.C., Smith, M.J.T.: PC-Based TMS320C30 Implementation of the Gaussian Mixture Model Text-Independent Speaker Recognition System. In: International Conference on Signal Processing Applications and Technology, Hyatt Regency, Cambridge, Massachusetts, pp. 967–973 (November 1992)
[12] Martin, A., Charlet, C., Mauary, L.: Robust Speech/Non-Speech Detection Using LDA Applied to MFCC. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City (2001)
[13] Duda, R., Hart, P., Stork, D.: Pattern Classification. Wiley-Interscience, New York (2001)
[14] Rabiner, L., Schafer, R.: Digital Processing of Speech Signals. Prentice Hall Series in Signal Processing (September 1978)
[15] Wu, T.-Y., Lu, L., Chen, K., Zhang, H.-J.: Universal Background Models for Real-Time Speaker Change Detection. In: MMM 2003, pp. 135–149 (2003)
[16] Moraru, D., Meignier, S., Fredouille, C., Besacier, L., Bonastre, J.-F.: The ELISA consortium approaches in broadcast news speaker segmentation during the NIST 2003 rich transcription evaluation. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 2004), Montreal, Canada (2004)
[17] Gauvain, J.L., Lamel, L., Adda, G.: Partitioning and transcription of broadcast news data. In: International Conference on Speech and Language Processing, Sydney, Australia, vol. 4, pp. 1335–1338 (December 1998)
[18] Tritschler, A., Gopinath, R.: Improved speaker segmentation and segments clustering using the Bayesian Information Criterion. In: Proc. of Eurospeech, pp. 679–682 (1999)
[19] Reynolds, D.A., Rose, R.C.: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 3(1), 72–83 (1995)
[20] Fiscus, J.: Spring 2006 (RT-06S) Rich Transcription Meeting Recognition Evaluation Plan (v2) (2006), http://www.nist.gov/speech/tests/rt/rt2006/spring/docs/rt06s-meeting-eval-plan-V2.pdf

Author information

Authors and Affiliations

Autonomic & Grid Computing Group, Athens Information Technology, Athens, Greece
Elias Rentzeperis, Andreas Stergiou, Christos Boukis, Aristodemos Pnevmatikakis & Lazaros C. Polymenakos

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, Scotland
Steve Renals
IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
National Institute Of Standards and Technology, 100 Bureau Drive Stop 8940, Gaithersburg, MD, 20899
Jonathan G. Fiscus

About this paper

Cite this paper

Rentzeperis, E., Stergiou, A., Boukis, C., Pnevmatikakis, A., Polymenakos, L.C. (2006). The 2006 Athens Information Technology Speech Activity Detection and Speaker Diarization Systems. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_34

DOI: https://doi.org/10.1007/11965152_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science, Computer Science (R0)
