Annotation in the SpeechDat Projects

van den Heuvel, Henk; Boves, Louis; Moreno, Asuncion; Omologo, Maurizio; Richard, Gaël; Sanders, Eric

doi:10.1023/A:1011375311203

Published: June 2001

International Journal of Speech Technology, Volume 4, pages 127–143 (2001)

Abstract

A large set of spoken language resources (SLR) for various European languages is being compiled in several SpeechDat projects, with the aim of training and testing speech recognizers for voice-driven services, mainly over telephone lines. This paper focuses on the annotation conventions applied to the SpeechDat SLR. These SLR contain typical examples of short monologue speech utterances with simple orthographic transcriptions in a hierarchically simple annotation structure. The annotation conventions and their underlying principles are described and compared with the approaches used for related SLR. The synchronization of the orthographic transcriptions with the corresponding speech files is addressed, and the impact of the selected approach on capturing specific phonological and phonetic phenomena is discussed. A number of tools have been developed in the SpeechDat projects to carry out the transcription of the speech, and this paper gives a short description of these tools and their properties. For all SpeechDat projects, an internal validity check of the databases and their annotations is carried out. The procedure of this validation campaign, the evaluations performed, and some of the results are presented.

Author information

Authors and Affiliations

SPEX, A2RT, University of Nijmegen, The Netherlands
Henk van den Heuvel, Louis Boves & Eric Sanders
UPC, Barcelona, Spain
Asuncion Moreno
IRST, Trento, Italy
Maurizio Omologo
Philips Consumer Communications, Montrouge, France
Gaël Richard

About this article

Cite this article

van den Heuvel, H., Boves, L., Moreno, A. et al. Annotation in the SpeechDat Projects. International Journal of Speech Technology 4, 127–143 (2001). https://doi.org/10.1023/A:1011375311203
