Abstract
A large set of spoken language resources (SLR) for various European languages is being compiled in several SpeechDat projects with the aim of training and testing speech recognizers for voice-driven services, mainly over telephone lines. This paper focuses on the annotation conventions applied to the SpeechDat SLR. These SLR contain typical examples of short monologue speech utterances with simple orthographic transcriptions in a hierarchically simple annotation structure. The annotation conventions and their underlying principles are described and compared with the approaches used for related SLR. The synchronization of the orthographic transcriptions with the corresponding speech files is addressed, and the impact of the selected approach on capturing specific phonological and phonetic phenomena is discussed. In the SpeechDat projects, a number of tools have been developed to carry out the transcription of the speech, and this paper gives a short description of these tools and their properties. For all SpeechDat projects, an internal validity check of the databases and their annotations is carried out. The procedure of this validation campaign, the evaluations performed, and some of the results are presented.
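The abstract states that the annotations are checked during validation but does not describe the procedure. The Python sketch below illustrates one kind of automatic check that such a validation could plausibly include: every token of an orthographic transcription must either be a known non-speech marker or appear in the lexicon. The marker inventory (`[fil]`, `[spk]`, `[sta]`, `[int]`), the function names `check_transcription` and `share_with_problems`, and the whitespace tokenization are all assumptions made for illustration. They do not represent the actual SpeechDat validation procedure.

```python
import re

# Hypothetical inventory of non-speech event markers, assumed for illustration;
# the real inventory is fixed by each project's transcription specification.
ALLOWED_MARKERS = {"[fil]", "[spk]", "[sta]", "[int]"}

# A well-formed marker is a lowercase word enclosed in square brackets.
MARKER_RE = re.compile(r"^\[[a-z]+\]$")


def check_transcription(text, lexicon, allowed_markers=ALLOWED_MARKERS):
    """Return a list of problems found in one orthographic transcription.

    Hypothetical checks (a sketch, not the SpeechDat procedure):
    - the transcription is not empty,
    - every token that starts with '[' or ends with ']' is a well-formed,
      known non-speech marker,
    - every other token appears in the lexicon.
    """
    tokens = text.split()
    if not tokens:
        return ["empty transcription"]
    problems = []
    for token in tokens:
        if token.startswith("[") or token.endswith("]"):
            if not MARKER_RE.match(token):
                problems.append(f"malformed marker: {token}")
            elif token not in allowed_markers:
                problems.append(f"unknown marker: {token}")
        elif token not in lexicon:
            problems.append(f"word not in lexicon: {token}")
    return problems


def share_with_problems(transcriptions, lexicon):
    """Fraction of transcriptions that fail at least one check."""
    if not transcriptions:
        return 0.0
    flagged = sum(1 for t in transcriptions if check_transcription(t, lexicon))
    return flagged / len(transcriptions)
```

For example, with a toy lexicon `{"call", "home", "please"}`, the transcription `"call [fil] home please"` passes. By contrast, `"call [cough] hom"` is flagged for an unknown marker and for an out-of-lexicon word. Database-level figures such as `share_with_problems` are the kind of quantity a validation report could compare against an acceptance threshold. Any such threshold would also be an assumption here.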
Cite this article
van den Heuvel, H., Boves, L., Moreno, A. et al. Annotation in the SpeechDat Projects. International Journal of Speech Technology 4, 127–143 (2001). https://doi.org/10.1023/A:1011375311203